DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 12 and 13 are objected to because of the following informalities: both claims recite “one or features of the user’s voice,” but should recite “one or more features of the user’s voice.” Appropriate correction is required.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 14 and 19, and therefore, claims 2-13, 15-18 and 20 which depend therefrom, are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Specifically, claims 1, 14 and 19 all recite the limitation "one or more microphones" in two separate portions of the claims, however the second reference
Further, regarding claims 5 and 18, and claim 6 which depends from claim 5, line 1 of each of claims 5 and 18 recites “the monitoring.” There is no antecedent basis for this limitation, and it is unclear to which earlier recited limitation, if any, “the monitoring” is intended to refer.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso et al. (US 2015/0039303 A1, herein “Lesso”) in view of Finkelstein et al. (US 2018/0293221 A1, herein “Finkelstein”).
Regarding claim 1, Lesso teaches a method implemented using one or more processors, comprising (Lesso paras. [0049]-[0050], voice recognition process where paras. [0058]-[0062] teach the voice recognition functionality being performed in a multi-phase process using two or more processors): 
executing in an inactive listening state at least in part on a computing device operated by a user (Lesso paras. [0047], [0050]-[0051] and [0058]-[0059], on a mobile telephone with a processor (computing device), phase 1 processing of a signal received from a microphone operates in an always-on fashion, but performs only signal activity detection and not trigger phrase detection, consuming very low power, thus inactive but listening for a signal activity);
while in the inactive listening state, detecting an event that triggers transition of from the inactive listening state into an active listening state (Lesso paras. [0051]-[0055], while in phase 1, it is determined whether the received signal contains speech which triggers a transition to phase 2, then phase 3 and eventually a speech recognition enabling stage (any of the subsequent phases after phase 1 considered as an “event” triggered by the determination of phase 1)), wherein first data indicative of sound captured by one or more microphones prior to the event is temporarily stored, as a pre-event input stream, in a memory buffer of the computing device operated by the user (Lesso paras. [0072]-[0073], buffer 110 continually operates to store the last 2-10 seconds of received signal (thus only temporarily stored) for use in the phase 1 determination, where fig. 5b and para. [0080] give an example of the incoming speech from a microphone including ambient noise appearing before a trigger phrase (pre-event input stream), and sentences appearing after the trigger phrase, and where figs. 2 and 5a, and paras. [0039] and [0065]-[0066] teach the buffer being part of the mobile telephone);
obtaining, as a post-event input stream, second data indicative of sound captured by one or more of the microphones after detecting the event (Lesso para. [0129], when the trigger phrase detection occurs and it is determined that the trigger phrase is spoken by an authorized user, a signal DNR out from a noise reduction block buffer is provided to the speech recognition engine (obtaining post-event input stream), where fig. 7k and para. [0149] teach the output of the noise reduction block buffer includes sound data from after the trigger phrase (post-event input stream)); 
while in the active listening state, determining post-event input streams for triggering responsive action (Lesso paras. [0119]-[0120], once all the phases of processing have been enabled and input speech is sent to the speech recognition engine (thus in the active listening state), the speech recognition engine identifies commands (to trigger a responsive action such as placing a phone call) spoken by the user after speaking the trigger phrase, where the commands are acted upon (triggering a responsive action) to control an aspect of the operation of the mobile telephone); and 
causing to perform one or more responsive actions based on the selected input stream (Lesso para. [0119], the commands are acted upon for control purposes to control an aspect (responsive action) of the operation of the mobile telephone).
Although Lesso discloses an applications processor on a cellphone that handles various processes involving using the cell phone to play music or generate satellite navigation, Lesso does not explicitly teach “an automated assistant,” or “by the automated assistant.”
Lesso further does not explicitly teach respective measures of fitness of the pre-event and post-event input streams.
Lesso still further does not explicitly teach based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream.
Finkelstein teaches an automated assistant and by the automated assistant (Finkelstein para. [0032], all in one assistant device which is an intelligent assistant system).
Finkelstein further teaches respective measures of fitness of the pre-event and post-event input streams (Finkelstein paras. [0058], [0062], [0067], and paras. [0216]-[0219], the intent of a user in a spoken utterance is determined to be either a command to be immediately executed or a commitment, and this intent determination generates a probable intent from the user utterance which represents the fitness of a pre-event utterance to an intent of a command to be immediately executed or to be executed after a trigger, where for an add commitment intent, a user utters a statement before an event (pre-event), and then a command is executed per the user statement once a triggering event occurs, where the commitment engine stores the commitments, considers context information which includes user input (post-event input stream), and then calculates a cost function (fitness) to determine whether to perform the action based on the context information).
Finkelstein still further teaches based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream (Finkelstein paras. [0219] and [0256], at a contextually appropriate time, when the cost function of the commitment manager determines (based on respective measures of fitness) that an action should be executed, then the commitment command stored will execute, and where the context information includes the user input).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the commitment engine operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
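For illustration only, the recited step of selecting the pre-event or post-event input stream based on respective measures of fitness might be sketched as follows. The function names `fitness` and `select_stream`, the token-based scoring, and the command vocabulary are all hypothetical assumptions made for this sketch; they are not taken from Lesso, Finkelstein, or the application's specification, and other measures of fitness are possible.

```python
from typing import Sequence

# Hypothetical vocabulary standing in for a known command syntax (assumption).
COMMAND_WORDS = frozenset({"call", "play", "set", "remind", "navigate"})


def fitness(tokens: Sequence[str]) -> float:
    """Toy measure of fitness: fraction of tokens that are known command words."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in COMMAND_WORDS)
    return hits / len(tokens)


def select_stream(pre_event: Sequence[str], post_event: Sequence[str]) -> str:
    """Return "pre" or "post" for the higher-scoring stream; ties go to "post"."""
    return "pre" if fitness(pre_event) > fitness(post_event) else "post"
```

Under these assumptions, a buffered utterance such as "remind me to call mom" followed by silence after the event would lead to selection of the pre-event stream.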
Regarding claims 2 and 15, Lesso teaches further comprising monitoring sound captured by one or more of the microphones for one or more hot words (Lesso paras. [0050], [0102] and [0116]-[0117], phase three processing including validating that a trigger phrase (one or more hot words) was spoken by an authorized user of the device by operating on the received signal section t3-t0 from a selected microphone), wherein the event that triggers transition of the automated assistant from the inactive listening state into the active listening state comprises detection of one or more of the hot words during the monitoring (Lesso paras. [0054]-[0056], fig. 4, process moves to step 86 where it is determined that the trigger phrase is spoken by a user with whom the trigger phrase is associated, and when it is determined, then the speech recognition engine is enabled (active listening, since subsequent speech is now passed to the speech recognition engine rather than the system continuing the simple monitoring state (inactive listening))).
Regarding claims 3 and 16, Lesso teaches further comprising performing speaker recognition processing on at least a portion of the sound captured by the one or more microphones to determine an identity of the user (Lesso paras. [0050], [0054] and [0102], fig. 4, in phase three processing, the system determines whether the signal containing speech, coming from the microphones, was spoken by a user with whom the trigger phrase is associated, where each user has their own trigger phrase), wherein transition of the automated assistant from the inactive listening state into the active listening state is further triggered in response to a determination that the identity of the user satisfies one or more criteria (Lesso paras. [0054]-[0056], fig. 4, process moves to step 86 and when it is determined that the trigger phrase is spoken by a user with whom the trigger phrase is associated (satisfying a criterion of the phrase matching the user speech), then the speech recognition engine is enabled (active listening, since subsequent speech is now passed to the speech recognition engine rather than the system continuing the simple monitoring state (inactive listening))).
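As a simplified illustration of the mapping above, the transition from the inactive listening state to the active listening state can be modeled as a gate that opens only when every phase passes. The boolean inputs stand in for the signal activity, trigger phrase, and speaker validation determinations; the names `State` and `next_state` are hypothetical, and the sketch is not code from Lesso.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive listening"
    ACTIVE = "active listening"


def next_state(state: State, has_signal: bool, has_trigger_phrase: bool,
               speaker_authorized: bool) -> State:
    """Leave INACTIVE only when all three phase checks pass."""
    if state is State.INACTIVE:
        # Phase 1: signal activity; phase 2: trigger phrase; phase 3: speaker check.
        if has_signal and has_trigger_phrase and speaker_authorized:
            return State.ACTIVE
    # In every other case the state is unchanged in this sketch.
    return state
```

In this sketch a trigger phrase spoken by an unrecognized speaker leaves the state unchanged, which corresponds to the identity criterion discussed for claims 3 and 16.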
Regarding claims 4 and 17, Lesso teaches wherein the one or more criteria include the identity of the user matching an owner of the computing device (Lesso paras. [0054], [0116]-[0119], the system determines whether the trigger phrase is being spoken by an authorized user of the device, where the authorized user speech detection is trained by the user speaking a sample trigger phrase that is used later and compared to a current user input speech to make the determination as to whether the speech comes from a particular user, where multiple users have their own associated trigger phrase).
Although Lesso states that an authorized user is detected and not explicitly that “the owner” is detected, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the user training system of Lesso to include an owner as an authorized user at least because it is a known technique for an owner of a device to have their own configurations/privileges set on their own device. Therefore, such a modification would be applying a known technique to a known device (method, or product) ready for improvement to yield predictable results. See MPEP 2143(I)(D).
Regarding claims 5 and 18, Lesso teaches wherein the monitoring is performed using a digital signal processor integral with the computing device operated by the user (Lesso paras. [0047]-[0048], a digital signal processor performs the key phrase detection (monitoring)).
Regarding claim 6, Lesso teaches wherein the digital signal processor is integral with a digital signal processing chip, and wherein the memory buffer comprises a digital signal processing buffer onboard the digital signal processing chip (Lesso paras. [0048], [0062] and [0127], first and second phase processing (which includes the key phrase detection performed by the DSP) is performed on one integrated circuit (chip) and where the buffer 110 is provided in the integrated circuit).
Regarding claim 7, Lesso does not explicitly teach the limitations of claim 7.
Finkelstein teaches the limitations of claim 7 (Finkelstein paras. [0120], [0123]-[0124], and [0057]-[0061], parser and intent handler analyze utterances and determine that a spoken phrase contains a trigger (hot word) consisting of the word “he” (pronoun) and that a resolver module is able to determine (resolving) that the “he” is referring to a male named Justin from a previously established commitment that is composed of words that fill a template (tokens), the words having been extracted from the commitment input (extracted from the selected input stream)).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the speech recognition and commitment engine operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 9, Lesso teaches wherein the memory buffer is configured to temporarily store audio data captured over a predetermined time interval (Lesso paras. [0072]-[0073], the buffer stores signals representing a period of sound between 2-10 seconds, but the buffer can be sized (predetermined time period) to store the required period of time for signals, and where the signal is stored in a first-in-first-out (temporary) basis).
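To make the relationship between a predetermined time interval and buffer size concrete, the following sketch sizes a first-in-first-out buffer from a sample rate and a window length. The 16 kHz sample rate is an assumption and is not taken from Lesso; the 2 second window is the low end of the 2-10 second range cited above. The function name `make_buffer` is hypothetical.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000  # assumed sample rate, not taken from Lesso
WINDOW_SECONDS = 2       # low end of the 2-10 second range cited from Lesso


def make_buffer(sample_rate_hz: int = SAMPLE_RATE_HZ,
                seconds: int = WINDOW_SECONDS) -> deque:
    """FIFO holding at most `seconds` of audio; the oldest samples drop first."""
    return deque(maxlen=sample_rate_hz * seconds)


# Tiny numbers for illustration: 4 samples per second over a 2 second window.
buf = make_buffer(sample_rate_hz=4, seconds=2)
for sample in range(10):
    buf.append(sample)
# buf now holds only the 8 most recent samples, 2 through 9.
```

With the assumed defaults the buffer holds 32,000 samples, so the stored interval is fixed in advance by the chosen capacity.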
Regarding claim 10, Lesso does not explicitly teach the limitations of claim 10.
Finkelstein teaches the limitations of claim 10 (Finkelstein paras. [0044], [0052]-[0053], [0057]-[0060], [0218], speech recognition on the audio input (including both speech from when the commitment is established, and audio (context information) when a commitment is triggered and executed) generating confidence values (similarity measures) for the statistical likelihood that the corresponding recognized text is accurate, where the text is parsed to see if it matches with an action for the system to execute (thus a known command syntax) when a trigger is detected).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the speech recognition and commitment engine operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).

Regarding claim 11, Lesso does not explicitly teach the limitations of claim 11.
Finkelstein teaches wherein determining the respective measures of fitness includes analyzing one or more features of the user's voice in one or both of the pre-event and post-event input streams (Finkelstein paras. [0044]-[0046], to arrive at the confidence scores generated by the speech recognition, first the audio input is processed by a feature extractor that extracts feature vectors representing the qualities of a spoken utterance, and then the feature vectors are compared to acoustic models for speech sounds).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the feature extraction and comparison operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 12, Lesso does not explicitly teach the limitations of claim 12.
Finkelstein teaches wherein the analyzing includes comparing the one or features of the user's voice to one or more voice features observed from utterances containing commands for automated assistants (Finkelstein paras. [0044]-[0047], and [0057]-[0058], extracted feature vectors of the input audio in the form of natural language speech are matched to (comparing) acoustic models for particular words or phrases that correspond to an intent for actions (commands) to be executed by the intelligent assistant).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the feature extraction and comparison operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 13, Lesso does not explicitly teach the limitations of claim 13.
Finkelstein teaches the limitations of claim 13 (Finkelstein paras. [0044], [0048]-[0050], [0053], confidence values are determined using machine learning techniques, where the speech recognition engine uses Hidden Markov Models to match the feature vectors of the natural language speech audio input with speech components), wherein the machine learning model is trained to generate output indicative of whether the input is directed to an automated assistant (Finkelstein paras. [0047] and [0057]-[0058], speech recognition uses the feature vectors to identify words and phrases that match the spoken sounds of the audio data, which is later used to determine (thus indicative of) whether there is a match in the words and phrases that comprise actions to be performed by the intelligent assistant).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the feature extraction and speech recognition operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 14, Lesso teaches a system comprising one or more processors and memory storing instructions that, in response to execution of the instructions by the one or more processors, cause the one or more processors to perform the following operations (Lesso paras. [0049]-[0050] and [0065], speech recognition system and functionality, the speech recognition system provided by computer-readable code stored on non-transitory medium for running on any suitable computational resources for providing the required function, distributed amongst integrated circuits, for example, in one embodiment based around the architecture shown in FIG. 2):
executing in an inactive listening state at least in part on a computing device operated by a user (Lesso paras. [0047], [0050]-[0051] and [0058]-[0059], on a mobile telephone with a processor (computing device), phase 1 processing of a signal received from a microphone operates in an always-on fashion, but performs only signal activity detection and not trigger phrase detection, consuming very low power, thus inactive but listening for a signal activity);
while in the inactive listening state, detecting an event that triggers transition of from the inactive listening state into an active listening state (Lesso paras. [0051]-[0055], while in phase 1, it is determined whether the received signal contains speech which triggers a transition to phase 2, then phase 3 and eventually a speech recognition enabling stage (any of the subsequent phases after phase 1 considered as an “event” triggered by the determination of phase 1)), wherein first data indicative of sound captured by one or more microphones prior to the event is temporarily stored, as a pre-event input stream, in a memory buffer of the computing device operated by the user (Lesso paras. [0072]-[0073], buffer 110 continually operates to store the last 2-10 seconds of received signal (thus only temporarily stored) for use in the phase 1 determination, where fig. 5b and para. [0080] give an example of the incoming speech from a microphone including ambient noise appearing before a trigger phrase (pre-event input stream), and sentences appearing after the trigger phrase, and where figs. 2 and 5a, and paras. [0039] and [0065]-[0066] teach the buffer being part of the mobile telephone); 
obtaining, as a post-event input stream, second data indicative of sound captured by one or more of the microphones after detecting the event (Lesso para. [0129], when the trigger phrase detection occurs and it is determined that the trigger phrase is spoken by an authorized user, a signal DNR out from a noise reduction block buffer is provided to the speech recognition engine (obtaining post-event input stream), where fig. 7k and para. [0149] teach the output of the noise reduction block buffer includes sound data from after the trigger phrase (post-event input stream)); 
while in the active listening state, determining post-event input streams for triggering responsive action (Lesso paras. [0119]-[0120], once all the phases of processing have been enabled and input speech is sent to the speech recognition engine (thus in the active listening state), the speech recognition engine identifies commands (to trigger a responsive action such as placing a phone call) spoken by the user after speaking the trigger phrase, where the commands are acted upon (triggering a responsive action) to control an aspect of the operation of the mobile telephone); and 
causing to perform one or more responsive actions based on the selected input stream (Lesso para. [0119], the commands are acted upon for control purposes to control an aspect (responsive action) of the operation of the mobile telephone).
Although Lesso discloses an applications processor on a cellphone that handles various processes involving using the cell phone to play music or generate satellite navigation, Lesso does not explicitly teach “an automated assistant,” or “by the automated assistant.”
Lesso further does not explicitly teach respective measures of fitness of the pre-event and post-event input streams.
Lesso still further does not explicitly teach based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream.
Finkelstein teaches an automated assistant and by the automated assistant (Finkelstein para. [0032], all in one assistant device which is an intelligent assistant system).
Finkelstein further teaches respective measures of fitness of the pre-event and post-event input streams (Finkelstein paras. [0058], [0062], [0067], and paras. [0216]-[0219], the intent of a user in a spoken utterance is determined to be either a command to be immediately executed or a commitment, and this intent determination generates a probable intent from the user utterance which represents the fitness of a pre-event utterance to an intent of a command to be immediately executed or to be executed after a trigger, where for an add commitment intent, a user utters a statement before an event (pre-event), and then a command is executed per the user statement once a triggering event occurs, where the commitment engine stores the commitments, considers context information which includes user input (post-event input stream), and then calculates a cost function (fitness) to determine whether to perform the action based on the context information).
Finkelstein still further teaches based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream (Finkelstein paras. [0219] and [0256], at a contextually appropriate time, when the cost function of the commitment manager determines (based on respective measures of fitness) that an action should be executed, then the commitment command stored will execute, and where the context information includes the user input).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the commitment engine operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 19, Lesso teaches at least one non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the following operations (Lesso paras. [0049]-[0050] and [0065], speech recognition system and functionality, the speech recognition system provided by computer-readable code stored on non-transitory medium for running on any suitable computational resources for providing the required function, distributed amongst integrated circuits, for example, in one embodiment based around the architecture shown in FIG. 2):
executing in an inactive listening state at least in part on a computing device operated by a user (Lesso paras. [0047], [0050]-[0051] and [0058]-[0059], on a mobile telephone with a processor (computing device), phase 1 processing of a signal received from a microphone operates in an always-on fashion, but performs only signal activity detection and not trigger phrase detection, consuming very low power, thus inactive but listening for a signal activity);
while in the inactive listening state, detecting an event that triggers transition of from the inactive listening state into an active listening state (Lesso paras. [0051]-[0055], while in phase 1, it is determined whether the received signal contains speech which triggers a transition to phase 2, then phase 3 and eventually a speech recognition enabling stage (any of the subsequent phases after phase 1 considered as an “event” triggered by the determination of phase 1)), wherein first data indicative of sound captured by one or more microphones prior to the event is temporarily stored, as a pre-event input stream, in a memory buffer of the computing device operated by the user (Lesso paras. [0072]-[0073], buffer 110 continually operates to store the last 2-10 seconds of received signal (thus only temporarily stored) for use in the phase 1 determination, where fig. 5b and para. [0080] give an example of the incoming speech from a microphone including ambient noise appearing before a trigger phrase (pre-event input stream), and sentences appearing after the trigger phrase, and where figs. 2 and 5a, and paras. [0039] and [0065]-[0066] teach the buffer being part of the mobile telephone);
obtaining, as a post-event input stream, second data indicative of sound captured by one or more of the microphones after detecting the event (Lesso para. [0129], when the trigger phrase detection occurs and it is determined that the trigger phrase is spoken by an authorized user, a signal DNR out from a noise reduction block buffer is provided to the speech recognition engine (obtaining post-event input stream), where fig. 7k and para. [0149] teach the output of the noise reduction block buffer includes sound data from after the trigger phrase (post-event input stream)); 
while in the active listening state, determining post-event input streams for triggering responsive action (Lesso paras. [0119]-[0120], once all the phases of processing have been enabled and input speech is sent to the speech recognition engine (thus in the active listening state), the speech recognition engine identifies commands (to trigger a responsive action such as placing a phone call) spoken by the user after speaking the trigger phrase, where the commands are acted upon (triggering a responsive action) to control an aspect of the operation of the mobile telephone); and
causing to perform one or more responsive actions based on the selected input stream (Lesso para. [0119], the commands are acted upon for control purposes to control an aspect (responsive action) of the operation of the mobile telephone).
Although Lesso discloses an applications processor on a cellphone that handles various processes involving using the cell phone to play music or generate satellite navigation, Lesso does not explicitly teach “an automated assistant,” or “by the automated assistant.”
Lesso further does not explicitly teach respective measures of fitness of the pre-event and post-event input streams.
Lesso still further does not explicitly teach based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream.
Finkelstein teaches an automated assistant and by the automated assistant (Finkelstein para. [0032], all in one assistant device which is an intelligent assistant system).
Finkelstein further teaches respective measures of fitness of the pre-event and post-event input streams (Finkelstein paras. [0058], [0062], [0067], and paras. [0216]-[0219], the intent of a user in a spoken utterance is determined to be either a command to be immediately executed or a commitment, and this intent determination generates a probable intent from the user utterance which represents the fitness of a pre-event utterance to an intent of a command to be immediately executed or to be executed after a trigger, where for an add commitment intent, a user utters a statement before an event (pre-event), and then a command is executed per the user statement once a triggering event occurs, where the commitment engine stores the commitments, considers context information which includes user input (post-event input stream), and then calculates a cost function (fitness) to determine whether to perform the action based on the context information).
Finkelstein still further teaches based on the respective measures of fitness, selecting the pre-event input stream or post-event input stream (Finkelstein paras. [0219] and [0256], at a contextually appropriate time, when the cost function of the commitment manager determines (based on respective measures of fitness) that an action should be executed, then the commitment command stored will execute, and where the context information includes the user input).
Therefore, taking the teachings of Lesso and Finkelstein together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the commitment engine operations specifically cited above in Finkelstein at least because doing so would enable natural user interface experiences to provide positive user experiences (Finkelstein paras. [0002] and [0029]).
Regarding claim 20, Lesso teaches further comprising instructions for (Lesso para. [0065], the speech recognition system provided by computer-readable code stored on non-transitory medium) monitoring sound captured by one or more of the microphones for one or more hot words (Lesso paras. [0050], [0102] and [0116]-[0117], phase three processing including validating that a trigger phrase (one or more hot words) was spoken by an authorized user of the device by operating on the received signal section t3-t0 from a selected microphone), wherein the event that triggers transition of the automated assistant from the inactive listening state into the active listening state comprises detection of one or more of the hot words during the monitoring (Lesso paras. [0054]-[0056], fig. 4, process moves to step 86 where it is determined that the trigger phrase is spoken by a user with whom the trigger phrase is associated, and when it is determined, then the speech recognition engine is enabled (active listening, since subsequent speech is now passed to the speech recognition engine rather than the system continuing the simple monitoring state (inactive listening))).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Lesso in view of Finklestein, as set forth above regarding claim 1 from which claim 8 depends, further in view of Johnston, “Creating a Circular Buffer in C and C++,” EmbeddedArtistry.com Blog, May 17, 2017 (available at https://embeddedartistry.com/blog/2017/05/17/creating-a-circular-buffer-in-c-and-c/) (herein “Johnston NPL”).
Regarding claim 8, Lesso teaches wherein the memory buffer comprises a memory buffer that is periodically overwritten by newly captured audio data (Lesso paras. [0072]-[0073], buffer 110 continually operates to store the last 2-10 seconds of received signal (thus overwritten every 2-10 seconds (periodically)) for use in the phase 1 determination).
Lesso does not explicitly teach that the memory buffer is circular.
Johnston NPL teaches a circular memory buffer (Johnston NPL page 2, circular buffers as fixed-size queues where the most recent data is always available).
Therefore, taking the teachings of Lesso and Johnston NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and operations of Lesso with the circular buffer specifically cited to above in Johnston NPL at least because doing so would allow for data consumption and production to happen at different rates, and ensure that the system is always consuming the most recent data (Johnston NPL page 2).
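For illustration only, the behavior relied upon above (a fixed-size buffer in which newly captured data overwrites the oldest data, so that the most recent data is always available) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Lesso or Johnston NPL: the class name CircularBuffer, the methods write and snapshot, and the use of integer samples are all hypothetical.

```python
from typing import List


class CircularBuffer:
    """Hypothetical fixed-capacity buffer that overwrites its oldest sample when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data: List[int] = [0] * capacity
        self._capacity = capacity
        self._head = 0   # index where the next sample will be written
        self._count = 0  # number of valid samples currently stored

    def write(self, sample: int) -> None:
        """Store one sample, overwriting the oldest one once the buffer is full."""
        self._data[self._head] = sample
        self._head = (self._head + 1) % self._capacity  # wrap around at the end
        if self._count < self._capacity:
            self._count += 1

    def snapshot(self) -> List[int]:
        """Return the stored samples ordered from oldest to newest."""
        start = (self._head - self._count) % self._capacity
        return [self._data[(start + i) % self._capacity] for i in range(self._count)]
```

Under these assumptions, writing six samples into a buffer of capacity four leaves only the four most recent samples, which corresponds to the "periodically overwritten by newly captured audio data" language of claim 8.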

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Stevans et al., US 2018/0108343 A1, directed towards a plurality of wake-up phases, including configuration of an acoustic and language model for a virtual assistant that can invoke a particular user profile.
Li et al., US 2019/0206412 A1, directed towards a system and method for controlling a home assistant device that can search for a predefined trigger word for activating the home assistant device.
Kim et al., US 10,789,041 B2, directed towards a speech trigger threshold used to trigger a virtual assistant. Kim teaches that a confidence level is calculated to determine whether the sampled audio input includes a portion of a spoken trigger.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on M-Th, and every other Friday, 9:30a-7p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656