Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
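(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.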

Claims 1 and 10 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Howard (US 20200272690 A1).

With respect to claims 1 and 10, Howard teaches an electronic apparatus/method comprising:
a microphone ([0040] The audio subsystem 202 provides audio data to the hotworder 210. The hotworder 210 is configured to identify hotwords in audio received through the microphone 204 and/or stored in the buffer 208); 
a memory storing a plurality of keyword recognition models ([0086] The computing device 400 includes a processor 402, a memory 404, a storage device 406);
and a processor ([0086] The computing device 400 includes a processor 402, a memory 404, a storage device 406), which is coupled with the microphone and the memory, configured to control the electronic apparatus, wherein the processor is further configured to: 
selectively execute at least one keyword recognition model among the plurality of keyword recognition models based on operating state information of the electronic apparatus, based on a first user voice being input through the microphone, identify whether at least one keyword corresponding to the executed at least one keyword recognition model is included in the first user voice by using the executed at least one keyword recognition model, and based on at least one keyword identified as being included in the first user voice, perform an operation of the electronic apparatus corresponding to the at least one keyword ([0040] The audio subsystem 202 provides audio data to the hotworder 210. The hotworder 210 is configured to identify hotwords in audio received through the microphone 204 and/or stored in the buffer 208. The hotworder 210 may be referred to as a hotword detector, keyword spotter, or keyword detector. In some implementations, the hotworder 210 may be active at any time that the system 200 is powered on [starts in the first operating state]. The hotworder 210 continuously analyzes the audio data stored in the buffer 208. The hotworder 210 computes a hotword confidence score that reflects the likelihood that current audio data in the buffer 208 includes a hotword. To compute the hotword confidence score, the hotworder 210 may use the hotword models 214. The hotword models 214 [plurality of models] may include different hotword models for different hotwords such as “ok computer,” “hey computer,” “ok assistant,” or other terms and phrases. The hotword models 214 may be adapted for a particular user or be for any user [selectively execute based on user]. And [0041] In some implementations, the hotworder 210 determines the audio data includes the hotword “ok computer.” The hotworder 210 may identify the portion of the audio data 212 that includes the hotword. For example, the hotworder 210 may identify for the speech recognizer 218 the portion of the audio data [perform an operation] that includes the hotword “ok computer.”).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2, 3, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Howard as applied to claims 1, 2, 10, and 11, respectively, and further in view of Yavagal (US 20200184966 A1).

With respect to claims 2 and 11, Howard fails to explicitly disclose, but Yavagal teaches, to obtain a similarity value between a signal pattern of the at least one keyword and a signal pattern of the first user voice through the executed at least one keyword recognition model, and
based on a similarity value of the signal pattern of the at least one keyword exceeding a threshold value among the signal pattern of the first user voice, identify that the first user voice includes the at least one keyword ([0020] The device determines (138), using the first speech-processing component, a score corresponding to similarity between the audio data and a wakeword. If the score does not satisfy the wakeword-detection parameter, the device may take no further action regarding the audio data. The device, however, may determine (140), using the first speech-processing component, that the score satisfies the wakeword-detection parameter. For example, the score may be greater than a wakeword-detection threshold, thus satisfying a condition corresponding to a parameter of the first speech-processing component. The device determines (142), using a second speech-processing component of the voice-controlled device 110 (for example, a second-stage, higher-accuracy speech-processing component), that the audio data includes the representation of a wakeword. As explained in greater detail below, the device 110 may select and use a wakeword-detection model and/or speaker identification model or either or both of the first and second speech-processing components based on the location. The device 110 may further determine and use a speaker-identification parameter based on the location.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard in view of Yavagal to obtain a similarity value between a signal pattern of the at least one keyword and a signal pattern of the first user voice through the executed at least one keyword recognition model and, based on a similarity value of the signal pattern of the at least one keyword exceeding a threshold value among the signal pattern of the first user voice, identify that the first user voice includes the at least one keyword, in order to use historical data and/or usage patterns over time to increase or decrease a confidence level of an identity of a user ([0096], Yavagal).

With respect to claims 3 and 12, Howard further teaches wherein the processor is further configured, based on the electronic apparatus operating in a first operating state in which a conversation system stored in the electronic apparatus is deactivated, to execute a first keyword recognition model corresponding to the first operating state among the plurality of keyword recognition models, and wherein the processor is further configured, based on the electronic apparatus operating in a second operating state configured to process text corresponding to a second user voice input through the microphone by an activated conversation system, to execute a second keyword recognition model corresponding to the second operating state among the plurality of keyword recognition models ([0040] The audio subsystem 202 provides audio data to the hotworder 210. The hotworder 210 is configured to identify hotwords in audio received through the microphone 204 and/or stored in the buffer 208. To compute the hotword confidence score, the hotworder 210 may use the hotword models 214 [plurality of models]. [0042] The speech recognizer 218 performs speech recognition on the audio data or on the portion of the audio data 212 that does not include the hotword [second user voice]. The speech recognizer 218 may use a language model and an acoustic model [plurality of second keyword models] to generate a transcription [text] of the audio data or the portion of the audio data that does not include the hotword. In some implementations, the speech recognizer may not use individual language and acoustic models. Instead, the speech recognizer 218 may use a speech recognition model that is configured to receive audio data and output a transcription. The speech recognition model may use neural networks and/or models trained using machine learning techniques.)


Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Yasa (US 20200184959 A1).

With respect to claims 4 and 13, Howard and Yavagal fail to explicitly disclose, but Yasa teaches, based on the keyword being not recognized by using the second keyword recognition model executed while operating in the second operating state, input the first user voice to an automatic speech recognition (ASR) module of the conversation system to obtain text corresponding to the first user voice, and provide a response to the first user voice based on the text corresponding to the first user voice ([0156] If the ASR processing results are clear, and there may be an error in NLU processing [keyword not recognized], then the system(s) 120 may not request the user to repeat the utterance, and may instead output an alternative utterance [provide response], an error condition or may fail silently [Fig. 2A shows wakeword detection happening in device 110; this moves the system to the second operating state]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard and Yavagal in view of Yasa to, based on the keyword being not recognized by using the second keyword recognition model executed while operating in the second operating state, input the first user voice to an automatic speech recognition (ASR) module of the conversation system to obtain text corresponding to the first user voice, and provide a response to the first user voice based on the text corresponding to the first user voice, in order to improve human-computer interactions ([0002], Yasa).

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Ji (US 20200365148 A1) and Bissell (US 11158308 B1).

With respect to claims 5 and 14, Howard and Yavagal fail to explicitly disclose, but Ji teaches, wherein the processor is further configured to, based on the electronic apparatus operating in a third operating state configured to recognize a second user voice input through the microphone by the activated conversation system, through an automatic speech recognition (ASR) module of the conversation system, stop executing the [[plurality]] of keyword recognition models and obtain text corresponding to the second user voice through the ASR module ([0039] In some embodiments, computing cloud 110 also includes a natural language understanding (NLU) system 115. Electronic device 100 can optionally bypass the NLU 115 [stop executing keyword recognition] within electronic device 100, configure request 108 to include the transcription of audio stream 102 [stream has ASR text in it], and transmit request 108 to computing cloud 110).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard and Yavagal in view of Ji so that the processor is further configured to, based on the electronic apparatus operating in a third operating state configured to recognize a second user voice input through the microphone by the activated conversation system, through an automatic speech recognition (ASR) module of the conversation system, stop executing the [[plurality]] of keyword recognition models and obtain text corresponding to the second user voice through the ASR module, in order to increase the confidence weight assigned to certain keywords ([0084], Ji).
Howard, Yavagal, and Ji fail to explicitly disclose, but Bissell teaches, a plurality of keyword recognition models (Col 2 ll 38-41 An NLU model may be associated with a domain and may be configured to process user inputs relating to the particular domain by incorporating words that are likely to be used with respect to the particular domain).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard, Yavagal, and Ji in view of Bissell to have a plurality of keyword recognition models, in order to improve human-computer interactions (Col 1 ll 20-22, Bissell).

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Wagner (US 10175933 B1).

With respect to claims 6 and 15, Howard and Yavagal fail to explicitly disclose, but Wagner teaches, wherein the processor is further configured to, based on a keyword including a meaning of requesting a predefined operation being identified as being included in the first user voice by using the first keyword recognition model executed while operating in the first operating state, control to perform the predefined operation (Col 4 ll 5-28 The microphone 116 of the voice-controlled device 110 detects audio from the environment 114 [puts system in first operating state] where the voice-controlled device 110 is located, such as sounds uttered from the user 112 [first user voice]. As illustrated, the voice-controlled device 110 includes a processor 120 and memory 122, which stores or otherwise has access to application(s) 124, which can include one or more speech-processing engines, voice applications, etc. As used herein, a processor can include multiple processors and/or a processor having multiple cores. The applications 124 can perform speech recognition on audio signals generated based on sound captured by the microphone, such as utterances spoken by the user 112. In some instances, the memory 122 may only include one or more URLs for accessing content and personalized content as instructed by the service provider 106. The voice-controlled device 110 can perform certain actions [perform predefined operation] in response to recognizing different speech from the user 112. The user can speak predefined commands (e.g., “Awake”; “Sleep”; “Open”; “Pause”; “Repeat”; “Skip”; “More”; etc.) [keywords including meaning], or can use a more casual conversation style when interacting with the device 110 (e.g., “I'd like to go to a movie. Please tell me what's playing at the local cinema.”; “Please send me more information to my email.”; etc.).).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard and Yavagal in view of Wagner to, based on a keyword including a meaning of requesting a predefined operation being identified as being included in the first user voice by using the first keyword recognition model executed while operating in the first operating state, control to perform the predefined operation, in order to provide interactive, voice-controlled programs or applications that provide information, entertainment, news, etc. (Col 6 ll 48-50, Wagner).

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Littlejohn (US 20200111497 A1).

With respect to claims 7 and 16, Howard and Yavagal fail to explicitly disclose, but Littlejohn teaches, based on a keyword including a meaning of requesting a next or previous operation being identified as being included in the first user voice by using the first keyword recognition model executed while operating in the first operating state, identify an operation performed by the electronic apparatus by the conversation system before the first keyword recognition model identifies that the keyword is included in the first user voice, on a basis of conversation history information, and control to perform an operation corresponding to the next or previous operation of the operation performed by an identified electronic apparatus ([0005] This is achieved by storing (e.g. in the cloud) both an account conversation state associated with an account shared by the plurality of playback devices, and a device conversation state associated with a specific one of the plurality of playback devices, and [0126] The process illustrated in FIG. 3 [FIG. 3 starts at step 302, the ‘Waiting’ state, which is mapped to the first operating state] enables the content playback system to provide a continuous experience as the user moves from one playback device to the next (e.g. when the user goes from one room to the next). This is due to the use of the account conversation state in interpreting the voice command. In particular, the account conversation state provides previous interactions between the user and one or more of the playback devices as context for interpreting the voice command. For example, when the user plays a particular playlist via a first one of the plurality of playback devices, this may be indicated in the account conversation state. The user may then move to another playback device (e.g. in another room), and make a voice command such as “Play the next song”. The controller interprets this voice command in the context of the account conversation state, and so may determine that the user intent is to play the next song in the playlist.).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard and Yavagal in view of Littlejohn to, based on a keyword including a meaning of requesting a next or previous operation being identified as being included in the first user voice by using the first keyword recognition model executed while operating in the first operating state, identify an operation previously performed by the electronic apparatus by the conversation system on a basis of conversation history information and control to perform an operation corresponding to the next or previous operation, in order to provide additional context for the system to interpret the voice command, thereby facilitating interpretation of the voice command ([0049], Littlejohn).

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Torok (US 20180233136 A) and Lee (US 20160217795 A1).
With respect to claims 8 and 17, Howard and Yavagal fail to explicitly disclose, but Torok teaches, wherein the processor is further configured to, based on a keyword including a meaning of stopping an operation being identified as being included in the first user voice by using the second keyword recognition model executed while operating in the second operating state, [[control to stop processing with respect to text corresponding to the second user voice]] by the conversation system ([0159] For instance, the user 102 may initiate the playback of the first audio content corresponding to the first audio file 1205 by uttering the first voice command determined at block 1202 [the first voice command and its recognition put the system in the second operating state], and subsequently uttering a voice command [first user voice in the second operating state] to stop the music, at which point the timer can be started, and lapse of the predetermined period of time at block 1214 can be evaluated from a point in time when the music was stopped (or paused)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard and Yavagal in view of Torok to, based on a keyword including a meaning of stopping an operation being identified as being included in the first user voice by using the second keyword recognition model executed while operating in the second operating state, [[control to stop processing with respect to text corresponding to the second user voice]] by the conversation system, in order to provide a high-quality audio experience for the users in the environment 106 ([0053], Torok).
Howard, Yavagal, and Torok fail to explicitly disclose, but Lee teaches, control to stop processing with respect to text corresponding to the second user voice ([0207] When the first voice does not include the first command in step 1404 or when the second voice is not received in step 1405, the second voice recognition processor 180 terminates the voice recognition.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard, Yavagal, and Torok in view of Lee to control to stop processing with respect to text corresponding to the second user voice, in order to increase the performance of the voice recognition ([0154], Lee).

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Howard and Yavagal as applied to claims 3 and 12, respectively, and further in view of Zajac (US 20190027131 A1) and Lee (US 20160217795 A1).
With respect to claims 9 and 18, Howard and Yavagal fail to explicitly disclose, but Zajac teaches, wherein the processor is further configured to, based on a keyword including a meaning of repeating an operation being identified as being included in the first user voice by using the second keyword recognition model executed while operating in the second operating state, [[control to stop processing the text corresponding to the second user voice performed by the conversation system]] and to repeat processing of the text based on conversation history information ([0041] Alternatively or additionally, in some embodiments, a response executed by voice-controlled application 124 may be an internal process triggered by command 133, rather than a visual or audio output. For example, in one such embodiment, voice-controlled application 124 is a home automation application and/or an IPA, and command 133 corresponds to a particular voice command, such as “cancel current request,” “stop current action,” “repeat previous command,” “what is the outside temperature right now?,” “turn off all devices in this room,” and the like… As noted above, compared to using a complete word or phrase for voice control of an application, phonetic fragments can be more quickly spoken, can be more reliably recognized by speech recognition application 121, and are less apt to interrupt other user activities, such as conducting a conversation. And [0023] Thus, speech recognition application 121 receives audio signal 131 from microphone 103 and generates a verbal utterance indicator 132 [this puts the system in the second operating state]).
Howard, Yavagal, and Zajac fail to explicitly disclose, but Lee teaches, control to stop processing with respect to text corresponding to the second user voice ([0207] When the first voice does not include the first command in step 1404 or when the second voice is not received in step 1405, the second voice recognition processor 180 terminates the voice recognition.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify Howard, Yavagal, and Zajac in view of Lee to control to stop processing with respect to text corresponding to the second user voice, in order to increase the performance of the voice recognition ([0154], Lee).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675.  The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571)272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657     

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657