DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Interpretation 112(f)

1.	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

2.	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
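Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.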
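
For illustration only, the three-prong test of MPEP § 2181, subsection I, can be sketched as a simple predicate. The sketch is a hypothetical aid and not part of the legal analysis: the class name `ClaimLimitation`, its fields, and the function `interpreted_under_112f` are assumptions introduced here, and each boolean stands in for a determination that is made on the record.

```python
from dataclasses import dataclass


@dataclass
class ClaimLimitation:
    """Hypothetical record of how a single claim limitation is drafted."""
    uses_means_or_step: bool               # literal "means" or "step"
    uses_generic_placeholder: bool         # e.g. "engine" in limitation (b)
    modified_by_functional_language: bool  # e.g. "configured to ..."
    recites_sufficient_structure: bool     # structure, material, or acts


def interpreted_under_112f(lim: ClaimLimitation) -> bool:
    # Prong (A), per MPEP 2181: "means"/"step" or a generic placeholder is used.
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    # Prong (B): the term is modified by functional language.
    prong_b = lim.modified_by_functional_language
    # Prong (C): the term is NOT modified by sufficient structure.
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# A generic placeholder coupled with function and no sufficient structure.
example = ClaimLimitation(
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    modified_by_functional_language=True,
    recites_sufficient_structure=False,
)
print(interpreted_under_112f(example))  # True
```

Under this sketch, reciting sufficient structure flips prong (C) and the limitation falls outside 35 U.S.C. 112(f), which mirrors how the presumption created by the word “means” is rebutted.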

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
(a)	Claim 1;	“…one or more microphones configured to detect sound”
(b)	Claim 1;	“…command-keyword engine is configured to (a) process input sound data representing the sound detected by the at least one microphone…”
(c)	Claim 1;	“…the first NLU is configured to determine an intent of a given voice input…”
(d)	Claim 1;	“…the second NLU is configured to determine an intent of a given voice input…”
(e)	Claim 5;	“…a respective NLU configured to detect, in input sound data, keywords from a respective predetermined library of keywords different from the other respective predetermined libraries of keywords”
(f)	Claim 5;	“…each respective NLU is configured to determine an intent of a given voice input…”
(g)	Claim 7;	“…a voice assistant service (VAS) wake-word engine configured to receive input sound data representing the sound detected by the at least one microphone and…”

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

(a) Fig. 2A, Microphone 222, Paragraph 0076
(b) Fig. 7A, Command Keyword Engine 771a, Paragraph 0142 
(c) Fig. 7A, NLU 779, Paragraphs 0153-0157
(d) Fig. 7A, NLU 779, Paragraphs 0153-0157
(e) Fig. 7A, NLU 779, Paragraphs 0153-0157
(f) Fig. 7A, NLU 779, Paragraphs 0153-0157
(g) Fig. 7A, VAS Wake Word Engine 773, Paragraph 0076

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1, 2, 6-9, 13-16 & 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lockhart et al. (US 10,186,265 B1 hereinafter, Lockhart ‘265) in combination with Meany et al. (US 9,484,030 B1 hereinafter, Meany ‘030).
Regarding claim 1; Lockhart ‘265 discloses a playback device (Fig. 1, Speech Controlled Device 110) of a media playback system (Fig. 1, System 100), 
the playback device comprising: 
at least one speaker (Fig. 1, Speaker 101); 
one or more microphones (Fig. 1, Microphone 103) configured to detect sound (i.e. Speech-controlled device 110 may capture input audio 11 of a spoken utterance from user 5 via a microphone 103 of the speech-controlled device 110. Column 6, lines 1-30); 
a network interface (Fig. 1, Network 199);
one or more processors (Fig. 7, Controller(s)/Processor(s) 704);
and data storage (Fig. 7, Memory 706) having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions (i.e. Each of these devices (110/120) may include one or more controllers /processors (704/804), that may each include a central processing unit (CPU) for 
comprising: 
receiving input sound data representing the sound detected by the one or more microphones (i.e. Speech-controlled device 110 may capture input audio 11 of a spoken utterance from user 5 via a microphone 103 of the speech-controlled device 110. Column 6, lines 1-30); 
detecting, via a command-keyword engine (Fig. 4A, Wake Word Detection Module 220), a first command keyword in a first voice input represented in the input sound data (i.e. For a wakeword, the associated function is typically to "wake" a local device so that it may capture audio following (or surrounding) the wakeword and send audio data to a remote server for speech processing. Column 3, lines 4-19), 
wherein the command-keyword engine is configured to (a) process input sound data representing the sound detected by the at least one microphone (i.e. The local device 110 may include a first detector (primary wakeword module) 220a to detect a wakeword in audio data detected by the microphone 103. Column 6, lines 31-49)
and (b) generate a command-keyword event when the command-keyword engine detects, in the input sound data, one of a plurality of keywords supported by the command-keyword engine; (i.e. The local device 110 may be configured to receive and respond to wakewords and execute audible commands in conjunction with server 120. The local device 110 may include a first detector (primary wakeword module) 220a to detect a wakeword in audio data detected by the microphone 103. The local device 110 may also include a second detector (secondary wakeword module) 220b to detect a wakeword in output audio data to be output from a speaker 101 of the local device 110. The first detector and the second detector may be enabled or disabled at different times and for specific lengths of time. While the local device 110 is listening for the wakeword, a user 5 may say the wakeword and say a command following the wakeword. The local device 110 may detect the wakeword, illustrated as block 132, as uttered by the user or any other audio source within the range of the local device's microphone 103. The local device 110 may then transmit (134) audio of the detected wakeword and/or data corresponding to the command to the remote device 120 via the network 199.  Column 6, lines 31-49)
in response to detecting the first command keyword (i.e. The wakeword detection module 220 works in conjunction with other components of the device 110, for example a microphone (not illustrated) to detect keywords in audio 11. Column 8, lines 52-60), 
(Fig. 2, Automatic Speech Recognition 250), whether the input sound data includes at least one keyword within a first predetermined library of keywords (Fig. 2 ASR Model 252) from which the first NLU is configured to determine an intent of a given voice input (i.e. A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252). The NLU process takes textual input (such as processed from ASR 250 based on the utterance input audio 11) and attempts to make a semantic interpretation of the text. That is, the NLU process determines the meaning behind the text based on the individual words and then implements that meaning. NLU processing 260 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text that allow a device (e.g., device 110) to complete that action. Column 10, lines 3-18 and Column 11, line 61 thru Column 12, line 8);
transmitting, via the network interface over a local area network, the input sound data to a second playback device (Fig. 1, Server(s) 120) of the media playback system (i.e. The local device 110 may then transmit (134) audio of the detected wakeword and/or data corresponding to the command to the remote device 120 via the network 199. Column 6, lines 31-49)
the second playback device employing a second local NLU (Fig. 2, Natural Language Understanding 260) with a second predetermined library of keywords (Fig. 2, Entity Library 282 i.e. The NLU module 260 may comprise the name entity recognition module 262, the intent classification module 264, and/or other components. The NLU module 260 may also include a stored knowledge base and/or entity library. Column 26, lines 1-8)
from which the second NLU is configured to determine an intent of a given voice input (i.e. An intent classification (IC) module 264 parses the query to determine an intent or intents for each identified domain, where the intent corresponds to the action to be performed that is responsive to the query. Column 13, lines 11-53); 
receiving, via the network interface, a response from the second playback device (Fig. 1, “Cause the Output Audio Data to be Transmitted to Local Device” i.e. the server 120 may then cause (144) the output audio data to be transmitted to the local device for delivery to the user 5. Column 6, line 50 thru Column 7, line 24);
and after receiving the response from the second playback device, performing an action based on an intent determined by at least one of the first NLU or the second NLU according to the (Fig. 6, Step 606 i.e. When the wakeword and/or command is recognized (604: Yes), the command may be processed (606) to generate a result comprising audio data. Column 22, lines 44-53)
Examiner reasonably believes, and one of ordinary skill in the art would understand, that Column 6, lines 31-49 of Lockhart ‘265 discloses generating a command-keyword event when the command-keyword engine detects, in the input sound data, one of a plurality of keywords supported by the command-keyword engine. However, Examiner cites Meany ‘030 to more explicitly disclose the limitation. Meany ‘030 at Column 5, lines 19-39 discloses that once non-noise audio is detected in the audio received by the device 110 (or separately from speech detection), the system 100 (for example through device 110) may use the sound recognition module 280 to detect specific sounds in the audio. This process may also be referred to as keyword detection or acoustic event detection. Specifically, acoustic event detection is typically performed without performing linguistic analysis, textual analysis or semantic analysis. Instead, incoming audio (or audio data) is analyzed to determine if specific characteristics of the audio match preconfigured sound profiles, which may include acoustic waveforms, acoustic signatures, or other data to determine if the incoming audio "matches" stored audio data corresponding to a specific sound or keyword.
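
For illustration only, the profile-matching form of keyword detection described in the cited passage of Meany ‘030 can be sketched as follows. The reference does not specify a feature representation or a similarity measure, so the fixed-length feature vectors, the cosine similarity, the 0.9 threshold, and the names `cosine_similarity`, `detect_keyword`, and `PROFILES` are all assumptions introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_keyword(features, profiles, threshold=0.9):
    """Return the label of the best-matching stored profile, or None.

    No linguistic, textual, or semantic analysis is performed; the incoming
    features are only compared against preconfigured sound profiles.
    """
    best_label, best_score = None, threshold
    for label, profile in profiles.items():
        score = cosine_similarity(features, profile)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical stored profiles (assumed three-dimensional acoustic signatures).
PROFILES = {"play": [0.9, 0.1, 0.3], "stop": [0.1, 0.8, 0.5]}

print(detect_keyword([0.88, 0.12, 0.31], PROFILES))  # play
print(detect_keyword([0.5, 0.5, 0.5], PROFILES))     # None
```

A non-None return value corresponds loosely to the point at which a keyword event would be generated; a None result means no stored profile was matched closely enough.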
Lockhart ‘265 and Meany ‘030 are combinable because they are from the same field of endeavor of speech systems (Meany ‘030 at “Background”). 
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Lockhart ‘265 by adding the generation of a command-keyword event when the command-keyword engine detects, in the input sound data, one of a plurality of keywords supported by the command-keyword engine, as taught by Meany ‘030. The motivation for doing so would have been that speech recognition systems have progressed to the point where humans can interact with computing devices entirely relying on speech. This would allow a user of a computing device to enhance their 

Regarding claim 2; Lockhart ‘265 discloses wherein the first predetermined library of keywords includes keywords that are not included within the second predetermined library of keywords (i.e. The device 110 and/or the server 120 may include an ASR module 250. The ASR module 250 in the device 110 may be of limited or extended capabilities. The ASR module 250 may include the language models 254 stored in ASR model storage component 252. If limited speech recognition is included, the ASR module 250 may be configured to identify a limited number of words, whereas extended speech recognition may be configured to recognize a much larger range of words. Column 25, lines 59-67).

Regarding claim 6; Lockhart ‘265 discloses wherein the keywords of the first predetermined library of keywords associated with the first NLU comprises keywords corresponding to a first intent category (i.e. A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252). Column 10, lines 3-18);
and wherein the second predetermined library of keywords associated with the second NLU comprises keywords corresponding to a second intent category (i.e. The NLU process takes textual input (such as processed from ASR 250 based on the utterance input audio 11) and attempts to make a semantic interpretation of the text. That is, the NLU process determines the meaning behind the text based on the individual words and then implements that meaning. NLU processing 260 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text that allow a device (e.g., device 110) to complete that action. Column 11, line 61 thru Column 12, line 8).

Regarding claim 7; Lockhart ‘265 discloses a voice assistant service (VAS) wake-word engine (Fig. 1, #130 “Listen for Wakeword by First Detector 130”) configured to receive input sound data (i.e. The local device 110 may be configured to receive and respond to wakewords and execute audible commands in conjunction with server 120. The local device 110 may include a first detector (primary wakeword module) 220a to detect a wakeword in audio data detected by the microphone 103. Column 6, lines 31-49);
and generate a VAS wake-word event when the first wake-word engine detects a VAS wake word in the input sound data, wherein the playback device streams sound data representing the sound detected by the at least one microphone to one or more servers of the voice assistant service when the VAS wake-word event is generated (i.e. Once a determination is made that the result, comprising the output audio data 151 from the server or audio source 420, includes a wakeword, the audio processing module 522 may generate and send instructions to the primary wakeword detector 220a to disable wakeword detection to avoid interruption of the result (output audio data 151 from the server or audio source 420) being broadcast from the speech-controlled device 110 (also referred to as the local device 110). Wakeword detection may be disabled by deactivating a microphone 103 connected to the speech-controlled device 110, wherein the microphone 103 may be configured to detect input audio 11 that may include a wakeword. Wakeword detection may also be disabled by executing instructions for the primary wakeword detector 220a to not respond when a wakeword is identified in a stream of output audio data 151 from the local device 110 via speakers 101. It should be appreciated that disabling wakeword detection can be performed in a number of different ways, as long as the identified wakeword fails to interrupt the device when the wakeword is output from the speakers of the local device. Column 20, line 55 thru Column 21, line 7).

Regarding claims 8 & 15; Claims 8 & 15 contain substantially the same subject matter as claim 1. Therefore, Claims 8 & 15 are rejected on the same grounds as claim 1. However, Claim 15 further recites a tangible, non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the processors to perform functions. Lockhart ‘265 at Column 24, line 60 thru Column 25, line 2 discloses that a device's computer instructions may be stored in a non-transitory manner in non-volatile memory (706/806), storage (708/808), or an external device(s). Alternatively, some or all of the executable instructions may be embedded in hardware or firmware on the respective device in addition to or instead of software.

Regarding claims 9 & 16; Claims 9 & 16 contain substantially the same subject matter as claim 2. Therefore, Claims 9 & 16 are rejected on the same grounds as claim 2.

Regarding claims 13 & 20; Claims 13 & 20 contain substantially the same subject matter as claim 6. Therefore, Claims 13 & 20 are rejected on the same grounds as claim 6.

Regarding claim 14; Claim 14 contains substantially the same subject matter as claim 7. Therefore, Claim 14 is rejected on the same grounds as claim 7.


Allowable Subject Matter
1.	Claims 3-5, 10-12 & 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

2.	Claims 10 & 17 contain substantially the same subject matter as claim 3. Therefore, Claims 10 & 17 are objected to on the same grounds as claim 3.

3.	Claims 4 & 5 depend on objected-to claim 3. Therefore, by virtue of their dependency, Claims 4 & 5 are also objected to.

4.	Claims 11 & 12 depend on objected-to claim 10. Therefore, by virtue of their dependency, Claims 11 & 12 are also objected to.

5.	Claims 18 & 19 depend on objected-to claim 17. Therefore, by virtue of their dependency, Claims 18 & 19 are also objected to.


Examiner’s Reasons for Indication of Allowable Subject Matter
Lockhart ‘265 discloses a system and method for temporarily disabling keyword detection to avoid detection of machine-generated keywords. Audio data received for output by audio speakers is first captured by an effect component, such as an audio equalizer. The effect component may perform various operations including altering the audio data, copying the audio data and delaying the time in which the output audio is sent to the audio speakers. The effect component may generate a copy of the audio data and transmit or route the copy of the audio data to a secondary keyword detector via an audio channel, resulting in a further delay. The secondary detector may determine that the copy of the processed audio data includes a keyword that is likely to be output during a first time interval. The secondary detector may then transmit a signal to a primary keyword detector to disable keyword detection.
Meany ‘030 discloses wherein a system is configured to execute audio-initiated commands. The system detects audio and determines if a first sound is included in the audio. The system then processes further incoming audio to detect a second sound. If the second sound is not detected within a time threshold, the system executes a command. The command may include delivering a 
Lockhart ‘265, either alone or in combination with Meany ‘030, fails to teach wherein the first predetermined library of keywords comprises a first partition having a first subset of keywords and a second partition having a second subset of keywords different from the first subset of keywords; the second predetermined library of keywords comprises a third partition having a third subset of keywords and a fourth partition having a fourth subset of keywords; wherein the first subset of keywords and the third subset of keywords include some or all of the same keywords; wherein the third subset of keywords differs from the first, second, and fourth subsets of keywords, and wherein the fourth subset of keywords differs from the first, second, and third subsets of keywords. As a result and for these reasons, Examiner indicates Claims 3-5, 10-12 & 17-19 as allowable subject matter.
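
For illustration only, the partition relationships recited in the limitation above can be sketched as a set check. The sketch is a hypothetical reading and not a claim construction: it assumes that each subset can be modeled as a set of keyword strings, that “include some or all of the same keywords” means a non-empty overlap, and that “differs from” means the two sets are not identical. The function name `satisfies_partition_limitation` and the sample keywords are introduced here.

```python
def satisfies_partition_limitation(first, second, third, fourth):
    """Check the recited relationships among the four keyword subsets.

    first, second: subsets in the partitions of the first library.
    third, fourth: subsets in the partitions of the second library.
    """
    return (
        second != first                  # second subset differs from the first
        and bool(first & third)          # first and third share some keywords
        and third != first               # third differs from first,
        and third != second              #   second,
        and third != fourth              #   and fourth
        and fourth != first              # fourth differs from first
        and fourth != second             #   and second (third checked above)
    )


# Hypothetical keyword subsets.
first = {"play", "pause"}
second = {"volume up"}
third = {"play", "next"}
fourth = {"weather"}

print(satisfies_partition_limitation(first, second, third, fourth))  # True
print(satisfies_partition_limitation(first, second, third, third))   # False
```

In the first call the first and third subsets overlap on "play" without being identical, so every recited relationship holds; in the second call the fourth subset equals the third, so the check fails.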

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ. whose telephone number is (571)270-1581.  The examiner can normally be reached on 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MARCUS T. RILEY, ESQ.
Examiner
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677