DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Interpretation 112(f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
(a)	Claim 1; “memory configured to store a first wake-up word recognition engine” 
(b)	Claim 1; “communication interface configured to communicate with a server” 
(c)	Claim 1; “server configured to store a second wake-up word recognition engine” 
(d)	Claim 1; “processor configured to: acquire an audio signal through the microphone” 
(e)	Claim 4; “processor is further configured to acquire a first wake-up word recognition result for a first wake-up word through the first wake-up word recognition engine” 
(f)	Claim 5; “processor is further configured to: deactivate the VAD function in response to recognizing the first wake-up word or the second wake-up word” 
(g)	Claim 7; “processor is further configured to: transmit the wake-up word recognition interval to the server using an application programming interface (API) for the second wake-up word recognition engine” 
(h)	Claim 8; “processor is further configured to acquire a voice presence probability from the generated pre-processed audio signal using the VAD function” 
(i)	Claim 11; “processor is further configured to deactivate the VAD function” 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
(a) Fig. 1, Memory 140, Paragraph 0006
(b) Fig. 1, Communication Interface 110, Paragraph 0007
(c) Fig. 1, Artificial Intelligence Server 400, Paragraph 0007
(d) Fig. 1, Processor 170, Paragraph 0007
(e) Fig. 1, Processor 170, Paragraph 0010
(f) Fig. 1, Processor 170, Paragraph 0011
(g) Fig. 1, Processor 170, Paragraph 0007
(h) Fig. 1, Processor 170, Paragraph 0014
(i) Fig. 1, Processor 170, Paragraph 0011
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-4, 7-9, 12-15 & 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Piersol et al. (US 20190156818 A1 hereinafter, Piersol ‘818) in combination with Fainberg et al. (US 20200105256 A1 hereinafter, Fainberg ‘256).
Regarding claim 1; Piersol ‘818 discloses an artificial intelligence apparatus (Fig. 1, Device 110) for recognizing a plurality of wake-up words (i.e. As the device 110 detects audio it may process the audio (either before or after the audio is stored in the buffer) to determine if the audio includes a wakeword. The device 110 may continue to do so until it detects (160) a wakeword in the received audio.  Paragraph 0023);
the artificial intelligence apparatus comprising: 
a microphone (Fig. 8, Microphone 308);
a memory (Fig. 8, Memory 806) configured to store a first wake-up word recognition engine (i.e. A memory (806/906) for storing data and instructions of the respective device. The memories (806/906) may individually include volatile random access memory (RAM), non-volatile read only memory (ROM), non-volatile magnetoresistive (MRAM) and/or other types of memory. Each device may also include a data storage component (808/908), for storing data and controller/processor-executable instructions. Paragraph 0094); 
a communication interface (Fig. 8, I/O Device Interfaces 802) configured to communicate with a server (i.e. Each device (110/120) includes input/output device interfaces (802/902). A variety of components may be connected through the input/output device interfaces. Paragraph 0096) configured to store a second wake-up word recognition engine (i.e. The device may then detect (608) a wakeword. The wakeword may be detected as the audio data is processed, after it is stored in the buffer, or at a different time. Paragraph 0087);
and a processor (Fig. 8, Controller(s)/Processor(s) 804) configured to: acquire an audio signal through the microphone (i.e. An audio capture component, such as a microphone of the audio device 110, captures audio 11 corresponding to a spoken utterance, which may include a command. Paragraph 0027);
generate a pre-processed audio signal from the acquired audio signal (i.e. The acoustic front end (AFE) 256 transforms the audio data from the microphone into data for processing by the speech recognition engine. The AFE may reduce noise in the audio data and divide the digitized audio data into frames representing a time intervals for which the AFE determines a set of values, called a feature vector, representing the features/qualities of the utterance portion within the frame. Paragraph 0030);
extract a voice interval from the generated pre-processed audio signal, wherein the voice interval is associated with a portion of the audio signal corresponding to a voice (i.e. The AFE may reduce noise in the audio data and divide the digitized audio data into frames representing a time intervals for which the AFE determines a set of values, called a feature vector, representing the features/qualities of the utterance portion within the frame. Paragraph 0030);
set a wake-up word recognition interval comprising the extracted voice interval and a buffer interval corresponding to the extracted voice interval in the generated pre-processed audio signal, wherein the wake-up word recognition interval is an interval used for recognizing wake-up words (i.e. The system buffers incoming audio and indicates locations in the audio where the utterance changes, for example when a long pause is detected. When the system detects a wakeword within a particular utterance, the system determines the most recent utterance change location prior to the wakeword and sends the audio from that location to the end of the command utterance to a server for further speech processing. See Abstract and Paragraph 0087);
and transmit the set wake-up recognition interval in the generated pre-processed audio signal to the first wake-up word recognition engine and the second wake-up word recognition engine (i.e. The location of the beginpoint prior to the wakeword is shown as Tbeginpoint 726. The device may then send (612) audio data from the device 110 to the server 120 for speech processing. The sent audio data may begin at the beginpoint location, i.e., may include audio data corresponding to the beginpoint location 726. The device may then determine (614) whether an endpoint is detected. If an endpoint is not detected (614:No), the device may continue to send (612) audio to the server. If an endpoint is detected (614:Yes), for example endpoint Tendpoint 730 as shown in FIG. 7, the device may stop (616) sending audio data.  See Abstract and Paragraph 0087).
Piersol ‘818 discloses most of the subject matter described above, but does not specifically disclose a second wake-up word recognition engine.
Fainberg ‘256 discloses a second wake-up engine (i.e. The NMD 503 may include multiple, different wake-word engines and/or voice extractors, each supported by a particular VAS. Each wake-word engine may be configured to receive as input the sound-data stream SDS from the one or more buffers 568 and apply identification algorithms to cause a wake-word trigger for the appropriate VAS. Thus, as one example, the first wake-word engine 570a may be configured to identify the wake word “Alexa” and cause the NMD 503 to invoke the AMAZON VAS when “Alexa” is spotted. As another example, the second wake-word engine 570b may be configured to identify the wake word “Ok, Google” and cause the NMD 503 to invoke the GOOGLE VAS when “Ok, Google” is spotted. Paragraph 0122).
Piersol ‘818 and Fainberg ‘256 are combinable because they are from the same field of endeavor, namely speech systems (Fainberg ‘256 at “Technical Field”).
	Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Piersol ‘818 by adding a second wake-up engine as taught by Fainberg ‘256. One would have been motivated to do so because, given the ever-growing interest in digital media, there continues to be a need to develop consumer-accessible technologies that further enhance the listening experience. Therefore, it would have been obvious to combine Piersol ‘818 with Fainberg ‘256 to obtain the invention as specified.
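For orientation only, the final limitation of claim 1 (sending one recognition interval to both wake-up word engines) can be pictured with the minimal Python sketch below. It is not taken from the application, Piersol ‘818, or Fainberg ‘256. The names transmit_interval, local_engine, and server_engine are hypothetical, the engines are trivial stand-ins, and representing the interval as sample indices is an assumption made for the example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical engine type: receives audio samples and reports whether
# its own wake-up word was recognized in them.
Engine = Callable[[List[float]], bool]


def transmit_interval(signal: Sequence[float], start: int, end: int,
                      engines: Dict[str, Engine]) -> Dict[str, bool]:
    """Send the same recognition interval [start, end) to every engine."""
    segment = list(signal[start:end])
    return {name: engine(segment) for name, engine in engines.items()}


# Stand-in engines for illustration only. In practice the second engine
# would sit behind a server call rather than a local function.
def local_engine(segment: List[float]) -> bool:
    return max(segment, default=0.0) > 0.5


def server_engine(segment: List[float]) -> bool:
    return len(segment) >= 4


results = transmit_interval(
    [0.0, 0.1, 0.9, 0.8, 0.2, 0.0], 1, 5,
    {"first": local_engine, "second": server_engine},
)
```

Each engine receives an identical copy of the interval, so their results can be compared on the same audio.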
Regarding claim 2; Piersol ‘818 discloses wherein the voice interval is extracted from the generated pre-processed audio signal through a voice activation detection (VAD) function (i.e. Some embodiments may apply voice activity detection (VAD) techniques. Such techniques may determine whether speech is present in an audio input based on various quantitative aspects of the audio input, such as the spectral slope between one or more frames of the audio input; the energy levels of the audio input in one or more spectral bands; the signal-to-noise ratios of the audio input in one or more spectral bands; or other quantitative aspects. Paragraph 0069).
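As a rough illustration of the kind of energy-based voice activity detection mentioned in the cited passage, the Python sketch below marks a frame as speech when its mean squared amplitude exceeds a threshold. It is a simplified assumption and not the technique of Piersol ‘818 or of the application: it uses broadband frame energy only, and the names frame_energies and is_speech, the frame length, and the threshold value are hypothetical.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def is_speech(energies: Sequence[float], threshold: float) -> List[bool]:
    """Per-frame decision: True where the frame energy exceeds the threshold."""
    return [e > threshold for e in energies]


energies = frame_energies([0.0, 0.0, 1.0, 1.0, 0.0, 0.0], frame_len=2)
decisions = is_speech(energies, threshold=0.5)
```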

Regarding claim 3; Piersol ‘818 discloses wherein the wake-up word recognition interval further comprises a first buffer interval and a second buffer interval, wherein the first buffer interval is set based at least in part on a preceding interval having a first length from the voice interval and the second buffer interval is set based at least in part on a subsequent interval having a second length from the voice interval (i.e. The device may then identify pauses in the speech (by identifying silent periods in the audio). As illustrated, the identified pauses include pauses 504-516. The device may compare the length of each pauses to a threshold length, where the threshold length represents a likelihood that a pause of the threshold length represents a break between utterances. As illustrated, the device may determine that only pause 506 has a length exceeding the threshold, while the other pauses have lengths that do not exceed the threshold (and thus may represent breaks within a same utterance, for example pauses between words).  Paragraph 0079).
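The claim 3 limitation, in which the recognition interval is the voice interval extended by a preceding buffer of a first length and a subsequent buffer of a second length, amounts to simple index arithmetic. The Python sketch below is illustrative only. The function name recognition_interval, the use of sample indices, and the clamping to the signal bounds are assumptions not found in the claim language or the cited references.

```python
from typing import Tuple


def recognition_interval(voice_start: int, voice_end: int,
                         first_len: int, second_len: int,
                         signal_len: int) -> Tuple[int, int]:
    """Extend a voice interval [voice_start, voice_end) by a first buffer
    before it and a second buffer after it, clamped to the signal bounds."""
    start = max(0, voice_start - first_len)
    end = min(signal_len, voice_end + second_len)
    return start, end
```

For example, under these assumptions a voice interval of samples 100 to 200 with buffers of 30 and 50 samples gives a recognition interval of 70 to 250.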

Regarding claim 4; Piersol ‘818 discloses wherein the processor is further configured to acquire a first wake-up word recognition result for a first wake-up word through the first wake-up word recognition engine, and to acquire a second wake-up word recognition result for a second wake-up word through the second wake-up word recognition engine (i.e. Each marked location may be associated with a stored confidence and the system may use the confidence values when a wakeword is detected to determine where to mark a new utterance. For example, if a first location 1 second before a wakeword is associated with a low confidence, but a second location 1.8 seconds before the wakeword is associated with a high confidence, the system may select the second location as the beginning of the utterance for purposes of bounding the utterance for speech processing. Paragraph 0078).

Regarding claim 7; Piersol ‘818 discloses wherein the processor is further configured to: transmit the wake-up word recognition interval to the server using an application programming interface (API) for the second wake-up word recognition engine, and acquire the second wake-up word recognition result for the second wake-up word (i.e. The location of the beginpoint prior to the wakeword is shown as Tbeginpoint 726. The device may then send (612) audio data from the device 110 to the server 120 for speech processing. The sent audio data may begin at the beginpoint location, i.e., may include audio data corresponding to the beginpoint location 726. The device may then determine (614) whether an endpoint is detected. If an endpoint is not detected (614:No), the device may continue to send (612) audio to the server. If an endpoint is detected (614:Yes), for example endpoint Tendpoint 730 as shown in FIG. 7, the device may stop (616) sending audio data.  See Abstract and Paragraph 0087).

Regarding claim 8; Piersol ‘818 discloses wherein the processor is further configured to acquire a voice presence probability from the generated pre-processed audio signal using the VAD function, wherein the voice interval is extracted based at least in part on the acquired voice presence probability (i.e. Some embodiments may apply voice activity detection (VAD) techniques. Such techniques may determine whether speech is present in an audio input based on various quantitative aspects of the audio input, such as the spectral slope between one or more frames of the audio input; the energy levels of the audio input in one or more spectral bands; the signal-to-noise ratios of the audio input in one or more spectral bands; or other quantitative aspects. Paragraph 0069).

Regarding claim 9; Piersol ‘818 discloses wherein the voice interval is extracted based at least in part on extracting an interval in which the acquired voice presence probability is greater than a first reference value (i.e. The device may then identify pauses in the speech (by identifying silent periods in the audio). As illustrated, the identified pauses include pauses 504-516. The device may compare the length of each pauses to a threshold length, where the threshold length represents a likelihood that a pause of the threshold length represents a break between utterances. As illustrated, the device may determine that only pause 506 has a length exceeding the threshold, while the other pauses have lengths that do not exceed the threshold (and thus may represent breaks within a same utterance, for example pauses between words). Paragraph 0079).
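The limitation recited in claims 8 and 9, extracting an interval in which a voice presence probability exceeds a first reference value, can be sketched as a scan over per-frame probabilities. The Python sketch below is illustrative only and is not drawn from the application or from Piersol ‘818. The function name extract_voice_intervals and the per-frame representation of the probability are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def extract_voice_intervals(probs: Sequence[float],
                            reference: float) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges where the probability exceeds reference."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p > reference and start is None:
            start = i                      # a voice interval opens here
        elif p <= reference and start is not None:
            intervals.append((start, i))   # the interval closes before frame i
            start = None
    if start is not None:                  # interval still open at end of signal
        intervals.append((start, len(probs)))
    return intervals


intervals = extract_voice_intervals([0.1, 0.7, 0.9, 0.2, 0.8], reference=0.5)
```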

Regarding claims 12 & 18; Claims 12 & 18 contain substantially the same subject matter as claim 1. Therefore, claims 12 & 18 are rejected on the same grounds as claim 1. However, claim 18 further recites a non-transitory recording medium storing one or more programs, which, when executed by one or more processors of a device, cause the device to perform operations. Piersol ‘818 discloses at Paragraph 0095 that a device’s computer instructions may be stored in a non-transitory manner in non-volatile memory (806/906), storage (808/908), or an external device(s).
Regarding claims 13 & 19; Claims 13 & 19 contain substantially the same subject matter as claim 2. Therefore, claims 13 & 19 are rejected on the same grounds as claim 2.
Regarding claims 14 & 20; Claims 14 & 20 contain substantially the same subject matter as claim 3. Therefore, claims 14 & 20 are rejected on the same grounds as claim 3.
Regarding claim 15; Claim 15 contains substantially the same subject matter as claim 4. Therefore, claim 15 is rejected on the same grounds as claim 4.


Allowable Subject Matter
1.	Claims 5, 6, 10, 11, 16 & 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

2.	Claim 6 depends on objected-to claim 5. Therefore, by virtue of its dependency, claim 6 is likewise objected to.

3.	Claim 17 depends on objected-to claim 16. Therefore, by virtue of its dependency, claim 17 is likewise objected to. Furthermore, claim 17 contains substantially the same subject matter as claim 6. Therefore, claim 17 is objected to on the same grounds as claim 6.

Examiner's Statement of Reasons for Allowance
The cited reference (Piersol ‘818) teaches a system for capturing and processing portions of a spoken utterance command that may occur before a wakeword. The system buffers incoming audio and indicates locations in the audio where the utterance changes, for example when a long pause is detected. When the system detects a wakeword within a particular utterance, the system determines the most recent utterance change location prior to the wakeword and sends the audio from that location to the end of the command utterance to a server for further speech processing.
The cited reference (Fainberg ‘256) teaches systems and methods for media playback via a media playback system that include capturing sound data via a network microphone device and identifying a candidate wake word in the sound data. Based on identification of the candidate wake word in the sound data, the system selects a first wake-word engine from a plurality of wake-word engines. Via the first wake-word engine, the system analyzes the sound data to detect a confirmed wake word, and, in response to detecting the confirmed wake word, transmits a voice utterance of the sound data to one or more remote computing devices associated with a voice assistant service.
The cited references fail to disclose:
(a) wherein the processor is further configured to: deactivate the VAD function in response to recognizing the first wake-up word or the second wake-up word, acquire a voice recognition result for a command recognition interval after a wake-up word interval for a recognized wake-up word in the generated pre-processed audio signal, wherein the wake-up word interval refers to an interval of the recognized wake-up word, perform an operation based at least in part on the acquired voice recognition result, and activate the VAD function;
(b) wherein the voice recognition result for the command recognition interval is acquired based at least in part on using speech engines of a voice recognition platform corresponding to the recognized wake-up word, wherein the speech engines comprise at least a speech-to-text (STT) engine, a natural language processing (NLP) engine, or a voice synthesis engine;
(c) wherein the voice interval is extracted based at least in part on extracting an interval in which a value obtained by multiplying an amplitude of the pre-processed audio signal and the voice presence probability is greater than a second reference value;
(d) wherein the processor is further configured to deactivate the VAD function based on the artificial intelligence apparatus operating in a voice registration mode and to activate the VAD function after a voice registration function in the voice registration mode terminates.
As a result, and for these reasons, the Examiner indicates that claims 5, 6, 10, 11, 16 & 17 contain allowable subject matter.
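For illustration only, and without characterizing the scope of the claims, the amplitude-weighted extraction feature noted above (a value obtained by multiplying an amplitude of the pre-processed audio signal by the voice presence probability, compared with a second reference value) can be sketched in Python as follows. The function name weighted_voice_mask is hypothetical. Taking the absolute value of each sample as its amplitude, and applying the comparison per sample, are both assumptions made for the example.

```python
from typing import List, Sequence


def weighted_voice_mask(amplitudes: Sequence[float],
                        probs: Sequence[float],
                        second_reference: float) -> List[bool]:
    """True where |amplitude| * voice presence probability exceeds the reference."""
    if len(amplitudes) != len(probs):
        raise ValueError("amplitudes and probs must have the same length")
    # Assumption: the magnitude of each sample serves as its amplitude.
    return [abs(a) * p > second_reference for a, p in zip(amplitudes, probs)]


mask = weighted_voice_mask([0.5, -0.8, 0.1], [0.5, 0.5, 1.0], second_reference=0.2)
```

Under this sketch, a quiet sample is excluded even when the probability is high, which is the practical difference from thresholding the probability alone.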


Relevant Prior Art References Not Relied Upon
1.	Jeong (US 20180240456 A1) - This specification relates to a method for controlling an artificial intelligence system which performs a multilingual processing based on artificial intelligence technology. The method for controlling an artificial intelligence system which performs a multilingual processing includes: receiving voice information through a microphone; determining a language of the voice information, based on a preset reference; selecting a specific voice recognition server from a plurality of voice recognition servers which process different languages, based on a result of the determination; and transmitting the voice information to the selected specific voice recognition server.

2.	Lee et al. (US 20180158460 A1) - A lamp device for inputting or outputting a voice signal and a method of driving the same. The method of driving a lamp device includes receiving an audio signal; performing voice recognition of a first audio signal among the received audio signals; generating an activation signal based on the voice recognition result; transmitting the activation signal to the external device; receiving a first control signal from the external device; and transmitting a second audio signal among the received audio signals to the external device in response to the first control signal. Alternatively, various exemplary embodiments may be further included.

3.	Bocklet et al. (US 20190043488 A1) - A method and system are directed to autonomous neural network keyphrase detection and include generating and using a multiple element state score vector by using neural network operations and without substantial use of a digital signal processor (DSP) to perform the keyphrase detection.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ., whose telephone number is (571)270-1581. The examiner can normally be reached 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy P. Goddard, can be reached at 517-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Primary Examiner
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677