DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 36 and 38 are objected to because of the following informalities:  
Claim 36 recites “the the” at the end of the sixth-to-last line, which should be –the--.
Claim 38 recites “at first sample rate” in line 2, which should be –at a first sample rate--.
Appropriate correction is required.

Claim Interpretation
	“the audio data” in claim 39 is not interpreted as referring to “the contiguous audio data” in claim 36.

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
The “audio interface”, “speech onset detector”, “buffer”, “combiner”, and “audio interface control” in claim 21.

The “communication interface”, “audio interface”, “speech onset detector”, “combiner”, “wake-up phrase detector”, and “audio interface control” in claim 36.
“the audio interface”, “the speech onset detector” and “a threshold computation module” in claims 38-40.
All of the names of these elements are functional words (i.e. functional element names are both generic placeholders and functional language), and all descriptions of these elements are functional, and there is not sufficient structure in the claim to perform the claimed functions.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 24-25, 27, 31, 33, 34, and 36-40 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

As per Claims 24 and 31, the original Specification (i.e. the original Specification of Parent Application 16/016,344 and Provisional Application 62/641,767, hereafter original Specification, where this application is a continuation of Parent Application 16/016,344, and not a continuation-in-part) does not have written description for “wherein/when each second interval is zero”
very close in time, the separation between two consecutive samples logically cannot be zero unless those two samples are the same sample, and Applicant’s original Specification only appears to describe continuous sampling (which suggests constantly obtaining samples, such that consecutive samples are very close in time to each other), and does not appear to describe where the sampling intervals are zero.
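The arithmetic behind this point can be illustrated with a minimal sketch. It assumes uniform sampling, so that the interval between consecutive samples is the reciprocal of the sample rate. The function name `sampling_interval_seconds` and the example rates are hypothetical and are not taken from the original Specification or the claims.

```python
def sampling_interval_seconds(sample_rate_hz: float) -> float:
    """Return the time between consecutive samples for a uniform sample rate."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1.0 / sample_rate_hz


# Hypothetical rates chosen only for illustration.
for rate_hz in (8_000.0, 16_000.0, 1_000_000.0):
    interval = sampling_interval_seconds(rate_hz)
    # The interval shrinks as the rate grows but stays strictly positive.
    print(f"{rate_hz:>12.0f} Hz -> {interval:.9f} s between samples")
```

Under this assumption, any finite sample rate yields an interval greater than zero, which is consistent with the observation that two distinct samples cannot be spaced zero seconds apart.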

Claims 27, 33, and 39 recite waking/turning-on a speech onset detector responsive to providing the first sample of the audio data, which does not appear to be in the original Specification.  Paragraph 29 of the original Specification describes determining whether to pass audio data on to SOD 223 when a sound wave meets or exceeds an activation threshold, but no part of the original Specification appears to describe where the speech onset detector is sleeping and wakes up when it receives the audio data that was passed on, and no part of the original Specification appears to describe where the provision of the audio data to the speech onset detector is the trigger provided in response to the threshold being met/exceeded (as opposed to sending a wake-up signal to the speech onset detector before the audio data that is passed on).

Claim 34 recites using the updated threshold activity level to trigger sampling, which does not appear to be in the original Specification.  Paragraph 29 of the original Specification describes determining whether to pass audio data on to SOD 223 when a sound wave meets or exceeds an activation threshold, but does not appear to describe where the threshold triggers a sampling of audio data (it appears to only affect whether the audio data that is sampled is passed on).

	Claim 36 recites “wherein the communication interface is configured to wirelessly transmit at least a portion of the second samples of the audio data sampled at the second intervals to a network, responsive to detection of the wake up phrase” which does not appear to have written description in the original Specification.  Paragraph 21 describes wireless network(s) but not where any audio data is transmitted wirelessly in response to detecting a wake up phrase.

	The dependent claims incorporate the issues of their respective parent claims.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 24-25, 29, 31, 33, and 37 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

zero (because two samples logically cannot be spaced zero seconds apart unless they are the same sample).  It is therefore not clear if Applicant actually meant to recite “wherein/when each second interval” in claims 24 and 31. 

	Claim 25 further recites “the buffer” which is ambiguous (claim 21 recites a buffer and claim 25 recites “further comprising a buffer” such that there are two different buffers and it is not clear which one “the buffer” is supposed to refer to).

	Claim 29 recites “the detection by the speech onset detector of speech onset in the first sample of the audio data comprises the audio interface sampling the first sample of the audio data at a second sample rate” which is confusing because it is describing that a function performed by one component comprises another component performing a function, when claim elements typically perform their respective functions.

	Claim 33 recites “the providing of the first sample of the audio data” which is ambiguous because Claim 30 recites “providing a first sample of audio data” and claim 32 recites “providing the first sample of the audio data” where these two recitations of “providing…” do not need to refer to the same providing, and therefore it is not clear which one “the providing of the first sample of the audio data” is supposed to refer to.

	Claim 37 recites “the second interval” and “the first interval” which are ambiguous (Claim 36 recites multiple first intervals and multiple second intervals, and it is not clear which one of the multiple “first intervals” and “second intervals” are the ones that “the second interval” and “the first interval” in claim 37 are supposed to refer to).

	The dependent claims incorporate the issues of their respective parent claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 30, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Rossum et al. (US 2016/0196838), hereafter Rossum.

As per Claims 21, 30, Rossum suggests (along with the corresponding method of claim 30) An audio processing device, comprising: an audio interface operable to sample audio data; a speech onset detector; a buffer; a combiner; and an audio interface control, wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector, wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio processing device from capturing second samples of the audio data at first intervals, to capturing the second samples of the audio data at second intervals, wherein each second interval is shorter than each first interval, and wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data captured at the first intervals and the second samples of the audio data captured at the second intervals (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
“An audio processing device,”: Paragraph 36 describes a system which includes at least one microphone [also referred to by DMIC] coupled to either an external or host DSP, which suggests an embodiment where “host” DSP refers to a DSP that is part of the same device as the microphone[s] [since “host” is described as an alternative to “external” which suggests where a “host DSP” is an “internal” component].  Figure 2 depicts microphone[s] and a “processor 210” which is part of the same device, and paragraph 33 describes where the processor 210, in one example, includes a DSP.  These portions suggest “An audio processing device” [a device including the DMIC and the DSP]
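“comprising: an audio interface operable to sample audio data;”: Figure 3 and paragraph 36 describe where the DMIC/microphone 120 [part of “An audio processing device” including the DMIC/microphone and a DSP, suggested by paragraphs 33, 36, and Figures 2-3 as discussed above] includes, among other things, a transducer 302, an amplifier 304, an A/D converter 306, and a PDM 308.  These portions suggest where the “audio processing device” comprises “an audio interface operable to sample audio data” [the combination of the transducer 302, the amplifier 304, the A/D converter 306 and the PDM 308, as suggested by paragraphs 9, 36, 37, and Figure 3]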
“a speech onset detector;”: Figure 3 depicts where the DMIC/microphone 120 [part of “An audio processing device” including the DMIC/microphone and a DSP, suggested by paragraphs 33, 36, and Figures 2-3 as discussed above] includes a vocalization detector 320.  Paragraph 29 describes where vocalization detection is synonymous with “voice activity detection” [which conventionally/commonly in the art refers to detecting the presence or absence of speech] and buffering audio data significantly prior to the vocalization detection.  Paragraph 37 describes actions done “prior to the vocalization detection” and paragraph 38 describes where certain actions are done “when the DMIC 120 detects a vocalization”.  These paragraphs suggest where detecting “vocalization” detects presence of speech [at least because it would be unusual to call absence of speech “vocalization”], where no vocalization [i.e. VAD detection detecting no presence of speech] is detected for a period of time “prior to the vocalization detection” such that “when the DMIC… detects a vocalization”, the DMIC [particularly the vocalization detector 320 of the DMIC] is detecting speech after a period of non-speech [i.e. the vocalization detector 320 detects a “speech onset”].  These portions suggest where the “audio processing device” [device including the DMIC 120 and the DSP] comprises “a speech onset detector” [includes a vocalization detector in the microphone that detects presence of speech after a period of non-speech]

“a combiner;”: As discussed above, paragraphs 33, 36 and Figure 2-3 suggest “An audio processing device” [a device including the DMIC and the DSP].  Paragraph 41 describes where one of the functions of the DSP is to “pre-pend” buffered data to real-time audio data [at least suggested to be combining the buffered data with the real-time audio data].  These portions thus suggest where the “audio processing device” comprises “a combiner” [i.e. a portion of the DSP that pre-pends/combines buffered data to real-time audio data]
“and an audio interface control”: As discussed above, paragraphs 33, 36 and Figure 2-3 suggest “An audio processing device” [a device including the DMIC and the DSP].  Also, as discussed above, the combination of the transducer 302, the amplifier 304, the A/D converter 306 and the PDM 308 can be interpreted as “an audio interface operable to sample audio data” [as suggested by paragraphs 9, 36, 37, and Figure 3].  Paragraphs 38-39 describe where the DSP, among other things, outputs a clock on the CLK line appropriate for receiving real-time PDM 308 audio data from the DMIC 120, and where the DMIC 120 responds to the CLK input 312 by switching from the internal sample rate to the sample rate of the provided clock line.  These portions suggest where the “audio processing device” comprises “an audio interface control” [a portion of the DSP that sends a clock signal to the PDM, thereby “controlling” the combination “audio interface” to change its sample rate to the provided CLK sample rate].

“wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio processing device from capturing second samples of the audio data at first intervals, to capturing the second samples of the audio data at second intervals, wherein each second interval is shorter than each first interval,”: As discussed above [see portions of this rejection directed to “an audio interface” and “a speech onset detector”, “and an audio interface control” and “wherein the audio interface is operable 
“and wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data captured at the first intervals and the second samples of the audio data captured at the second intervals”: As discussed above [see portion of this rejection directed to “a combiner;”, and “wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio processing device from capturing second samples of the audio data at first intervals, to capturing the second samples of the audio data at second intervals, wherein each second interval is shorter than each first interval,”, incorporated here by reference] Rossum suggests where the “audio processing device” comprises “a combiner” [i.e. a portion of the DSP that pre-pends/combines buffered data to real-time audio data] and responsive to 
For Claim 30, it is directed to a method equivalent of the functions performed by the elements of the “audio processing device” in claim 21, and so is rejected under similar rationale [i.e. for each of the limitations in claim 30, see the portions of the rejection of claim 21, above, which correspond to the limitations in claim 21 that correspond to the limitations in claim 30 for the rationale used to reject the limitations in claim 30]
“providing a first sample of audio data; detecting speech onset in the first sample of the audio data;” and “responsive to detecting the speech onset” in claim 30 correspond to “wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector, wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data” in claim 21
“responsive to detecting the speech onset, switching from capturing second samples of the audio data at first intervals to capturing the second samples of the audio 
and
“and providing contiguous audio data using the second samples of the audio data captured at the second intervals and at least one of the second samples of the audio data captured at the first intervals” in claim 30 corresponds to “wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data captured at the first intervals and the second samples of the audio data captured at the second intervals” in claim 21).

As per Claim 35, Rossum suggests wherein the capturing the second samples of the audio data at second intervals comprises sampling the second samples of the audio data at a first sample rate and the providing the first sample of the audio comprises sampling the first sample of the audio data at a second sample rate, wherein the first sample rate is greater than the second sample rate (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
As discussed in the rejection of claims 21 and 30, paragraph 37 describes where the DMIC analyzes audio data to determine whether a vocalization has occurred, which, 
Paragraph 37 at least suggests where, prior to vocalization detection, “the DMIC… operating from its internal oscillator” analyzes audio data to determine whether a vocalization has occurred and where audio data is buffered into the buffer 310.  Paragraph 9 describes where buffered audio data “was previously acquired at a sample rate determined by the internal oscillator”.  Paragraphs 38-39 suggest where, “when the DMIC 120 detects vocalization” the DMIC 120 switches from the internal sample rate to the DSP-provided CLK rate [i.e. where the DMIC does not use the CLK rate until after vocalization is detected, which at least suggests that the audio data which is analyzed by the vocalization detector to detect vocalization is sampled at the internal oscillator sample rate].
Paragraph 37 describes where, prior to vocalization, the DMIC, among other things, buffers audio data into a recirculating memory using buffer 310 and paragraph 9 describes where buffered audio data was previously acquired at a sample rate determined by the internal oscillator [which suggests that the buffered data is captured at the internal oscillator sample rate].  Paragraphs 38-40 and 45, as discussed above, describe switching to a DSP-provided CLK line 312 sample rate which is higher than the internal oscillator sample rate.
Rossum thus further suggests “wherein the capturing the second samples of the audio data at second intervals comprises sampling the second samples of the audio data at a first sample rate” [capturing, at a higher DSP-provided CLK 312 sampling rate, the “second” samples of all “audio data” received by the DMIC/microphone over time which are not the “first” sample provided to the vocalization detector comprises the transducer+amplifier+A/D-converter+PDM “audio interface” sampling the “second” samples of all “audio data” received by the DMIC/microphone over time which are not the “first” sample provided to the vocalization detector at the DSP-provided-CLK-312/”first” sampling rate] 
“and the providing the first sample of the audio comprises sampling the first sample of the audio data at a second sample rate, wherein the first sample rate is greater than the second sample rate” [where providing the “first” portion/”sample” of all audio data received by the DMIC/microphone over time to the vocalization detector comprises sampling the “first” portion/”sample” of all audio data received by the DMIC/microphone over time at the internal-oscillator/”second” sampling rate, where, as 
To be clear, the samples are “sampled” in the sense that they are acquired/captured/sampled from all audio data received by the DMIC/microphone over time in order to form the captured/sampled samples.)
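For illustration only, the sequence relied on in the rejections of claims 21, 30, and 35 (buffering at a lower rate before vocalization is detected, switching to a higher clock rate upon detection, and pre-pending the buffered data to the later data) can be sketched as follows. This is a toy model of that reading and is not Rossum's implementation. The class name `ToyMicrophone`, the rates, the threshold test, and the buffer length are all hypothetical, and the model only tracks which rate is nominally in effect rather than resampling anything.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyMicrophone:
    """Toy model only; every name and number here is a hypothetical assumption."""
    internal_rate_hz: int = 8_000   # assumed lower, pre-detection rate
    clk_rate_hz: int = 16_000       # assumed higher, post-detection rate
    threshold: float = 0.5          # assumed stand-in for vocalization detection
    buffer: deque = field(default_factory=lambda: deque(maxlen=4))
    rate_hz: int = field(init=False)

    def __post_init__(self) -> None:
        # Before any detection, the lower internal rate is in effect.
        self.rate_hz = self.internal_rate_hz

    def process(self, samples):
        """Buffer samples until one crosses the threshold, then switch rate and
        return the buffered samples pre-pended to the samples that follow."""
        realtime = []
        detected = False
        for s in samples:
            if not detected:
                self.buffer.append(s)          # recirculating pre-detection buffer
                if abs(s) >= self.threshold:   # toy "vocalization detected" test
                    detected = True
                    self.rate_hz = self.clk_rate_hz
            else:
                realtime.append(s)             # captured after the rate switch
        # Contiguous output: buffered data first, then the later data.
        return (list(self.buffer) + realtime) if detected else []


mic = ToyMicrophone()
contiguous = mic.process([0.0, 0.1, 0.0, 0.1, 0.2, 0.9, 0.8, 0.7])
print(contiguous, mic.rate_hz)
```

In this sketch the output is contiguous only because the buffered samples are placed immediately ahead of the later samples; how gaps or rate differences between the two portions would be handled is outside what the sketch models.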

Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Rossum, as applied to Claim 21, above, and further in view of Scott et al. (US 9,398,367), hereafter Scott.

As per Claim 22, Rossum suggests a… detector operable to process the contiguous audio data to recognize a… in the second samples of the audio data captured at the second intervals (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
Paragraph 41 describes where “The buffered data may be pre-pended to the real-time audio data for the purposes of keyword recognition” [where buffered data sampled at an internal oscillator sampling rate pre-pended to the real-time audio data sampled at a DSP-provided CLK sampling rate is interpreted as “the contiguous audio data”, as discussed in the rejection of claim 21].  Paragraphs 3-5 describe where keyword recognition follows vocalization detection and includes examining an utterance and results in a keyword match or no match, where vocalization detection determines whether a person begins to utter a possible keyword.  Paragraph 6 also describes where a DSP is used to perform computations for detecting keywords.  Also, as 
The portions discussed in the previous paragraph further suggest where an utterance of a keyword is in the real-time audio data sampled at the CLK sampling rate [because the buffered data is sampled “prior to vocalization” which is suggested to be during a period of non-speech, and because the real-time audio data is sampled at the DSP-provided CLK sampling rate in response to detecting vocalization, such that the real-time audio data is suggested to include speech following the “onset” that caused the vocalization detector to detect vocalization] and where the DSP includes a portion that performs keyword recognition [keyword recognizer/”detector”] that recognizes a keyword in “the contiguous audio data” [since keyword recognition is performed on speech, the speech is suggested to be in the real-time audio data as just discussed, and the buffered data is pre-pended to the real-time audio data which suggests that the audio data analyzed to recognize a keyword is the buffered data pre-pended to the real-time audio data]

Rossum does not, but Scott suggests a wake-up phrase detector operable to process the contiguous audio data to recognize a wake phrase in the second samples of the audio data captured at the second intervals (col. 4, lines 4-22; col. 5, lines 4-27; col. 5, line 61 – col. 6, line 17;
Col. 4, lines 4-22 describes where an aural cue can include a “specific keyword[s]” and also where keyword[s] may be referred to as a “wakephrase”.  Col. 5, lines 4-27 describe where a keyword spotting unit can identify or otherwise detect the presence of a predetermined keyword or a predetermined phrase, and col. 5, line 61 – col. 6, line 17 describes where the keyword spotting unit can monitor an audio signal in order to identify the presence of a wakephrase.
Scott thus suggests where a keyword recognized by the keyword recognizer/“detector” in “the contiguous audio” [as suggested by Rossum, as discussed above] is more specifically a wakephrase/”wake-up phrase”/”wake phrase” [such that the keyword recognizer/”detector” is more specifically a “wake-up phrase detector”])
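	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of keyword with another because the prior art teaches the claimed invention except for the substitution of a keyword which is not necessarily a wake-up phrase with a keyword which is.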
	
As per Claim 23, Rossum suggests wherein the… detector is operable to recognize the… using the at least one captured portion of the second samples of the audio data captured at the first intervals (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
Rossum suggests “a… detector operable to process the contiguous audio data to recognize a… in the second samples of the audio data captured at the second intervals” as discussed in the rejection of claim 22 [the-portion-of-the-DSP-that-performs-keyword-recognition/keyword-“detector” performs-keyword-recognition-on/”processes” the-buffered-audio-data-pre-pended-to-the-real-time-audio-data/”contiguous audio data” to recognize a keyword in the real-time-audio-data-sampled-at-the-DSP-provided-CLK-sampling-rate/”the second samples of the audio data captured at the second intervals”]

Rossum does not, but Scott suggests wherein the wake-up phrase detector is operable to recognize the wake phrase using the at least one captured portion of the second samples of the audio data captured at the first intervals (col. 4, lines 4-22; col. 5, lines 4-27; col. 5, line 61 – col. 6, line 17;
Same combination as discussed in the rejection of claim 22, where the keyword recognized in the real-time audio data is more specifically a wakephrase [such that the recognized keyword is more specifically a “wake phrase” and such that the keyword “detector” is more specifically a “wake-up phrase detector”])
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of keyword with another because the prior art teaches the claimed invention except for the substitution of a keyword which is not necessarily a wake-up phrase with a keyword which is.  Scott .

Claims 36-38 are rejected under 35 U.S.C. 103 as being unpatentable over Rossum et al. (US 2016/0196838), hereafter Rossum, in view of Scott et al. (US 9,398,367), hereafter Scott, and Mutagi et al. (US 10,027,662), hereafter Mutagi.

As per Claim 36, Rossum suggests (along with the corresponding method of claim 30) An electronic communication device, comprising: a microphone; a communication interface configured to wirelessly transmit and receive data; and an audio processing device comprising an audio interface coupled to the microphone and configured to sample audio data, a speech onset detector, a combiner, a… detector, and an audio interface control, wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector, wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio interface from sampling second samples of the audio data at first intervals, to sampling the second samples of the audio data at second intervals, wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data sampled at the first intervals and the the second samples of the audio data sampled at the second intervals, the… detector is configured to process the contiguous audio data to recognize a… (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
“An electronic communication device,”: Figure 2 depicts a mobile device with a processor and memory [at least suggested to be “An electronic… device”] and which includes microphone[s] 120 [described in Figure 3 and paragraphs 36-37 as a DMIC/microphone connected to a DSP].  Paragraph 23 describes where mobile devices can be, among other things, smart phones and mobile telephones, which commonly/conventionally send/receive/”communicate” information wirelessly [at least send/receive voice data wirelessly for phone calls].  Figure 2 and paragraph 34 further describes where the mobile device includes communication devices 240 which are used, in some embodiments, to send audio/speech over a wired or wireless communications network.  Figure 1 and paragraph 26 also depicts where the mobile device in Figure 2 has two arrows pointing to/from computing cloud[s] [suggesting communications in both a sending and receiving direction] and where the cloud-based computing resource[s] are accessible over a network such as a cellular phone network [commonly/conventionally a wireless network].  These portions suggest an “electronic communication device” [a mobile device, for example a phone, in Figure 2 including a 
“comprising: a microphone;”: Figures 2-3 and paragraph 36 describe where the mobile device [the “electronic communication device”, as discussed above] includes a DMIC/microphone 120 which includes, among other things, a transducer 302.  These portions further suggest where the “electronic communication device” comprises “a microphone” [the transducer 302 can be interpreted as a microphone, see e.g. Padmanabhan, US 2002/0010578, "speech utterance preprocessor... may include... audio-to-analog transducer (microphone) and an analog-to-digital converter", paragraph 19, and Erten, US 2002/0116197, "audio transducers 68 is a microphone pointed in the general direction of speaker 24", paragraph 45]
“a communication interface configured to wirelessly transmit and receive data;”: Figure 2 depicts a mobile device with a processor and memory [at least suggested to be “An electronic… device”] and which includes microphone[s] 120 [described in Figure 3 and paragraphs 36-37 as a DMIC/microphone connected to a DSP].  Paragraph 23 describes where mobile devices can be, among other things, smart phones and mobile telephones, which commonly/conventionally send/receive/”communicate” information wirelessly [at least send/receive voice data wirelessly for phone calls].  Figure 2 and paragraph 34 further describes where the mobile device includes communication devices 240 which are used, in some embodiments, to send audio/speech over a wired or wireless communications network.  Figure 1 and paragraph 26 also depicts where the mobile device in Figure 2 has two arrows pointing to/from computing cloud[s] 
“and an audio processing device”: Paragraph 36 describes a system which includes at least one microphone [also referred to by DMIC] 120 [an element of the mobile device in Figure 2] coupled to either an external or host DSP, which suggests an embodiment where “host” DSP refers to a DSP that is part of the same device as the microphone[s] [since “host” is described as an alternative to “external” which suggests where a “host DSP” is an “internal” component].  Figure 2 depicts microphone[s] 120 and a “processor 210” which is part of the same device, and paragraph 33 describes where the processor 210, in one example, includes a DSP.  These portions suggest where the “electronic communication device” [mobile device] further comprises “An audio processing device” [the combination of elements 304, 306, 308, 320, 310, and DSP 350 and the connecting lines between them, and the connecting line between elements 302 and 304 in Figure 3 is interpreted as “an audio processing device”, where the combination of elements is part of the mobile device “electronic communication device” in Figure 2, where, unlike in the rejection of claims 21 and 30, the audio processing device is not interpreted as including the transducer 302 which is, as discussed above, mapped to the claimed “microphone”]
“comprising an audio interface coupled to the microphone and configured to sample audio data,”: As discussed above, [see portion of this rejection directed to “an audio processing device”, incorporated here by reference] the combination of elements 304, 306, 308, 320, 310, and DSP 350 and the connecting lines between them, and the connecting line between elements 302 and 304 in Figure 3 is interpreted as “an audio processing device”, where the combination of elements is part of the mobile device “electronic communication device” in Figure 2.  Figure 3 and paragraph 36 describe where the DMIC/microphone 120 includes, among other things, a transducer 302, an amplifier 304, an A/D converter 306, a PDM 308, and lines connecting the PDM to the buffer and the vocalization detector and the DSP.  Paragraph 37 describes where the DMIC 120 operates on an internal oscillator which determines the internal sample rate “prior to vocalization detection” and “buffers audio data” into the buffer.  Paragraph 9 describes where buffered audio data “was previously acquired at a sample rate determined by the internal oscillator”.  These portions suggest where the “audio processing device” comprises “an audio interface coupled to the microphone and configured to sample audio data,” [the combination of the amplifier, the A/D converter, and the PDM, the line connecting the amplifier to the transducer, and the lines connecting the PDM to the buffer, the vocalization detector and the DSP can be interpreted as an “audio interface” at least in the sense that it serves as an “interface” that is used to communicate “audio” information from the transducer to the buffer and is used to sample “audio data”, where the combination is “coupled to the microphone” 


“a… detector,”: As discussed above, [see portion of this rejection directed to “an audio processing device”, incorporated here by reference] the combination of elements 304, 306, 308, 320, 310, and DSP 350 and the connecting lines between them, and the connecting line between elements 302 and 304 in Figure 3 is interpreted as “an audio processing device”, where the combination of elements is part of the mobile device “electronic communication device” in Figure 2. Paragraph 41 describes where “The buffered data may be pre-pended to the real-time audio data for the purposes of keyword recognition”.  Paragraphs 3-5 describe where keyword recognition follows vocalization detection and includes examining an utterance and results in a keyword match or no match, where vocalization detection determines whether a person begins to utter a possible keyword.  Paragraph 6 also describes where a DSP is used to perform computations for detecting keywords.  These portions suggest “a… detector” [a-portion-of-the-DSP-that-performs-keyword-recognition/keyword-“detector”]

“wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector,”: As discussed above [see portions of this rejection directed to “an audio interface” and “a speech onset detector”, incorporated here by reference], the combination of elements 304, 306, 308, 320, 310, and DSP 350 and the connecting lines between them, and the connecting line between elements 302 and 304 
“wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio interface from sampling second samples of the audio data at first intervals, to sampling the second samples of the audio data at second intervals,”: As discussed above [see portions of this rejection directed to “an audio interface” and “a speech onset detector”, “and an audio interface control” and “wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector,” incorporated here by reference] Rossum suggests where the vocalization detector is a “speech onset detector” that detects speech onset [detects vocalization/speech after a period of non-speech], the “audio interface” provides a first portion/”sample” of all “audio data” received by the DMIC/microphone over time to the vocalization detector so that the vocalization detector can determine whether a vocalization has occurred in the “first” 
“wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data sampled at the first intervals and the the second samples of the audio data sampled at the second intervals,”: As discussed above [see portion of this rejection directed to “a combiner;”, and “wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio interface from sampling second samples of the audio data at first intervals, to sampling the second samples of the audio data at second intervals,”, incorporated here by reference] Rossum suggests where the “audio processing device” comprises “a combiner” [i.e. a portion of the DSP that pre-pends/combines buffered data to real-time audio data] and responsive to detecting vocalization-after-a-period-of-non-speech/”speech onset” in the “first” audio data “sample” provided to the vocalization detector to be analyzed to detect vocalization, the portion of the DSP that sends a clock signal to the PDM “audio interface control” switches the “audio interface” from sampling, at a lower internal oscillator sampling rate, “second” samples of all “audio data” received by the DMIC/microphone over time which are not the “first” sample provided to the vocalization 
“the… detector is configured to process the contiguous audio data to recognize a…,”:  As discussed above [see portion of this rejection directed to “a… detector”, incorporated here by reference], Rossum suggests “a… detector” [a-portion-of-the-DSP-that-performs-keyword-recognition/keyword-“detector”].  Paragraph 41 describes where “The buffered data may be pre-pended to the real-time audio data for the purposes of keyword recognition” [where buffered data sampled at an internal oscillator sampling rate pre-pended to the real-time audio data sampled at a DSP-provided CLK sampling rate is interpreted as “the contiguous audio data”, as discussed in the rejection of claim 21].  Paragraphs 3-5 describe where keyword recognition follows vocalization detection and includes examining an utterance and results in a keyword match or no match, where vocalization detection determines whether a person begins to utter a possible keyword.  Paragraph 6 also describes where a DSP is used to perform computations for detecting keywords.  Also, as discussed in the rejection of claim 21, paragraph 37 describes actions done “prior to the vocalization detection” and paragraph 38 describes where certain actions are done “when the DMIC 120 detects a vocalization”.  These paragraphs suggest where detecting “vocalization” detects presence of speech [at least because it would be unusual to call absence of speech “vocalization”], where no vocalization [i.e. VAD detection detecting no presence of speech] is detected for a period of time “prior to the vocalization detection” such that “when the DMIC… detects a 
The portions discussed in the previous paragraph further suggest where an utterance of a keyword is in the real-time audio data sampled at the CLK sampling rate [because the buffered data is sampled “prior to vocalization” which is suggested to be during a period of non-speech, and because the real-time audio data is sampled at the DSP-provided CLK sampling rate in response to detecting vocalization, such that the real-time audio data is suggested to include speech following the “onset” that caused the vocalization detector to detect vocalization] and where the DSP includes a portion that performs keyword recognition [keyword recognizer/”detector”] that recognizes a keyword in “the contiguous audio data” [since keyword recognition is performed on speech, the speech is suggested to be in the real-time audio data as just discussed, and the buffered data is pre-pended to the real-time audio data which suggests that the audio data analyzed to recognize a keyword is the buffered data pre-pended to the real-time audio data]
These portions suggest “the… detector is configured to process the contiguous audio data to recognize a…,” [the-portion-of-the-DSP-that-performs-keyword-recognition/keyword-“detector” performs-keyword-recognition-on/”processes” the-buffered-audio-data-pre-pended-to-the-real-time-audio-data/”contiguous audio data” to recognize a keyword in the real-time-audio-data-sampled-at-the-DSP-provided-CLK-sampling-rate/”the second samples of the audio data sampled at the second intervals”])
a wake-up phrase detector and the wake-up phrase detector is configured to process the contiguous audio data to recognize a wake phrase (col. 4, lines 4-22; col. 5, lines 4-27; col. 5, line 61 – col. 6, line 17;
Col. 4, lines 4-22 describes where an aural cue can include a “specific keyword[s]” and also where keyword[s] may be referred to as a “wakephrase”.  Col. 5, lines 4-27 describe where a keyword spotting unit can identify or otherwise detect the presence of a predetermined keyword or a predetermined phrase, and col. 5, line 61 – col. 6, line 17 describes where the keyword spotting unit can monitor an audio signal in order to identify the presence of a wakephrase.
Scott thus suggests where a keyword recognized by the keyword recognizer/“detector” in “the contiguous audio” [as suggested by Rossum, as discussed above] is more specifically a wakephrase/”wake-up phrase”/”wake phrase” [such that the keyword recognizer/”detector” is more specifically a “wake-up phrase detector”])
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of keyword with another because the prior art teaches the claimed invention except for the substitution of a keyword which is not necessarily a wake-up phrase with a keyword which is.  Scott teaches that a keyword which is a wake-up phrase was known in the art.  One of ordinary skill in the art could have substituted one type of keyword with another to obtain the predictable results of a mobile device which samples audio data at an internal oscillator sampling rate, and which, in response to detecting vocalization, samples audio data at a DSP-provided CLK sampling rate, which pre-pends buffered audio data 
Rossum in view of Scott does not, but Mutagi suggests and wherein the communication interface is configured to wirelessly transmit at least a portion of the second samples of the audio data sampled at the second intervals to a network, responsive to detection of the wake up phrase (Figure 2; col. 7, lines 24-55; col. 8, line 50 - col. 9, line 6; Figure 9; col. 26, lines 21-51;
Rossum suggests “the communication interface” as discussed above [see portion of this rejection based on Rossum and directed to “a communication interface”] including where Rossum’s mobile device can communicate wirelessly using communication devices 240, including by sending speech/audio data [Figure 2, paragraphs 26 and 34].  Also, as discussed above, [see portion of this rejection based on Scott, and the portion of this rejection based on Rossum and directed to “the… detector is configured to process the contiguous audio data to recognize a…”] Rossum and Scott suggest where a portion of the DSP performs keyword/wake-phrase recognition on “the contiguous audio data” to recognize a keyword/wake-phrase in “the contiguous audio data” [at least suggested to be in the real-time-CLK-sampling-rate-sampled portion of “the contiguous audio data”].  Paragraphs 3-5 of Rossum also describe where processing that follows keyword recognition is ASR.
In Mutagi, col. 8, line 50 – col. 9, line 6 describes where, in response to detecting a wakeword, a local device may begin transmitting audio data corresponding to input 
Mutagi suggests where the response to detecting/recognizing a wakephrase/wake-up phrase/keyword in the Rossum/Scott combination is to send to a server, via the “communication interface” of the mobile device [wirelessly, as suggested by Rossum], audio data including the wakephrase/wake-up phrase/keyword [i.e. the real-time-audio-data-sampled-at-the-DSP-provided-CLK-sampling-rate/”the second samples of the audio data sampled at the second intervals” which includes the detected/recognized keyword/wake-phrase/wake-up phrase], such that “the 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Rossum teaches a mobile device that can send speech/audio wirelessly, and which can perform keyword recognition, Scott teaches where keyword recognition is wakephrase recognition/detection, and Mutagi teaches where a response to detecting a wakeword/keyword is sending audio including the detected wakeword to a server).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding Mutagi’s transmission of audio data including a keyword/wakeword to the functions performed by Rossum’s mobile device in response to detecting a keyword/wake-phrase), and in combination, each element merely performs the same function as it does separately (the transmission follows the detection of the keyword/wake-phrase).  The combination yields the predictable results of a mobile device which samples audio data at an internal oscillator sampling rate, and which, in response to detecting vocalization, samples audio data at a DSP-provided CLK sampling rate, which pre-pends buffered audio data sampled at the internal oscillator sampling rate to real-time audio data sampled at the DSP-provided CLK sampling rate for the purposes of keyword recognition, which performs keyword recognition, and which can send 
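For illustration only, the pre-pending of buffered data to real-time audio data, as characterized above with respect to paragraph 41 of Rossum, can be sketched in Python.  The sketch is a hypothetical model and is not taken from Rossum, Scott, or Mutagi: the function name prepend_buffered, the (time, value) representation, and the example rates of 4 Hz and 8 Hz are assumptions chosen only so that the arithmetic is exact.  It shows samples buffered at a lower rate placed ahead of samples acquired at a higher rate, so that the result is contiguous in time.

```python
def prepend_buffered(buffered, buffered_rate_hz, realtime, realtime_rate_hz):
    """Return (time_s, value) pairs with the buffered samples first.

    Hypothetical model: each sample is stamped with the time at which it
    would have been taken, so the change in spacing between the two
    segments is visible in the output.
    """
    out = []
    t = 0.0
    # Buffered samples: acquired before detection, at the lower rate.
    for value in buffered:
        out.append((t, value))
        t += 1.0 / buffered_rate_hz
    # Real-time samples: acquired after detection, at the higher rate.
    for value in realtime:
        out.append((t, value))
        t += 1.0 / realtime_rate_hz
    return out


# Assumed example values (not from any cited reference).
contiguous = prepend_buffered([1, 2], 4, [3, 4, 5], 8)
```

In this sketch the first two samples are spaced 0.25 s apart and the remaining samples 0.125 s apart, with no gap in time between the two segments.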
	
As per Claim 37, Rossum suggests wherein the second interval is less than the first interval (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
Same combination as applied to reject claim 36, where, as discussed in the rejection of claim 36, paragraphs 38-40 and 45 describe switching to a DSP-provided CLK line 312 sample rate which is higher than the internal sample rate, which suggests where “the second interval is less than the first interval” [sampling at a higher rate samples more often in the same period of time such that the intervals between samples are shorter when the higher DSP-provided CLK sampling rate is used than when the lower internal oscillator sampling rate is used]).
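The relationship relied on above (a higher sampling rate corresponds to a shorter interval between samples) is simply that the interval is the reciprocal of the rate.  A minimal Python sketch follows; the function name sample_interval_s and the rates of 8000 Hz and 16000 Hz are hypothetical values chosen for illustration and do not appear in Rossum.

```python
from fractions import Fraction


def sample_interval_s(rate_hz):
    """Return the time between consecutive samples, in seconds, as an exact fraction."""
    if rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return Fraction(1, rate_hz)


# Assumed example rates: a lower internal rate and a higher clocked rate.
first_interval = sample_interval_s(8000)    # interval at the lower rate
second_interval = sample_interval_s(16000)  # interval at the higher rate
```

With these assumed values the second interval is half the first, consistent with the reasoning that switching to a higher rate yields a second interval that is less than the first interval.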

As per Claim 38, Rossum suggests wherein the audio interface is configured to sample the second samples of the audio data at first sample rate and the audio interface is configured to sample the first sample of the audio data at a second sample rate, wherein the first sample rate is greater than the second sample rate (Figures 1-3; paragraphs 3-6, 9, 23, 26, 29, 33, 34, 36, 37, 38, 39, 40, 41, 45, 46;
Same combination as discussed in the rejection of claim 36.
As discussed in the rejection of claim 36 [portions directed to “an audio interface coupled to the microphone and configured to sample audio data,” “wherein the audio 
Paragraphs 37-41 and 45 suggest where “wherein the audio interface is configured to sample the second samples of the audio data at first sample rate and the audio interface is configured to sample the first sample of the audio data at a second sample rate, wherein the first sample rate is greater than the second sample rate” [the switch to the CLK sampling rate occurs after vocalization is detected which suggests .

Allowable Subject Matter
Claim 29 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

As per Claim(s) 24 (and similarly claim 31 and consequently claim 25), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 21 and 24, including (i.e. in combination with the remaining limitations in claim[s] 21 and 24) capture the second samples (plural) of the audio data continuously according to the second intervals, wherein each second interval is zero.
2015/0340042 teaches “When the voice system is enabled to monitor the acoustic environment, act 210 may be performed continuously or periodically at a frequency that provides an appearance of continuous monitoring, even though it may not be strictly continuous” (paragraph 48) which teaches where “continuous” monitoring where the difference between samples is very close to zero but not exactly zero (i.e. all 
2015/0269954 teaches “there is a close match between the MFCCs at both sampling rates, e.g., between 0-1 second and between 4-5 seconds” (paragraph 45).  In this paragraph the two sampling rates appear to be 8 kHz and 4 kHz.  0-1 and 4-5 seconds in this reference appears to refer to the time period in which the MFCCs of the 8kHz and 4kHz rates match each other well in Figures 5A-5B, and not to the sampling periods.
2012/0177220 teaches “a variable t that indicates a sampling period is substituted with zero” (paragraph 87) where, in paragraph 88, if it is determined that the variable t is zero, there is only one piece of audio data, and t is incremented by +1 and then more data is acquired and stored.  Based on the context of paragraphs 87-88 and Figure 9, “sampling period” in this case does not refer to the sampling rate but rather refers to a specific time/index of a particular sample acquired at a particular time (see also paragraphs 92-93 which describe audio data “in the next sampling period”).  Additionally, in Rossum, “the second intervals” refers to sampling periods between samples which is different from the “intervals” in this reference.
	2020/0265861 teaches where a periodic signal detector initializes a sample interval as the variables to zero.  Particularly since Figure 4 depicts N0 N1 and N2 spaced and Figure 5 depicts N0 and N1 as values that can be between 2 and 8, the sample intervals also appear to be indices/times and not a sampling interval in the sampling rate sense.  Paragraph 32 also describes where a sample interval is “two to 

As per Claim 26 (and similarly claims 32 and 39, and consequently claims 27-28, 33-34, and 40) the prior art of record does not teach or suggest the combination of all limitations in claim(s) 21 and 26, including (i.e. in combination with the remaining limitations in claim[s] 21 and 26) wherein the audio interface is operable to provide the first sample of the audio data to the speech onset detector responsive to sound waves meeting or exceeding a threshold activity level, wherein the audio data is representative of the sound waves (additionally, for claim 32, even assuming providing the audio including a wakeword in Mutagi could be interpreted as a different “providing the first sample of the audio data”, it would not be provided “responsive to sound waves meeting or exceeding a threshold activity level” because the providing/transmitting-to-a-server in Mutagi is performed responsive to keyword/wakeword recognition)
2015/0340042 describes one or more VAD processing stages has determined that the acoustic input likely contains speech” (paragraph 77).  Paragraph 48 describes “When the voice system is enabled to monitor the acoustic environment, act 210 may be performed continuously or periodically at a frequency that provides an appearance of continuous monitoring, even though it may not be strictly continuous”.  This reference also describes where a subsequent stage of a plurality of processing stages is performed only if one or more previous processing stages is unable to conclude that the acoustic input corresponds to spurious acoustic activity (paragraph 34)  Paragraph 83 sound waves meet or exceed the threshold activity level, where it would not necessarily be obvious to add, to Rossum’s DMIC, a low level amplitude check at a sound wave level immediately prior to the vocalization detector (because at this point the audio is already converted into digital samples) or before the A/D conversion.  Additionally, it would not be obvious to add the thresholding that triggers collecting and processing of acoustic input (in paragraph 48) to Rossum because one of the points of Rossum is to buffer audio periodically, and to the extent that triggering collecting and processing is collecting and processing continuously, this is the function performed by Rossum based on the vocalization detector.  
Paragraph 115 describes where low battery may be used to discourage “passing acoustic information on for further processing such that additional power consumption is incurred only in situations where the confidence is very high that the acoustic input includes a voice command” (suggesting where a threshold determination may lead to “passing acoustic information on for further processing”) but this does not appear to describe where the acoustic input is passed on to a speech onset detector (as opposed to passing on to speech recognition or something else other than another voice activity all audio data”
2014/0244273 teaches “One or more stages of voice activity detection (VAD) can be used” (paragraph 34).  This reference does not appear to specifically teach that one stage provides the audio input to the next stage.
2008/0040109 teaches multiple VAD stages (first, second and third circuits) and associated VAD decisions in each stage (paragraphs 90-91) but these stages appear to operate on different portions of an audio signal (Figure 4).
2014/0278435 describes “performing one or more voice activity detection (VAD) processing stages that evaluate whether the acoustic input has the characteristics of voice/speech or whether the acoustic input is more likely the result of non-voice acoustic activity in the environment. VAD techniques refer generally to those that analyze one or more properties or characteristics of acoustic input (e.g., signal 
and similarly 2015/0340042 teaches “performing the at least one voice activity detection processing stage comprises performing spectral analysis on the acoustic input to evaluate whether a spectrum of the acoustic input is indicative of voice activity, performing periodicity analysis to evaluate whether signal periodicity of the acoustic input is indicative of voice activity and/or using phone loops to evaluate whether the acoustic input includes speech” (claim 28).  It is not clear if performing multiple stages of voice activity detection involves the same component performing multiple analyses or multiple components sequentially performing respective analyses (while passing audio data from one component to the next), and additionally it is not clear that one of the stages is a sound wave threshold comparison and where the audio is sent from the sound wave threshold comparison to a speech onset detection
2002/0138255 teaches two stages of voice activity detecting (paragraph 134; Figure 3; where one stage receives an input from another stage [element 22 receives an input from element 21]).  This reference does not appear to specifically teach that one stage provides the audio input to the next stage.
2014/0122078 teaches “When voice activity above the preset threshold level is detected in the audio input, the parts of the speech having the voice activity in them are then propagated to the feature creator 116. For example a command phrase like "HELLO PND" when spoken preceded and followed by pauses will have its preceding and following pauses removed by silence filter” (paragraph 32).  This reference also appears to describe where if voice activity is detected, then keyword recognition is performed (paragraph 45).  The feature creator does not appear to be an onset detector.  
9478231 teaches transmitting “an interrupt signal to the DSP/CPU core… in response to detected sound energy” including in one example “the interrupt signal may be output if a filtered sound sample is above a threshold value” and “The threshold value may be dynamically updated”  
9484030 teaches “In the context of speech processing, if a specific sound is a "wakeword," once the wakeword is detected, the local device 110 may "wake" and begin transmitting audio data 111 corresponding to input audio 11 to the server(s) 120 for speech processing. Further, a local device 110 may "wake" upon detection of speech/spoken audio above a threshold, as described herein. Audio data corresponding to that audio may be sent to a server 120 for routing to a recipient device or may be sent to the server for speech processing for interpretation of the included speech (either for purposes of enabling voice-communications and/or for purposes of executing a server in response to detection of speech/spoken audio above a threshold for speech processing for interpretation of included speech whereas claim 26 provides audio to an onset detector when the threshold is exceeded.
5983186 teaches “speech recognition devices and instruments include an input sound signal power or volume detector in communication with a central CPU for bringing the CPU out of an initial sleep state upon detection of perceived voice exceeding a predetermined threshold volume level and is continuously perceived for at least a certain period of time. If both these conditions are satisfied, the CPU is transitioned into an active mode so that the perceived voice can be analyzed against a set of registered key words to determine if a "power on" command or similar instruction has been received. If so, the CPU maintains an active state in normal speech recognition processing ensues until a "power off" command is received. However, if the perceived and analyzed voice can not be recognized, it is deemed to be background noise and the minimum threshold is selectively updated to accommodate the volume level of the perceived but unrecognized voice. Other aspects include tailoring the volume level of the synthesized voice response according to the perceived volume level as detected by the input sound signal power detector, as well as modifying audible response volume in accordance with updated volume threshold levels”.  Similar to what was discussed in the previous paragraph, this reference does not teach where providing audio to a speech onset detector is performed in response to the exceeding of the threshold (this reference appears to describe where the speech recognition CPU 
	2015/0051906 teaches where a VAD system is activated when an input signal exceeds a threshold level (“voice activity detector… use a broadband root mean square [RMS] measure of the signal energy… threshold of signal activity.  When the incoming RMS first exceeds this threshold, the VAD/SAD may be activated and the signal blocks may begin being passed to the other possible pre-processing”, paragraph 32).  This reference, however, does not specifically teach where the input audio is or is not provided to the voice activity detector based on the threshold.
2016/0284363 teaches turning an audio sensor on/off based on a voice activity detection and a threshold, where turning an audio sensor on/off suggests determining whether to provide audio to a further speech processor (“If BSM determines that voice activity… evaluate biosignal data… against the first threshold… audio sensor… may remain in an OFF or low power state until… voice activity is present… limit power consumption by limiting the activity of audio sensor… and, therefore, the activity of a downstream speech recognition system”, paragraph 41).  In this reference, however, the voice activity detector is the device which controls the audio-providing function of the audio sensor, and is not the device whose input of audio is controlled by the threshold.
	5459814 further teaches where a VAD periodically updates a threshold (“VAD periodically monitors and updates the threshold values to reflect changes in the level of background noise”).  This reference, however, does not specifically teach where the VAD is “sleeping” when it is not performing the periodic updates.
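To make concrete the kind of threshold-based activation discussed in this group of references (in particular the broadband RMS measure quoted from 2015/0051906), a minimal Python sketch is given here.  It is an illustration under stated assumptions and is not the implementation of any cited reference: the function names rms and blocks_after_activation, the block contents, and the threshold of 0.5 are all hypothetical.  The sketch passes signal blocks onward starting with the first block whose RMS exceeds the threshold.

```python
import math


def rms(block):
    """Root mean square of a non-empty block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def blocks_after_activation(blocks, threshold):
    """Return the blocks from the first one whose RMS exceeds threshold onward.

    Hypothetical model: once activated, every later block is passed on,
    including quiet ones; nothing is passed before activation.
    """
    active = False
    passed = []
    for block in blocks:
        if not active and rms(block) > threshold:
            active = True
        if active:
            passed.append(block)
    return passed


# Assumed example input: two quiet blocks, one loud block, one quiet block.
example_blocks = [[0.0, 0.0], [0.1, -0.1], [1.0, -1.0], [0.0, 0.0]]
passed_blocks = blocks_after_activation(example_blocks, 0.5)
```

The sketch models only where the blocks go after the threshold is first exceeded; whether the recipient of those blocks is a speech onset detector, as claim 26 requires, is the distinction drawn in the discussion above and is not something the sketch resolves.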

As per Claim 29, the prior art of record does not teach or suggest the combination of all limitations in claim(s) 21 and 29, including (i.e. in combination with the remaining limitations in claim[s] 21 and 29) wherein the capture of the second samples of the audio data at the first intervals comprises the audio interface sampling the second samples of the audio data at a first sample rate and the detection by the speech onset detector of speech onset in the first sample of the audio data comprises the audio interface sampling the first sample of the audio data at a second sample rate, wherein the first sample rate is greater than the second sample rate (i.e. sampling at the first intervals refers to sampling at the rate prior to speech onset detection, or, put another way, sampling at the rate that the audio interface is switched from, and not the rate that the audio interface is switched to, such that if sampling the second samples of the audio data at the first intervals/at the first sample rate samples at a rate that is greater than the second sample rate used to sample the first sample [in which speech onset is detected], then the sample rate of the first sample is even lower than the sampling rate that the audio interface is switched to upon detecting speech onset, see Specification, paragraph 47).
S. Dixon, “Onset Detection Revisited”, September 18-20, 2006, Proc. of the 9th Int. Conference on Digital Audio Effects (DAFx’06), Montreal, Canada, teaches “Onset detection functions usually have a low sampling rate… 100 Hz] compared to audio signals” (Section 2, first paragraph).
Stowell, D and Plumbley, M, “Adaptive whitening for improved real-time audio onset detection”, 2007, In: International Computer Music Conference (ICMC) 2007, 2007-08-27 - 2007-08-31, Copenhagen, Denmark teaches where onset detection involves data reduction by converting the audio rate signal to an onset detection function (ODF) which is at a much lower sampling rate, and then identifying onsets in this ODF, where the ODF is determined by converting an audio signal to a stream of STFT frames and then a subsampled ODF is produced which may amount to one numerical value per STFT frame, and where onsets can be selected based on, for example, exceeding a threshold (Section 1).  In the context of the rejection of claim 21, it is not clear that this type of processing would be performed by “the audio interface” as part of sampling audio data.
Graf, S., Herbig, T., Buck, M., & Schmidt, G., “Features for voice activity detection: A comparative analysis.”, 2015, EURASIP Journal on Advances in Signal Processing, 2015, 1-15. doi:http://dx.doi.org/10.1186/s13634-015-0277-z teaches where temporal resolution of speech detection is limited and much lower than the sampling rate of an audio signal (page 2, upper right).  This reference does not appear to specifically teach where a lower sampling rate is used for sampling audio for speech detection (it appears to divide a signal into frames of N samples and then the frame is analyzed to determine whether there is speech or not).
2017/0133023 teaches “sampling rates in both branches are equal or the sampling rate in the time-domain encoder branch is lower than in the frequency domain branch” (paragraph 26) and “In this aspect, the sampling rates can be as in the other aspect, or the sampling rates in the frequency domain branch are even lower than in the time-domain branch” (paragraph 27).


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 

Claims 21-23, 26, 28, 30, 32, and 35 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 14-15 of U.S. Patent No. 10,332,543, hereafter Parent Patent 1. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of this application are rendered obvious by the claims of Parent Patent 1.

As per Claim 21, Claim 14 of Parent Patent 1 teaches An audio processing device, comprising: an audio interface operable to sample audio data; a speech onset detector; a buffer; a combiner; and an audio interface control, wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector (first 6 lines of Claim 14 of Parent Patent 1)
wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio processing device from capturing second samples of the audio data at first intervals, to capturing the second samples of the audio data at second intervals, wherein each second interval is shorter than each first interval (lines 8-14 of Claim 14 of Parent Patent 1, where continuously, by definition, is suggested to be a capturing that is as frequent as possible, such that sampling continuously is suggested to be more frequent and with shorter intervals than the periodic capturing of second samples)
and wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data captured at the first intervals and the second samples of the audio data captured at the second intervals (lines 14-19 of Claim 14 of Parent Patent 1; as discussed in the previous paragraph, periodic capturing is suggested to be capturing at first intervals and continuous capturing is suggested to be capturing at second intervals that are shorter than the first intervals, and so providing contiguous data using the periodically captured second samples and the continuously captured second samples is suggested to provide contiguous audio data using second samples of audio data captured at first intervals and second samples of audio data captured at second intervals).

As per Claim 22, Claim 15 of Parent Patent 1 suggests Claim 22 (continuously captured is suggested to be capturing at second intervals, as discussed in the rejection of claim 21).

As per Claim 23, Claim 15 of Parent Patent 1 suggests Claim 23 (Claim 15 of Parent Patent 1 recites processing “the contiguous audio data” to recognize a wake phrase in the continuously captured audio data, where, as per Claim 14 of Parent 

As per Claim 26, Claim 14 of Parent Patent 1 suggests Claim 26 (lines 4-8, where “sound waves associated with the audio data” suggests where the audio data is the sound waves in data form [such that the audio data is representative of the sound waves])

As per Claim 28, Claim 14 of Parent Patent 1 suggests Claim 28 (last 7 lines).

As per Claims 30 and 32, they are directed to methods performing steps that correspond to the functions of claims 21 and 26, and are thus rejected under similar rationale (i.e. just as Claim 14 of Parent Patent 1 suggests the elements of claims 21 and 26 performing the functions of claims 21 and 26, Claim 14 of Parent Patent 1 suggests the steps of Claims 30 and 32 which correspond to the functions of claims 21 and 26).

As per Claim 35, Claim 14 of Parent Patent 1 suggests Claim 35 (the switch from periodically sampling to continuously sampling occurs after detecting speech onset, and so the first sample which is analyzed to detect the onset that causes the switch is 

Claims 36-38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 14-15 of Parent Patent 1 in view of Mutagi et al. (US 10,027,662), hereafter Mutagi.

As per Claim 36, Claim 15 of Parent Patent 1 (interpreted as incorporating claim 14 of Parent Patent 1) suggests An electronic communication device, comprising: a microphone;… and an audio processing device (first 2 lines of Claim 14 of Parent Patent 1, where audio data is conventionally received via a microphone which suggests where the audio processing device and a microphone, together, form an “electronic…device”)
comprising an audio interface coupled to the microphone and configured to sample audio data, a speech onset detector, a combiner, a wake-up phrase detector, and an audio interface control (first 4 lines of Claim 14 of Parent Patent 1 and Claim 15 of Parent Patent 1, where the audio interface samples data which is suggested to be received by a microphone which suggests that the microphone is coupled to the audio interface)
wherein the audio interface is operable to provide a first sample of the audio data to the speech onset detector (lines 4-6 of Claim 14 of Parent Patent 1)
wherein responsive to detection by the speech onset detector of speech onset in the first sample of the audio data, the audio interface control is operable to switch the audio interface from sampling second samples of the audio data at first intervals, to sampling the second samples of the audio data at second intervals (lines 8-14 of Claim 14 of Parent Patent 1, where continuously, by definition, is suggested to be a capturing that is as frequent as possible, such that sampling continuously is suggested to be more frequent and with shorter intervals than the periodic capturing of second samples, where line 2 of Claim 14 of Parent Patent 1 describes where the audio interface is operable to sample audio data such that switching capturing of samples is suggested to switch how the audio interface samples audio data)
wherein the combiner is operable to provide contiguous audio data using at least one portion of the second samples of the audio data sampled at the first intervals and the the second samples of the audio data sampled at the second intervals, (lines 14-19 of Claim 14 of Parent Patent 1; as discussed in the previous paragraph, periodic capturing is suggested to be capturing at first intervals and continuous capturing is suggested to be capturing at second intervals that are shorter than the first intervals, and so providing contiguous data using the periodically captured second samples and the continuously captured second samples is suggested to provide contiguous audio data using second samples of audio data captured at first intervals and second samples of audio data captured at second intervals).
the wake-up phrase detector is configured to process the contiguous audio data to recognize a wake phrase… (Claim 15 of Parent Patent 1)
an electronic communication device and a communication interface configured to wirelessly transmit and receive data and wherein the communication interface is configured to wirelessly transmit at least a portion of the second samples of the audio data sampled at the second intervals to a network, responsive to detection of the wake up phrase (Figure 2; col. 7, lines 24-55; col. 8, line 50 - col. 9, line 6; Figure 9; col. 26, lines 21-51;
In Mutagi, col. 8, line 50 – col. 9, line 6 describes where, in response to detecting a wakeword, a local device may begin transmitting audio data corresponding to input audio to a server for speech processing, where the transmitted audio data may include data corresponding to the wakeword, or where the portion of the audio data corresponding to the wakeword may be removed prior to sending the audio data [at least suggesting an embodiment where wakeword audio is transmitted in response to detecting a wakeword], and where audio data is converted by ASR module into text [i.e. ASR following wakeword detection, similar to Rossum where ASR follows keyword recognition].  Figure 2 depicts an Automatic Speech Recognition element 250 [at least suggested to be the ASR module in the server].  Col. 7, lines 24-55 describes capturing audio 11, processing audio data corresponding to input audio 11 to determine if a keyword/wakeword is detected in the audio data, and following detection of a wakeword, the device sends audio data 111 corresponding to the utterance to a server that includes an ASR module.  Figure 9 and col. 26, lines 21-51 also describe where device 110 includes an antenna, and describe wireless network communication, and device 110 depicted in Figure 2 also appears to be a device that is not wired to the 
Mutagi suggests where the electronic device that includes the audio processing device of Claim 14 of Parent Patent 1 and a microphone further includes “a communication interface configured to wirelessly transmit and receive data” [Wifi commonly/conventionally involves sending and receiving data] such that the electronic device is more specifically an “electronic communication device” and “wherein the communication interface is configured to wirelessly transmit at least a portion of the second samples of the audio data sampled at the second intervals to a network, responsive to detection of the wake up phrase” [where detecting a wake-up phrase leads to sending the contiguous audio data to a server over a network])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because Claims 14-15 of Parent Patent 1 and Mutagi included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Claims 14-15 of Parent Patent 1 teach the limitations of Claim 36 except for a wireless communication interface and wirelessly sending audio data in response to detecting a wakephrase, and Mutagi suggests a wireless communication interface and wirelessly sending audio data in response to detecting a wakeword).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding Mutagi’s wireless communication interface to the audio processing device of Claims 14-15 of Parent Patent 1 and by 

	As per Claim 37, Claim 14 of Parent Patent 1 suggests Claim 37 (continuous capturing is suggested to capture at smaller intervals than periodic capturing)

	As per Claim 38, Claim 14 of Parent Patent 1 suggests Claim 38 (the switch from periodically sampling to continuously sampling occurs after detecting speech onset, and so the first sample which is analyzed to detect the onset that causes the switch is suggested to be sampled using periodic sampling which is suggested to be a lower sample rate than the continuous sampling used to sample the “continuously captured second samples”)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249.  The examiner can normally be reached on M-F 9:00 AM - 5:30 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






EY 3/6/2021
/ERIC YEN/Primary Examiner, Art Unit 2658