DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on June 24, 2021. 
Claims 1-10 are pending in the application. As such, claims 1-10 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on June 24, 2021.  These drawings have been accepted and considered by the Examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
“voice receiving module” in claims 1 and 6, 
“feature extraction module” in claims 1, 2, 6 and 7,
“first determination module” in claims 1 and 6,
“second determination module” in claims 1 and 6,
“function response module” in claims 1 and 6.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover “a central processing unit, a field-programmable gate array (FPGA), or a multi-purpose chip that can load programming language” (Spec. [p. 6 ln. 20-25]) as the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999). 
Specifically, the claims use “start-up voice” and “predetermined voice” to mean a wake-word and a keyword, respectively, whereas the accepted meaning of “voice” refers to a person (i.e., a person’s unique voice).
For the purpose of examination “start-up voice” and “predetermined voice” are interpreted to mean wake-word and keyword, respectively.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 5-7 and 10 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Pedersen et al. (US Patent Pub. No. 2021/0105565), hereinafter Pedersen.
Regarding claims 1 and 6, Pedersen teaches an acoustic event detection system and method (Pedersen [0001] The present application deals with a hearing device, e.g. a hearing aid, comprising a detector, e.g. for detecting a certain acoustic environment, e.g. a voice detector, e.g. for detecting specific keywords for a voice control interface. The present application further deals with a scheme for personalization of hearing device parameters), 
comprising: 
a voice activity detection subsystem (Pedersen [0001] The present application deals with a hearing device, e.g. a hearing aid, comprising a detector, e.g. for detecting a certain acoustic environment, e.g. a voice detector, e.g. for detecting specific keywords for a voice control interface. The present application further deals with a scheme for personalization of hearing device parameters), 
including: 
a voice receiving module configured to receive an original sound signal (Pedersen [0003] An input transducer comprising at least one microphone for providing at least one electric input signal representative of sound in the environment of the hearing device); 
a feature extraction module configured to extract a plurality of features from the original sound signal (Pedersen [0011] The predefined criterion may relate to minimizing a cost function regarding said output vectors. The predefined criterion may be based on the performance of the neural network in terms of true positives, false positives, true rejections and false rejections of said output vectors, when said multitude of feature vectors are extracted from time segment of said at least one electric input signal having known properties); 
and a first determination module configured to execute a first classification process to determine whether or not the plurality of features match to a start-up voice (Pedersen [0015] The detector (or a part thereof) implemented by the neural network may e.g. comprise a wake-word detector, keyword detector or detector of a preferred speaker (spouse detector). The detector (or a part thereof) implemented by the neural network may e.g. alternatively of further comprise a correlation detector, a level estimator, a modulation detector, a feedback detector, a voice detector, e.g. an own voice detector, an estimator of speech intelligibility of the current electric input signal or a signal derived therefrom. The output of the detector may comprise estimates of a value or values of a specific parameter or property or content of the electric input signal, or a probability or probabilities of such estimated value(s)); 
a database configured to store the plurality of extracted features (Pedersen [0109] More generally, a hearing device comprises an input transducer for receiving an acoustic signal from a user's surroundings and providing a corresponding input audio signal and/or a receiver for electronically (i.e. wired or wirelessly) receiving an input audio signal, a (typically configurable) signal processing circuit (e.g. a signal processor, e.g. comprising a configurable (programmable) processor, e.g. a digital signal processor) for processing the input audio signal and an output unit for providing an audible signal to the user in dependence on the processed audio signal. The signal processor may be adapted to process the input signal in the time domain or in a number of frequency bands. In some hearing devices, an amplifier and/or compressor may constitute the signal processing circuit. The signal processing circuit typically comprises one or more (integrated or separate) memory elements for executing programs and/or for storing parameters used (or potentially used) in the processing and/or for storing information relevant for the function of the hearing device and/or for storing information (e.g. processed information, e.g. provided by the signal processing circuit), e.g. for use in connection with an interface to a user and/or an interface to a programming device. In some hearing devices, the output unit may comprise an output transducer, such as e.g. a loudspeaker for providing an air-borne acoustic signal or a vibrator for providing a structure-borne or liquid-borne acoustic signal. In some hearing devices, the output unit may comprise one or more output electrodes for providing electric signals (e.g. a multi-electrode array for electrically stimulating the cochlear nerve). The hearing device may comprise a speakerphone (comprising a number of input transducers and a number of output transducers, e.g. for use in an audio conference situation)); 
and an acoustic event detection subsystem, including: 
a second determination module configured to, 
in response to the first determination module determining that the plurality of features match the start-up voice, 
execute a second classification process to determine whether or not the plurality of features match to at least one of a plurality of predetermined voices (Pedersen [0020] The output of the decision unit (post-processor), the resulting signal, may e.g. be a command word or sentence or a wake-word or sentence for activating the voice control interface);
and a function response module configured to, 
in response to the second determination module determining that the plurality of features match at least one of the plurality of predetermined voices, 
execute one of a plurality of functions corresponding to the at least one of the plurality of predetermined voices that is matched (Pedersen [0020] The output of the decision unit (post-processor), the resulting signal, may e.g. be a command word or sentence or a wake-word or sentence for activating the voice control interface).
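
For illustration only, the gated two-stage arrangement recited in claims 1 and 6 (a second classification that runs only in response to a first-stage match, followed by execution of the function tied to the matched keyword) can be sketched as below. This is a minimal sketch under stated assumptions, not a characterization of Pedersen's or Applicant's implementation: the names `run_cascade`, `toy_wake_gate`, `toy_keyword_classifier`, `TOY_TEMPLATES` and `TOY_FUNCTIONS`, the thresholds, and the two-element feature vectors are all hypothetical, and the toy classifiers merely stand in for trained models.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def run_cascade(
    features: Features,
    first_stage: Callable[[Features], bool],
    second_stage: Callable[[Features], Optional[str]],
    functions: Dict[str, Callable[[], str]],
) -> Optional[str]:
    """Run the second classifier only if the first one reports a match."""
    if not first_stage(features):
        return None  # no start-up match: the second stage is never run
    label = second_stage(features)
    if label is None:
        return None  # start-up matched, but no predetermined keyword did
    return functions[label]()  # execute the function tied to the matched keyword


# --- Toy stand-ins (hypothetical; a real system would use trained models) ---

def toy_wake_gate(features: Features) -> bool:
    """Stand-in first classifier: mean squared feature value above a threshold."""
    return sum(v * v for v in features) / len(features) > 0.1


TOY_TEMPLATES = {"lights_on": (1.0, 0.0), "lights_off": (0.0, 1.0)}


def toy_keyword_classifier(features: Features, max_dist: float = 0.25) -> Optional[str]:
    """Stand-in second classifier: nearest template by squared Euclidean distance."""
    def dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, TOY_TEMPLATES[label]))

    best = min(TOY_TEMPLATES, key=dist)
    return best if dist(best) <= max_dist else None


TOY_FUNCTIONS = {
    "lights_on": lambda: "lights turned on",
    "lights_off": lambda: "lights turned off",
}
```

Calling `run_cascade((0.9, 0.1), toy_wake_gate, toy_keyword_classifier, TOY_FUNCTIONS)` passes the gate and dispatches the function for the nearest template, whereas a low-energy input such as `(0.1, 0.1)` is dropped at the first stage without the second classifier being invoked.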

Regarding claims 2 and 7, Pedersen teaches the acoustic event detection system and method according to claims 1 and 6.
Pedersen teaches
wherein the plurality of features are a plurality of Mel-Frequency Cepstral Coefficients (MFCCs) (Pedersen [0141] The feature vector (FV) may depend on the application. The feature vector (FV) may e.g. be or comprise a complex-valued output from a filter bank or simply the magnitude (or squared-magnitude) of the filter bank output. Alternative or additional feature vectors may be cepstral coefficients such as Mel Frequency Cepstral Coefficients (MFCC) or Bark Frequency Cepstral Coefficients (BFCC). In the case of own voice detection, the feature vector (FV) may contain information about the transfer function between different microphone signals).
Regarding claims 5 and 10, Pedersen teaches the acoustic event detection system and method according to claims 1 and 6.
Pedersen teaches
wherein the second classification process includes identifying the plurality of features through a trained machine learning model to determine whether the plurality of features match to at least one of the plurality of predetermined voices (Pedersen [0051] The classification unit may be based on or comprise a neural network, e.g. a trained neural network).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-4 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Pedersen in view of Rees (US Patent Pub. No. 2003/0055639), hereinafter Rees.
Regarding claims 3 and 8, Pedersen teaches the acoustic event detection system and method according to claims 2 and 7.
Pedersen teaches
wherein the feature extraction module extracts the plurality of features of the original sound signal through an extraction process, and the extraction process includes: 
decomposing the original sound signal into a plurality of frames (Pedersen [0027] The time segment of corresponding values of the at least one electric input signal and optionally the sensor signal covered by a given feature vector are used as input to the input layer of the neural network comprises at least one time frame of the at least one electric input signal. The time segment may comprise a multitude of time frames of the at least one electric input signal, e.g. more than three, such as more than five time frames, such as in the range of 2 to 50 time frames, e.g. corresponding to up to 0.5 to 1 s of audio, e.g. corresponding to one or more words); 
pre-enhancing signal data corresponding to the plurality of frames through a [filter] (Pedersen [0025] The hearing device may comprise an analysis filter bank for converting a time domain input signal to a number of frequency sub-band signals providing the input signal in a time-frequency representation (k,l), where k and l are frequency and time indices, respectively. The input transducer may comprise an analysis filter bank for each electric input signal, and/or the sensor signal. The hearing device may comprise an analysis filter bank for converting a time domain sensor signal to a number of frequency sub-band signals providing the sensor signal in a time-frequency representation (k, l), where k and l are frequency and time indices, respectively. The feature vector may be provided in a time-frequency representation. If reconstruction of a time domain signal is not required (it is not required if we just utilize the feature vector for a detector), the filter bank may be down-sampled with a factor above the critical down-sampling. We may as well utilize a smaller subset of the available frequency channels of the filter-bank as well as we may sum frequency channels together. The filter-bank channels may be low-pass filtered before being down-sampled); 
performing a Fourier transformation to convert the pre-enhanced signal data to a frequency domain to generate a plurality of sets of spectrum data corresponding to the plurality of frames (Pedersen [0042] The hearing device, e.g. the input transducer, and or the antenna and transceiver circuitry comprise(s) a TF-conversion unit for providing a time-frequency representation of an input signal. The time-frequency representation may comprise an array or map of corresponding complex or real values of the signal in question in a particular time and frequency range. The TF conversion unit may comprise a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. The TF conversion unit may comprise a Fourier transformation unit for converting a time variant input signal to a (time variant) signal in the (time-)frequency domain. The frequency range considered by the hearing device from a minimum frequency f_min to a maximum frequency f_max comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. Typically, a sample rate f_s is larger than or equal to twice the maximum frequency f_max, f_s ≥ 2f_max. Xx, a signal of the forward and/or analysis path of the hearing device is split into a number NI of frequency bands (e.g. of uniform width), where NI is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. The hearing device is/are adapted to process a signal of the forward and/or analysis path in a number NP of different frequency channels (NP≤NI). For detectors e.g. for analyzing a signal of the forward path, e.g. the electric input signal, we may have fewer channels, e.g. NP′≤NP. The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping);
so as to generate the plurality of Mel-Frequency Cepstral Coefficients (MFCCs) (Pedersen [0141] The feature vector (FV) may depend on the application. The feature vector (FV) may e.g. be or comprise a complex-valued output from a filter bank or simply the magnitude (or squared-magnitude) of the filter bank output. Alternative or additional feature vectors may be cepstral coefficients such as Mel Frequency Cepstral Coefficients (MFCC) or Bark Frequency Cepstral Coefficients (BFCC). In the case of own voice detection, the feature vector (FV) may contain information about the transfer function between different microphone signals).
Pedersen does not teach
high-pass filter
obtaining a plurality of mel scales by applying a mel filter on the plurality of sets of spectrum data; 
extracting logarithmic energy on the plurality of mel scales; 
and performing a discrete cosine transformation on the obtained logarithmic energy 
to convert to a cepstrum domain.
Rees teaches
high-pass filter (Rees [0069] Referring to FIG. 7, when the control unit 86 identifies that speech has started, it outputs a control signal on line 88 to the buffer 78 which causes the N most recent frame energies to be read out of the buffer 78 and input to a high pass filter 90. The filter 90 removes the DC offset and any slowly varying noise contribution in the energy signal and outputs the filtered energies to buffer 92. In this embodiment, the filter 90 is a second order recursive filter, with a cut-off frequency of 1 Hz. FIG. 9 shows the output of the high-pass filter 90 for the energy signal shown in FIG. 6a. As shown, the filtered frame energy fluctuates about zero during the silence portions 72-1 and 72-2 but oscillates during the speech portions 74. As a result, it is assumed that during the silence portions, the filtered frame energies are uncorrelated from frame to frame, whereas in the speech portion, the filtered frame energy of each frame depends upon the filtered frame energy of its neighbouring frames)
obtaining a plurality of mel scales by applying a mel filter on the plurality of sets of spectrum data (Rees [0081] In the present embodiment, a mel spaced filter bank 69 having sixteen bands is used. The mel scale is well known in the art of speech analysis, and is a logarithmic scale that attempts to map the perceived frequency of a tone onto a linear scale. FIG. 12 shows the output |S^k(f')| of the mel spaced filter bank 69, when the samples shown in FIG. 11 are passed through the bank 69. The resulting envelope 100 of the magnitude spectrum is considerably smoother due to the averaging effect of the filter bank 69, although less so at the lower frequencies due to the logarithmic spacing of the filter bank); 
extracting logarithmic energy on the plurality of mel scales (Rees [0088] The noise masking block 73 performs a dynamic masking on each frame by firstly calculating the maximum log filter-bank energy output from the mel filter banks 69); 
and performing a discrete cosine transformation on the obtained logarithmic energy (Rees [0086] The vocal tract characteristics 101 can be extracted from the excitation characteristics 103, by performing a Discrete Cosine Transform (DCT) on the samples output from block 71, and then filtering the result. However, before performing the DCT, a dynamic noise masking is performed by the noise masking block 73)
to convert to a cepstrum domain (Rees [0093] FIG. 16 shows the output of the DCT block 75, which is known as the cepstrum C^k(m)).
Rees is considered to be analogous to the claimed invention because it is in the same field of speech processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pedersen further in view of Rees to allow for using a high pass filter, mel scales, log energies, and cosine transforms. Doing so would allow for an alternative system for detecting speech within an input signal (Rees [0004]).
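
For reference, the extraction sequence recited in claims 3 and 8 (framing, pre-enhancement through a high-pass filter, Fourier transformation, mel filtering, logarithmic energy, and discrete cosine transformation) can be sketched as follows. This is a minimal, unoptimized sketch under stated assumptions and is not taken from Pedersen, Rees, or Applicant's specification: the first-order pre-emphasis filter with coefficient 0.97 is a commonly used stand-in rather than the second-order recursive filter of Rees [0069], the mel conversion uses the common 2595·log10(1 + f/700) convention, windowing is omitted, and the function names, frame length, hop size, and coefficient count are hypothetical. The sixteen-band default loosely mirrors the band count mentioned in Rees [0081].

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    """Decompose the signal into overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def pre_emphasize(frame, alpha=0.97):
    """First-order high-pass (pre-emphasis): y[n] = x[n] - alpha * x[n-1]."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def power_spectrum(frame):
    """Naive DFT power spectrum over bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]


def mfcc(x, sample_rate, frame_len=64, hop=32, n_filters=16, n_coeffs=8):
    """Return one list of cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    out = []
    for frame in frame_signal(x, frame_len, hop):
        spec = power_spectrum(pre_emphasize(frame))
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
        out.append(dct_ii(log_e, n_coeffs))
    return out
```

The small frame length keeps the naive DFT fast enough for a demonstration; a practical implementation would typically use longer frames, a window function, and an FFT routine.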

Regarding claims 4 and 9, Pedersen in view of Rees teaches the acoustic event detection system and method according to claims 3 and 8.
Pedersen teaches
wherein the first classification process includes comparing the plurality of sets of spectrum data with spectrum data of the start-up voice to determine whether the plurality of features match to the start-up voice (Pedersen [0015] The detector (or a part thereof) implemented by the neural network may e.g. comprise a wake-word detector, keyword detector or detector of a preferred speaker (spouse detector). The detector (or a part thereof) implemented by the neural network may e.g. alternatively of further comprise a correlation detector, a level estimator, a modulation detector, a feedback detector, a voice detector, e.g. an own voice detector, an estimator of speech intelligibility of the current electric input signal or a signal derived therefrom. The output of the detector may comprise estimates of a value or values of a specific parameter or property or content of the electric input signal, or a probability or probabilities of such estimated value(s)).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J. MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657