DETAILED ACTION

Introduction
1.	This Office action is in response to Applicant’s submission filed on 09/20/2019. Claims 1-12 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
3.	The drawings filed on 09/20/2019 have been accepted and considered by the Examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

4.	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“…a framing module configured to subdivide…a filtering module configured to analyze…a feature extraction module configured to extract…a classification module configured to process…” in claim 1; “…a flatness estimator module configured to assess…” in claim 2; “…filtering module is configured to discard…” in claim 3; “…the flatness estimator module configured to assess…” in claims 4 and 5; “…an energy estimator module configured to assess…” in claim 6; “…said filtering module is configured to discard…” in claim 7; “…the energy estimator module is configured to calculate…” in claim 8; “…the classification module is configured to generate…” in claim 9; and “…the classification module is configured to: compare…” in claims 10 and 11.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-3, 9, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over (a) Spengler et al. (U.S. Patent Application Publication No. 2007/0288242) in view of (b) Lovekin et al. (J. M. Lovekin, R. E. Yantorno, K. R. Krishnamachari, D. S. Benincasa and S. J. Wenndt, “Developing usable speech criteria for speaker identification technology,” 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, pp. 421-424 vol. 1), hereinafter referred to as SPENGLER and LOVEKIN, respectively.
	With respect to Claim 1, SPENGLER discloses:
1. A speaker recognition system for assessing the identity of a speaker through a speech signal based on speech uttered by said speaker, the system comprising:
a framing module configured to subdivide said speech signal over time into a set of frames (See e.g., “…receive a framed speech signal…” and “…the actual speech/utterance can be aligned in an observation frame or window using, for example, a convolution-based algorithm to enhance analysis of the speech. To perform the alignment, the user-speech template can be divided into a plurality of time slices or vectors….” See e.g., SPENGLER paras. 61-63, Figs. 5, 6, 8-12, 23);
a filtering module configured to analyze the frames of the set to discard frames affected by noise and frames which do not comprise a speech, based on a spectral analysis of the frames (See e.g., “…A Short Time Fourier transformation is then performed on each time slice to form Fourier transformed data defining a spectrograph…taking the log of the absolute value of the complex data. The converted amplitude values are then thresholded by a centering threshold to normalize the energy values within each time slice. The Sum of each time slice, equivalent to the geometric mean of the frequency bins for the respective time slice… Mean positions of peaks of the convolution are then determined to identify the center of the speech, and the user-speech template is cyclically shifted to center the speech in the observation frame or window…,” “…convert sampled data to frequency domain…perform speech alignment…determine noise contour…perform noise removal process…,” “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23);
a feature extraction module configured to extract audio features from frames which have not been discarded (See e.g., “…develop a set of feature vectors…” “…operation of developing a set of feature vectors representing energy of the frequency content of the user-speech template to determine a unique pattern…” See e.g., SPENGLER paras. 61-64, Figs. 3-5, 6, 8-15, 23);
a classification module configured to process the audio features extracted from the frames which have not been discarded for [assessing the identity of the speaker] (See e.g., “… when implemented by a 1.6 GHZ, Pentium IV processor, Hidden Markov Model training on an utterance encapsulated within a 1.5 second frame can be performed in less than approximately 400 milliseconds for each word/utterance and recognition of such word/utterance (command annunciation) using a Hidden Markov Model recognition engine/classifier can be performed in less than 250 milliseconds…,” “…the recognize mode can include noise removal, feature extraction, speech alignment, and speech recognition functions…,” “…the speech actuated command program product 51 also provide a core speech recognizer engine/classifier which can include both Hidden Markov and Neural Net modeling and models which can recognize sound patterns of the speech/utterances…,” “…associate an index and/or function or state to the speech model…” See e.g., SPENGLER paras. 53-55, 61-64, Figs. 5, 6, 8-12, 16, 23).
SPENGLER does not explicitly disclose, but LOVEKIN discloses, a [speaker recognition system] and [assessing the identity of the speaker] (“…speaker identification (SID)…criteria for usable speech frames for SID. Voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…performing a frame based (Target to Interferer Ratio) TIR as opposed to an overall TIR. Usable frames of speech are separated and collected into a file for each speaker by calculating the TIR for each frame individually to determine if it exceeds a predetermined threshold…,” and how “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3).
SPENGLER and LOVEKIN can be considered analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of SPENGLER in view of LOVEKIN’s techniques, see e.g., speaker identification architectures comprising “…speaker identification (SID)…criteria for usable speech frames for SID…” such that “…voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…performing a frame based (Target to Interferer Ratio) TIR as opposed to an overall TIR.…” in order to advantageously enhance speaker identification since, see e.g., “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” (See e.g., LOVEKIN, Abstract, §§ 2, 3).
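For illustration only, the framing, discarding, and feature-extraction chain recited in claim 1 can be sketched in Python as follows. This is a hypothetical sketch, not code disclosed by SPENGLER, LOVEKIN, or the application: the function names (frame_signal, keep_frames, log_energy), the fixed frame length and hop, and the toy log-energy feature are assumptions chosen for brevity, and the keep/discard predicate is left to the caller.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[Frame]:
    """Subdivide a sampled signal over time into frames of frame_len samples.

    Consecutive frames start hop samples apart (they overlap when hop < frame_len);
    trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def keep_frames(frames: List[Frame], keep: Callable[[Frame], bool]) -> List[Frame]:
    """Return only the frames that the hypothetical keep() predicate does not discard."""
    return [frame for frame in frames if keep(frame)]


def log_energy(frame: Frame) -> float:
    """Toy feature: log of the mean squared amplitude; the 1e-12 floor avoids log(0)."""
    return math.log(sum(x * x for x in frame) / len(frame) + 1e-12)


# Usage: two silent frames followed by two alternating-sign frames.
signal = [0.0] * 8 + [1.0, -1.0] * 4
frames = frame_signal(signal, frame_len=4, hop=4)
kept = keep_frames(frames, keep=lambda f: log_energy(f) > -10.0)
features = [log_energy(f) for f in kept]
```

Here the silent frames fall below the arbitrary -10 log-energy cutoff and are discarded, so features are computed only from the two retained frames.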

With respect to Claim 2, SPENGLER in view of LOVEKIN discloses:
2. The system of claim 1, wherein the filtering module comprises a flatness estimator module (See e.g., “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23) configured to assess whether a frame has to be discarded based on the flatness of the frequency spectrum of such frame (See e.g., “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3).

With respect to Claim 3, SPENGLER in view of LOVEKIN discloses:
3. The system of claim 2, wherein said filtering module is configured to discard a frame if the flatness estimator module (See e.g., “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23) has assessed that said frame has to be discarded because said frame has a substantially flat spectrum (See e.g., “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3).

With respect to Claim 9, SPENGLER in view of LOVEKIN discloses:
9. The system of claim 1, wherein the classification module (See e.g., “…using a Hidden Markov Model recognition engine/classifier…,” “…the recognize mode can include noise removal, feature extraction, speech alignment, and speech recognition functions…,” “…the speech actuated command program product 51 also provide a core speech recognizer engine/classifier which can include both Hidden Markov and Neural Net modeling and models which can recognize sound patterns of the speech/utterances…,” “…associate an index and/or function or state to the speech model…” See e.g., SPENGLER paras. 53-55, 61-64, Figs. 5, 6, 8-12, 16, 23) [is configured to generate for each known speaker of a predefined set of known speakers a corresponding score quantifying the likelihood that the speaker having uttered said speech is said known speaker, said generating the score being based on said audio features extracted from the frames which have not been discarded].
SPENGLER does not explicitly disclose, but LOVEKIN discloses, capabilities for the classification module of SPENGLER to be configured with speaker identification functionality, using speaker models for training and testing, such that the classification module is [configured to generate for each known speaker of a predefined set of known speakers a corresponding score quantifying the likelihood that the speaker having uttered said speech is said known speaker] (See e.g., how “…speech from any of …previously trained speakers using different speech samples, which will then compare the given speech to the speaker’s models in an attempt to find a match… Voiced-only segments extracted at 37 Spectral Flatness Method (SFM) were used for this purpose in place of the actual usable segments. Table 1 shows the different training and testing situations, with accompanying speaker identification results… SID accuracy for normal when voiced only segments were used for training and testing, approximately 80% speaker ID accuracy was achieved. It was realized that less information was available when removing the unvoiced portions of the speech. Correct identification of 75.8%...,” “…38 speakers are separated into two groups of 19 speakers each. Group A contains 14 female speakers and 5 male speakers. Group B contains 19 male speakers… Group A(i) + Group A(i+1)… Group B(i) + Group B(i+1)… Group A(i) + Group B(i)…” See e.g., LOVEKIN, Abstract, §§ 2, 3, Fig. 1, Table 1), [said generating the score being based on said audio features extracted from the frames which have not been discarded] (“…speaker identification (SID)…criteria for usable speech frames for SID. Voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…performing a frame based (Target to Interferer Ratio) TIR as opposed to an overall TIR. Usable frames of speech are separated and collected into a file for each speaker by calculating the TIR for each frame individually to determine if it exceeds a predetermined threshold…,” and how “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3, Fig. 1, Table 1).
SPENGLER and LOVEKIN can be considered analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of SPENGLER’s core speech recognizer engine/classifier with LOVEKIN’s techniques, see e.g., a speaker identification architecture that compares speech to the speaker’s models in an attempt to find a match, comprising “…speaker identification (SID)…criteria for usable speech frames for SID…” such that “…voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…” in order to advantageously enhance speaker identification (See e.g., LOVEKIN, Abstract, §§ 2, 3).
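As a hedged illustration of the kind of per-speaker scoring recited in claim 9, the sketch below scores the features of the retained frames against one model per known speaker. It assumes, purely for illustration, that each known speaker is modeled by a diagonal Gaussian over the feature vector; the names score_speakers and frame_log_likelihood and the model format are hypothetical, and neither SPENGLER nor LOVEKIN is relied upon for this particular model. The score is the average per-frame log-likelihood, so a higher score means the retained frames are more likely under that speaker’s model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical speaker model: one (mean, variance) pair per feature dimension,
# i.e. a diagonal Gaussian over the feature vector.
SpeakerModel = List[Tuple[float, float]]


def frame_log_likelihood(x: Sequence[float], model: SpeakerModel) -> float:
    """Log-density of one feature vector under a diagonal Gaussian model."""
    if len(x) != len(model):
        raise ValueError("feature vector and model dimensions differ")
    total = 0.0
    for value, (mean, var) in zip(x, model):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)
    return total


def score_speakers(features: List[Sequence[float]],
                   models: Dict[str, SpeakerModel]) -> Dict[str, float]:
    """Average per-frame log-likelihood of the retained frames for each known speaker."""
    if not features:
        raise ValueError("no retained frames to score")
    return {name: sum(frame_log_likelihood(x, model) for x in features) / len(features)
            for name, model in models.items()}


# Usage: 1-D features from two retained frames, two known speakers.
models = {"speaker_a": [(0.0, 1.0)], "speaker_b": [(5.0, 1.0)]}
scores = score_speakers([[0.1], [-0.2]], models)
best = max(scores, key=scores.get)  # "speaker_a"
```

Averaging over frames, rather than summing, keeps scores comparable between utterances that retain different numbers of frames after filtering.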
With respect to Claim 12, SPENGLER discloses:
12. A method for assessing the identity of a speaker through a speech signal based on speech uttered by said speaker, the method comprising:
subdividing said speech signal over time into a set of frames (See e.g., “…receive a framed speech signal…” and “…the actual speech/utterance can be aligned in an observation frame or window using, for example, a convolution-based algorithm to enhance analysis of the speech. To perform the alignment, the user-speech template can be divided into a plurality of time slices or vectors….” See e.g., SPENGLER paras. 61-63, Figs. 5, 6, 8-12, 23);
spectrally analyzing the frames of the set and discarding frames affected by noise and frames which do not comprise a speech based on such spectral analysis of the frames (See e.g., “…A Short Time Fourier transformation is then performed on each time slice to form Fourier transformed data defining a spectrograph…taking the log of the absolute value of the complex data. The converted amplitude values are then thresholded by a centering threshold to normalize the energy values within each time slice. The Sum of each time slice, equivalent to the geometric mean of the frequency bins for the respective time slice… Mean positions of peaks of the convolution are then determined to identify the center of the speech, and the user-speech template is cyclically shifted to center the speech in the observation frame or window…,” “…convert sampled data to frequency domain…perform speech alignment…determine noise contour…perform noise removal process…,” “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23);
extracting audio features from frames which have not been discarded (See e.g., “…develop a set of feature vectors…” “…operation of developing a set of feature vectors representing energy of the frequency content of the user-speech template to determine a unique pattern…” See e.g., SPENGLER paras. 61-64, Figs. 3-5, 6, 8-15, 23);
processing the audio features extracted from the frames which have not been discarded for [assessing the identity of the speaker] (See e.g., “…when implemented by a 1.6 GHZ, Pentium IV processor, Hidden Markov Model training on an utterance encapsulated within a 1.5 second frame can be performed in less than approximately 400 milliseconds for each word/utterance and recognition of such word/utterance (command annunciation) using a Hidden Markov Model recognition engine/classifier can be performed in less than 250 milliseconds…,” “…the recognize mode can include noise removal, feature extraction, speech alignment, and speech recognition functions…,” “…the speech actuated command program product 51 also provide a core speech recognizer engine/classifier which can include both Hidden Markov and Neural Net modeling and models which can recognize sound patterns of the speech/utterances…,” “…associate an index and/or function or state to the speech model…” See e.g., SPENGLER paras. 53-55, 61-64, Figs. 5, 6, 8-12, 16, 23).
SPENGLER does not explicitly disclose, but LOVEKIN discloses, a [speaker recognition system and method] and [assessing the identity of the speaker] (“…speaker identification (SID)…criteria for usable speech frames for SID. Voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…performing a frame based (Target to Interferer Ratio) TIR as opposed to an overall TIR. Usable frames of speech are separated and collected into a file for each speaker by calculating the TIR for each frame individually to determine if it exceeds a predetermined threshold…,” and how “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3).
SPENGLER and LOVEKIN can be considered analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of SPENGLER in view of LOVEKIN’s techniques, see e.g., speaker identification architectures comprising “…speaker identification (SID)…criteria for usable speech frames for SID…” such that “…voiced speech, of which usable speech is entirely comprised, is shown to be information rich for SID…performing a frame based (Target to Interferer Ratio) TIR as opposed to an overall TIR.…” in order to advantageously enhance speaker identification (See e.g., LOVEKIN, Abstract, §§ 2, 3).

6.	Claims 4-8 are rejected under 35 U.S.C. 103 as being unpatentable over (a) Spengler et al. (U.S. Patent Application Publication No. 2007/0288242) in view of (b) Lovekin et al. (J. M. Lovekin, R. E. Yantorno, K. R. Krishnamachari, D. S. Benincasa and S. J. Wenndt, “Developing usable speech criteria for speaker identification technology,” 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, pp. 421-424 vol. 1), and further in view of (c) Moattar et al. (M. H. Moattar and M. M. Homayounpour, “A simple but efficient real-time Voice Activity Detection algorithm,” 2009 17th European Signal Processing Conference, 2009, pp. 2549-2553), hereinafter referred to as SPENGLER, LOVEKIN, and MOATTAR, respectively.

With respect to Claim 4, SPENGLER in view of LOVEKIN discloses:
4. The system of claim 3, wherein the flatness estimator module (See e.g., “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23) is configured to assess the flatness of the spectrum of a frame (See e.g., “…it is meaningful to extract only voiced frames from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…” See e.g., LOVEKIN, Abstract, §§ 2, 3) by [generating a corresponding flatness parameter based on a ratio of: the geometric mean of samples of the energy density of said frame; to the arithmetic mean of said samples of the energy density of said frame].
SPENGLER in view of LOVEKIN does not explicitly disclose, but MOATTAR discloses, [generating a corresponding flatness parameter based on a ratio of: the geometric mean of samples of the energy density of said frame; to the arithmetic mean of said samples of the energy density of said frame] (See e.g., MOATTAR’s flatness parameter based on such a ratio: “…Spectral Flatness Measure (SFM)…a measure of the noisiness of spectrum and is a good feature in Voiced/Unvoiced/Silence detection…feature is calculated using the following equation: SFMdb = 10 log10(Gm/Am) where Am and Gm are arithmetic and geometric means of speech spectrum respectively…” See e.g., MOATTAR, Abstract, §§ 2, 3).
SPENGLER, LOVEKIN, and MOATTAR can be considered analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of SPENGLER and LOVEKIN in view of MOATTAR’s techniques, see e.g., a voice activity detection (VAD) architecture using a Spectral Flatness Measure (SFM) algorithmic implementation, in order to advantageously improve the performance of speech/audio processing, considering, see e.g., that SFM is a “…measure of the noisiness of spectrum and is a good feature in Voiced/Unvoiced/Silence detection…,” and that the method “…uses short-term features such as Spectral Flatness (SF) and Short-term Energy. This helps the method to be appropriate for online processing tasks…” (See e.g., MOATTAR, Abstract, §§ 2, 3).
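To make the flatness parameter of claim 4 and MOATTAR’s SFMdb = 10 log10(Gm/Am) concrete, the following Python sketch computes a frame’s power spectrum with a naive discrete Fourier transform and then the ratio of the geometric mean to the arithmetic mean of its samples. It is illustrative only: the function names, the use of bins 0 to N/2 as the “samples of the energy density,” and the 1e-12 floor on empty bins are assumptions, not details taken from the application or the cited references. A flat (noise-like) spectrum gives a ratio near 1 (about 0 dB), while a peaky (tonal or voiced) spectrum gives a ratio near 0 (strongly negative dB).

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N//2, via a naive O(N^2) DFT (adequate for a short sketch)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def flatness_ratio(psd: Sequence[float], eps: float = 1e-12) -> float:
    """Gm / Am: geometric mean over arithmetic mean of the energy-density samples."""
    if not psd:
        raise ValueError("empty spectrum")
    vals = [max(p, eps) for p in psd]  # floor avoids log(0) for empty bins
    gm = math.exp(sum(math.log(v) for v in vals) / len(vals))  # geometric mean via logs
    am = sum(vals) / len(vals)
    return gm / am


def sfm_db(psd: Sequence[float]) -> float:
    """SFM in dB as quoted from MOATTAR: 10 * log10(Gm / Am); 0 dB when perfectly flat."""
    return 10.0 * math.log10(flatness_ratio(psd))
```

Computing the geometric mean as the exponential of the mean logarithm avoids the overflow or underflow that a direct product of many spectral samples could cause.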

With respect to Claim 5, SPENGLER in view of LOVEKIN and further in view of MOATTAR discloses:
5. The system of claim 4, wherein the flatness estimator module is configured to assess that said frame has to be discarded if the corresponding flatness parameter is higher than a corresponding first threshold (See e.g., MOATTAR’s flatness parameter based on a ratio of means: “…Spectral Flatness Measure (SFM)…a measure of the noisiness of spectrum and is a good feature in Voiced/Unvoiced/Silence detection…feature is calculated using the following equation: SFMdb = 10 log10(Gm/Am) where Am and Gm are arithmetic and geometric means of speech spectrum respectively…,” together with MOATTAR’s VAD algorithm, which provides the capability of discarding a frame if the corresponding flatness parameter is higher than a corresponding first threshold, according to instructions comprising, see e.g., “…2- Set one primary threshold for each feature…Primary Threshold for SFM (SF_PrimThresh)… 3-4 Set Decision threshold for…SFM…Thresh_SF = SF_PrimThresh…3-5-…If ((SFM(i)-Min_SF)>=Thresh_SF) then Counter++…,” See e.g., MOATTAR, Abstract, §§ 2, 3).
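The discard test of claim 5 then reduces to a single comparison on the flatness parameter. In the hypothetical sketch below, should_discard flags a frame whose geometric-to-arithmetic-mean ratio exceeds a first threshold. The default threshold of 0.5 is an arbitrary placeholder, not a value from the application or from MOATTAR, and MOATTAR’s own quoted rule (which compares SFM(i) - Min_SF against Thresh_SF and counts votes across features) is not reproduced here.

```python
import math
from typing import Sequence


def flatness_ratio(psd: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of a frame's energy-density samples, in (0, 1]."""
    if not psd:
        raise ValueError("empty spectrum")
    vals = [max(p, eps) for p in psd]  # floor avoids log(0) for empty bins
    gm = math.exp(sum(math.log(v) for v in vals) / len(vals))
    am = sum(vals) / len(vals)
    return gm / am


def should_discard(psd: Sequence[float], first_threshold: float = 0.5) -> bool:
    """True when the spectrum is flatter (more noise-like) than first_threshold allows.

    The 0.5 default is a placeholder chosen for illustration only.
    """
    return flatness_ratio(psd) > first_threshold
```

A perfectly flat spectrum has a ratio of exactly 1, so it is discarded for any first threshold below 1.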

With respect to Claim 6, SPENGLER in view of LOVEKIN discloses:
6. The system of claim 1, wherein the filtering module comprises an energy estimator module (See e.g., “…to perform the operations of determining a background noise contour for noise within the observation frame or window and removing the noise from within and around speech formants of the aligned user speech template using a nonlinear noise removal process such as, for example, by thresholding bins of equalized portions of the user-speech template…by first estimating noise power (see FIG. 8) in each bin for each of a plurality of time slices, e.g., twenty, on either side of the speech near and preferably outside the boundaries of the speech for each of the frequency ranges defining the bins, and equalizing the energy values of the each bin across each of the frequency ranges in response to the estimated noise power to thereby “flatten” the spectrum…” See e.g., SPENGLER paras. 61-63, Figs. 3-6, 8-12, 23) configured to assess (See e.g., “…it is meaningful to extract only voiced frames  from the full speaker utterances, and assess the performance of the SID system with these segments to approximate the performance with usable segments. The voiced-only speech is extracted using the Spectral Flatness Method (SFM) [6]…”  See e.g., LOVEKIN, Abstract, §§ 2, 3) [whether a frame has to be discarded based on how the spectral energy of said frame is distributed over frequency].

SPENGLER in view of LOVEKIN does not explicitly disclose, but MOATTAR discloses, [whether a frame has to be discarded based on how the spectral energy of said frame is distributed over frequency] (See e.g., capabilities for discarding a frame based on how the spectral energy of said frame is distributed over frequency, according to, e.g., “…2-…Primary Threshold for Energy (Energy_PrimThresh)…3-1-Compute frame energy (E(i))…3-4-Set Decision threshold for E…and SFM…3-5-…If ((E(i)-Min_E)>=Thresh_E) then Counter++…,” See e.g., MOATTAR, Abstract, §§ 2, 3).
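Again only as a hypothetical illustration, steps 3-1 and 3-5 for the energy feature can be sketched as follows. The quoted excerpt does not specify how E(i) is computed; taking it as the sum of squared time-domain samples, and the names frame_energy and energy_exceeds_threshold, are assumptions.

```python
def frame_energy(samples):
    """Short-term energy E(i) of one frame.

    Assumption: computed as the sum of squared time-domain samples.
    """
    return sum(s * s for s in samples)


def energy_exceeds_threshold(energy, min_energy, thresh_e):
    """Mirrors the quoted test If ((E(i) - Min_E) >= Thresh_E) then Counter++."""
    return (energy - min_energy) >= thresh_e
```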
SPENGLER, LOVEKIN, and MOATTAR can be considered analogous art because they are from a similar field of endeavor, namely speech processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of SPENGLER and LOVEKIN in view of MOATTAR’s techniques comprising, see e.g., a voice activity detection (VAD) architecture using a Spectral Flatness Measure (SFM) algorithmic implementation, in order to advantageously improve the performance of speech/audio processing, considering that SFM is a “…measure of the noisiness of spectrum and is a good feature in Voiced/Unvoiced/Silence detection…,” and doing so by “…us[ing] short-term features such as Spectral Flatness (SF) and Short-term Energy. This helps the method to be appropriate for online processing tasks…” (See e.g., MOATTAR, Abstract, §§ 2, 3).
With respect to Claim 7, SPENGLER in view of LOVEKIN and further in view of MOATTAR discloses:

7. The system of claim 6, wherein said filtering module is configured to discard a frame if the energy estimator module has assessed that said frame has to be discarded because said frame has a substantial amount of energy above an upper frequency threshold (See e.g., capabilities of the filtering module for discarding a frame based on the energy estimator’s assessment that the frame has a substantial amount of energy above an upper frequency threshold, according to, e.g., “…2-…Primary Threshold for F (F_PrimThresh)…3-2-1-Find F(i)…as the most dominant frequency component…3-4-Set Decision threshold for…F and SFM…Thresh_F=F_PrimThresh…If ((F(i)-Min_F)>=Thresh_F) then Counter++…,” See e.g., MOATTAR, Abstract, §§ 2, 3).
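As a non-authoritative sketch of the quoted steps 3-2-1 and 3-5 for the frequency feature, F(i) can be taken as the frequency of the largest-magnitude DFT bin and compared against Min_F and Thresh_F. The naive DFT, the search over bins 0 through N/2, and the function names are assumptions; MOATTAR is not relied upon here for any particular transform.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return F(i): the frequency (Hz) of the largest-magnitude DFT bin among 0..N/2.

    A naive O(N^2) DFT keeps this sketch dependency-free; an FFT would normally be used.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty frame")
    best_bin, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def frequency_exceeds_threshold(freq, min_freq, thresh_f):
    """Mirrors the quoted test If ((F(i) - Min_F) >= Thresh_F) then Counter++."""
    return (freq - min_freq) >= thresh_f
```

For example, a 64-sample frame of a 1000 Hz sine sampled at 8000 Hz falls exactly on bin 8, so dominant_frequency returns 1000.0.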

With respect to Claim 8, SPENGLER in view of LOVEKIN and further in view of MOATTAR discloses:
8. The system of claim 7, wherein the energy estimator module is configured to calculate an energy parameter of a corresponding frame based on a ratio of: the energy of the frame pertaining to frequencies lower than said upper frequency threshold (See e.g., “…2-…Primary Threshold for Energy (Energy_PrimThresh)…3-1-Compute frame energy (E(i))…3-4-Set Decision threshold for E…and SFM…If ((E(i)-Min_E)>=Thresh_E) then Counter++…,” “…2-…Primary Threshold for F (F_PrimThresh)…3-2-1-Find F(i)…as the most dominant frequency component…3-4-Set Decision threshold for…F and SFM…Thresh_F=F_PrimThresh…If ((F(i)-Min_F)>=Thresh_F) then Counter++…,” See e.g., MOATTAR, Abstract, §§ 2, 3); to the total energy of the frame, wherein: the energy estimator module is further configured to assess that said frame has to be discarded if the corresponding energy parameter is lower than a second threshold (See e.g., “…2-…Primary Threshold for Energy (Energy_PrimThresh)…3-1-Compute frame energy (E(i))…3-4-Set Decision threshold for E…and SFM…3-5-…If ((E(i)-Min_E)>=Thresh_E) then Counter++…3-6-If Counter > 1…3-7-If current frame is marked as silence, update the energy minimum value: Min_E…3-8-Thresh_E = Energy_PrimThresh*log(Min_E)…,” See e.g., MOATTAR, Abstract, §§ 2, 3).
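To make the recited ratio concrete, the energy parameter of claim 8 (energy below the upper frequency threshold divided by the total energy of the frame) and its comparison against the second threshold can be sketched as follows. This sketch illustrates the claim language only and is not drawn from MOATTAR; the one-sided DFT power spectrum, the function names, and the treatment of an all-zero frame are assumptions.

```python
import cmath
import math


def low_band_energy_ratio(samples, sample_rate, upper_freq_hz):
    """Energy in DFT bins below upper_freq_hz divided by total energy over bins 0..N/2.

    Assumption: a one-sided power spectrum from a naive DFT.
    An all-zero frame returns 0.0 (assumption), so it is treated as discardable.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty frame")
    low, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n < upper_freq_hz:
            low += power
    return low / total if total > 0 else 0.0


def should_discard(samples, sample_rate, upper_freq_hz, second_threshold):
    """Discard the frame if its energy parameter is lower than the second threshold."""
    return low_band_energy_ratio(samples, sample_rate, upper_freq_hz) < second_threshold
```

Under these assumptions, a frame dominated by a 3000 Hz tone with a 2000 Hz upper frequency threshold has a ratio near 0 and would be discarded, while a 500 Hz tone has a ratio near 1 and would be kept.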

Allowable Subject Matter
7.	Claims 10 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Ma et al. (Ma, Yanna, and Akinori Nishihara. "Efficient voice activity detection algorithm using long-term spectral flatness measure." EURASIP Journal on Audio, Speech, and Music Processing 2013.1 (2013): 1-18.) discloses, see e.g., “…a novel and robust voice activity detection (VAD) algorithm utilizing long-term spectral flatness measure (LSFM) which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). This new LSFM-based VAD improves speech detection robustness in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the new LSFM feature is shown by conducting an analysis of the speech/non-speech LSFM distributions. The proposed algorithm was evaluated under 12 types of noises (11 from NOISEX-92 and speech-shaped noise) and five types of SNR in core TIMIT test corpus...” (See e.g., Ma et al., Abstract).
Please see the additional references in form PTO-892 for more details.
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Edgar Guerra-Erazo, whose telephone number is (571) 270-3708. The examiner can normally be reached M-F, 7:30 a.m. to 5:00 p.m. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, 
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656