DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 04/17/2020. Claims 1-19 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1 and 14 recite extracting features of an audio sample to determine occupancy in the area using a processor.
Regarding claims 1 and 14, the limitation of extracting features of audio sample to determine the occupancy in the area using a processor, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the 
This judicial exception is not integrated into a practical application. In particular, the claims recite only one additional element: using a processor to perform both the extracting and determining steps. The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing the generic computer functions of extracting features and determining occupancy) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 14 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nongpiur et al. (US Patent Application Publication 2016/0335488).
Regarding claim 1, Nongpiur teaches a method of detecting occupancy in an area comprising: obtaining, with a processor, an audio sample from an audio sensor (see Nongpiur, [0029] The environmental sensor 61 can include one or more of the sensors noted above, such as a microphone sensor or any other suitable environmental sensor that obtains a corresponding type of information about the environment in which the premises management device 60 is located); determining, with the processor, feature functional values of a set of selected feature functionals from the audio sample, the determining of the feature functional values comprising (see Nongpiur, [0029] The processor 64 can receive and analyze data obtained by the sensor 61, control operations of other components of the premises management device 60 and process communication with other devices by executing instructions stored on the computer-readable memory 65): extracting features in the set of selected feature functionals from the audio sample (see Nongpiur, [0038] FIG. 2 is an illustrative block diagram of a sound-event detector executed by the processor(s) 64 according to an implementation of the disclosed subject matter. As noted above, the sound-event detection is carried out in three stages, including a feature extraction stage 202); and determining the feature functional values of the set of selected features from the extracted features (see Nongpiur, [0038] a classification stage 204, and a detection stage 206, but embodiments are not limited thereto); and determining, with the processor, occupancy in the area using a classifier based on the determined feature functional values (see Nongpiur, [0038] In the feature extraction stage 202, sound data provided by the microphone sensor 61 is received and the most relevant features that help in discriminating sounds such as human and pet occupancy sounds from other sounds, are extracted from the spectrogram of the audio signal; discriminating sound interpreted as determining occupancy in the area).
Regarding claim 14, Nongpiur teaches a system for determining occupancy in an area comprising: at least one audio sensor configured to record an audio sample in the area (see Nongpiur, [0029] The memory 65 or another memory in the premises management device 60 can also store environmental data obtained by the sensor 61); a processor configured to execute programmed instructions stored in a memory to (see Nongpiur, [0029] The processor 64 can receive and analyze data obtained by the sensor 61, control operations of other components of the premises management device 60 and process communication with other devices by executing instructions stored on the computer-readable memory 65): obtain the audio sample from the audio sensor (see Nongpiur, [0029] The environmental sensor 61 can include one or more of the sensors noted above, such as a microphone sensor or any other suitable environmental sensor that obtains a corresponding type of information about the environment in which the premises management device 60 is located); determine feature functional values of a set of selected feature functionals from the audio sample, the determining of the feature functional values comprising (see Nongpiur, [0029] The processor 64 can receive and analyze data obtained by the sensor 61, control operations of other components of the premises management device 60 and process communication with other devices by executing instructions stored on the computer-readable memory 65): extracting features in the set of selected feature functionals from the audio sample (see Nongpiur, [0038] FIG. 2 is an illustrative block diagram of a sound-event detector executed by the processor(s) 64 according to an implementation of the disclosed subject matter. As noted above, the sound-event detection is carried out in three stages, including a feature extraction stage 202); and determining the feature functional values of the set of selected features from the extracted features (see Nongpiur, [0038] a classification stage 204, and a detection stage 206, but embodiments are not limited thereto); and determine occupancy in the area using a classifier based on the determined feature functional values (see Nongpiur, [0038] In the feature extraction stage 202, sound data provided by the microphone sensor 61 is received and the most relevant features that help in discriminating sounds such as human and pet occupancy sounds from other sounds, are extracted from the spectrogram of the audio signal; discriminating sound interpreted as determining occupancy in the area).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 2, 11, 12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of Rachuri et al. (US Patent Application Publication 2016/0162844).
Regarding claim 2, Nongpiur teaches the method of claim 1, but fails to teach wherein the classifier is a decision tree classifier. However, Rachuri teaches wherein the classifier is a decision tree classifier (see Rachuri, [0060] Once features are extracted from a frame, in block 450 the features are classified using a decision tree to estimate whether the frame represents a voice, non-voice, or silence. In one example, a C4.5 model is used for representing the audio decision tree).
Nongpiur and Rachuri are considered to be analogous to the claimed invention because they relate to detection and analyzing of sounds in a particular environment using audio sensors. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur on obtaining accurate, real-time data measurement which permits detection of sounds in a particular environment with the classifier method configured to analyze the sensor data teachings of Rachuri to improve automation of meeting detection and analysis using sensor data (see Rachuri, [0004]).
Regarding claim 11 Nongpiur teaches the method of claim 2, however fails to teach wherein the set of feature functionals includes at least two selected from the group consisting of: maximum MFCC-1; mean energy; dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux; entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range. However Rachuri teaches wherein the set of feature functionals includes at least two selected from the group consisting of: maximum MFCC-1; mean energy (see Rachuri, [0065] in one embodiment, MFCC coefficients are useful in audio classification, especially for speaker identification. In addition to sound classification, system 300 also aims to analyze meeting dialogue by performing speaker identification and displaying visualizations with speaker percentages for analytic feedback. In one embodiment, MFCC features are included in the audio pipeline 321. In one example embodiment, a total of 13 MFCC features are used that include 12 MFCC coefficients and MFCC energy extracted from a given frame); dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux (see Rachuri, [0064] In one embodiment, the audio pipeline 321 uses mean, standard deviation (Std), and Root Mean Square (RMS). In one example, the audio pipeline 321 uses mean and Std of the normalized absolute audio signal values for a given frame. RMS is the quadratic mean of the audio signal values. Since the amplitude of silence is close to zero, these basic statistical features are helpful in differentiating silence from speech and ambient noise. 
Zero crossing rate (ZCR) is the number of time-domain crossings over zero in a given frame, i.e., the number of times the audio signal value changes from positive value to negative value and vice versa. ZCR is quite useful for discriminating human speech from music as human voice shows a higher variance in ZCR than music or ambient noise. In one embodiment, the following spectral features are used by system 300: spectral bandwidth, spectral roll-off, spectral centroid, and spectral flux. These frequency domain features have been found to be highly effective in audio classification. Music generally contains more diverse frequencies, therefore, spectral bandwidth can be used to differentiate it from other sounds as it is a measure of the width of the range of frequencies. Spectral flux and roll off can also be useful for distinguishing music from human voice as these are higher for the former than the latter); entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range.
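For illustration only, and not as part of the grounds of rejection, the frame-level features described in the quoted passage of Rachuri [0064] (zero crossing rate, RMS, and the mean and standard deviation of normalized absolute signal values) can be sketched as follows. The function names, the treatment of a zero-valued sample as non-negative, and the peak-based normalization are assumptions made for this sketch; they are not taken from Rachuri or from the application.

```python
import math


def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples in one frame.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    crossings = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings


def rms(frame):
    """Root mean square (quadratic mean) of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def mean_std_abs(frame):
    """Mean and standard deviation of the normalized absolute sample values.

    Assumption: values are normalized by the frame's peak absolute value;
    an all-zero (silent) frame is left unscaled to avoid division by zero.
    """
    peak = max(abs(x) for x in frame) or 1.0
    norm = [abs(x) / peak for x in frame]
    mean = sum(norm) / len(norm)
    var = sum((v - mean) ** 2 for v in norm) / len(norm)
    return mean, math.sqrt(var)
```

A silent frame yields values at zero for all of these features, which is consistent with the quoted observation that such basic statistics help differentiate silence from speech and ambient noise.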
	Regarding claim 12, Nongpiur teaches the method of claim 1, but fails to teach wherein the set of selected feature functionals and the classifier are learned in a machine-learning training process. Rachuri teaches wherein the set of selected feature functionals and the classifier are learned in a machine-learning training process (see Rachuri, [0063] in one embodiment, the audio pipeline 321 uses several features to train its decision tree model and to classify an audio sample in real-time).
Regarding claim 15, Nongpiur teaches the system of claim 14, but fails to teach wherein the classifier is a decision tree classifier. However, Rachuri teaches wherein the classifier is a decision tree classifier (see Rachuri, [0060] Once features are extracted from a frame, in block 450 the features are classified using a decision tree to estimate whether the frame represents a voice, non-voice, or silence. In one example, a C4.5 model is used for representing the audio decision tree).
Nongpiur and Rachuri are considered to be analogous to the claimed invention because they relate to detection and analyzing of sounds in a particular environment using audio sensors. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur on obtaining accurate, real-time data measurement which permits detection of sounds in a particular environment with the classifier method configured to analyze the sensor data teachings of Rachuri to improve automation of meeting detection and analysis using sensor data (see Rachuri, [0004]).
Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of Rachuri et al. (US Patent Application Publication 2016/0162844) further in view of L. Rutkowski, M. Jaworski, L. Pietruczuk and P. Duda, "A New Method for Data Stream Mining Based on the Misclassification Error," in IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 5, pp. 1048-1059, May 2015.
Regarding claim 3, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the decision tree classifier has maximum depth of between 2 and 15. However, Rutkowski teaches wherein the decision tree classifier has maximum depth of between 2 and 15 (see Rutkowski, pg. 1054 section IV, the maximum depth of the synthetic tree is dmax (at this level all nodes are replaced by leaves). After the whole tree is constructed, to each leaf, a class is randomly assigned. Each synthetic tree represents a different data concept, which is a particular distribution of attributes values and classes. In brief, data concept determines the correlations between the attributes and classes, i.e., it is a particular classification problem. For the purpose of the following simulations, twelve synthetic trees were generated (all of them with D=30 binary attributes, dmin=3 and dmax=18)).
Nongpiur and Rachuri teach all the claimed features of the method of claim 2, and Rutkowski teaches the range of maximum decision tree classifier depth. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement using the classifier method configured to analyze the sensor data to improve the consideration of decision tree depth constraints due to misclassification errors for stream data as taught by Rutkowski (see Rutkowski, pg. 1049, section 1.B).
Regarding claim 16, Nongpiur and Rachuri teach the system of claim 15, but fail to teach wherein the decision tree classifier has maximum depth of between 2 and 15. However, Rutkowski teaches wherein the decision tree classifier has maximum depth of between 2 and 15 (see Rutkowski, pg. 1054 section IV, the maximum depth of the synthetic tree is dmax (at this level all nodes are replaced by leaves). After the whole tree is constructed, to each leaf, a class is randomly assigned. Each synthetic tree represents a different data concept, which is a particular distribution of attributes values and classes. In brief, data concept determines the correlations between the attributes and classes, i.e., it is a particular classification problem. For the purpose of the following simulations, twelve synthetic trees were generated (all of them with D=30 binary attributes, dmin=3 and dmax=18)).
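Nongpiur and Rachuri teach all the claimed features of the system of claim 15, and Rutkowski teaches the range of maximum decision tree classifier depth. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement using the classifier method configured to analyze the sensor data to improve the consideration of decision tree depth constraints due to misclassification errors for stream data as taught by Rutkowski (see Rutkowski, pg. 1049, section 1.B).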
Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of Rachuri et al. (US Patent Application Publication 2016/0162844) further in view of Nowazin et al. (US Patent Application Publication 2014/0122381).
Regarding claim 4, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the decision tree classifier has a maximum depth of five. However, Nowazin teaches wherein the decision tree classifier has a maximum depth of five (see Nowazin, [0039] It is then determined 318 whether the value for the optimal uncertainty reduction is less than a threshold. If the value for the uncertainty reduction is less than the threshold, then this indicates that further expansion of the tree does not provide significant benefit. This gives rise to asymmetrical trees which naturally stop growing when no further nodes are needed. In such cases, the current node is set 320 as a leaf node. Similarly, the current depth of the tree is determined 318 (i.e. how many levels of nodes are between the root node and the current node). If this is greater than a predefined maximum value, then the current node is set 320 as a leaf node. In one example, the maximum tree depth can be set to 15 levels, although other values can also be used).
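Nongpiur and Rachuri teach all the claimed features of the method of claim 2, and Nowazin teaches the specific depths of decision tree for optimal uncertainty reduction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement using the classifier method configured to analyze the sensor data to improve the accuracy of predictions and fewer errors using Gini index for see Nowazin, [0005]).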
Regarding claim 17, Nongpiur and Rachuri teach the system of claim 15, but fail to teach wherein the decision tree classifier has a maximum depth of five. However, Nowazin teaches wherein the decision tree classifier has a maximum depth of five (see Nowazin, [0039] It is then determined 318 whether the value for the optimal uncertainty reduction is less than a threshold. If the value for the uncertainty reduction is less than the threshold, then this indicates that further expansion of the tree does not provide significant benefit. This gives rise to asymmetrical trees which naturally stop growing when no further nodes are needed. In such cases, the current node is set 320 as a leaf node. Similarly, the current depth of the tree is determined 318 (i.e. how many levels of nodes are between the root node and the current node). If this is greater than a predefined maximum value, then the current node is set 320 as a leaf node. In one example, the maximum tree depth can be set to 15 levels, although other values can also be used).
Nongpiur and Rachuri teach all the claimed features of the system of claim 15, and Nowazin teaches the specific depths of decision tree for optimal uncertainty reduction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement using the classifier method configured to analyze the sensor data to improve the accuracy of predictions and fewer errors using Gini index for see Nowazin, [0005]).
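For illustration only, and not as part of the grounds of rejection, the two stopping conditions described in the quoted passage of Nowazin [0039] (an uncertainty reduction below a threshold, and a tree depth limit) can be sketched as follows. This is a minimal sketch under stated assumptions: Gini impurity is used as the uncertainty measure, the default depth limit of five and the threshold value are illustrative, the limit is applied as "depth at or beyond the maximum", and all function and parameter names are hypothetical rather than taken from any cited reference.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best split, or None."""
    parent = gini(labels)
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def grow(rows, labels, depth=0, max_depth=5, min_gain=1e-3):
    """Grow a tree; a node becomes a leaf when either stopping rule fires."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels)
    if depth >= max_depth or split is None or split[0] < min_gain:
        return {"leaf": majority}
    _, f, t = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                     depth + 1, max_depth, min_gain),
        "right": grow([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                      depth + 1, max_depth, min_gain),
    }


def depth_of(node):
    """Number of decision levels below this node (a lone leaf has depth 0)."""
    if "leaf" in node:
        return 0
    return 1 + max(depth_of(node["left"]), depth_of(node["right"]))


def predict(node, row):
    """Follow the thresholds from the root down to a leaf label."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

In this sketch, data that separates cleanly stops growing early because no further split reduces the impurity, while data that does not separate is cut off at the depth limit.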
Claims 5, 6, 7, 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of Rachuri et al. (US Patent Application Publication 2016/0162844) further in view of S. Wegener, M. Haller, J. J. Burred, T. Sikora, S. Essid and G. Richard, "On the robustness of audio features for musical instrument classification," 2008 16th European Signal Processing Conference, 2008.
Regarding claim 5, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the set of feature functionals includes between 5 and 25 feature functionals. However, Wegener teaches wherein the set of feature functionals includes between 5 and 25 feature functionals (see Wegener, pg. 3 section 5 after feature extraction, a feature ranking is created for each signal modification and the original audio data with SFS resulting in 11 rankings. The SFS is performed on approximately 106 frames for each signal modification. The Table 4 lists the 13 best features for all signal modifications and the original audio data selected with SFS).
Nongpiur, Rachuri and Wegener are considered to be analogous to the claimed invention because they relate to audio classifications based on efficient audio features. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement with classification method which permits detection of sounds in a particular environment with the robust feature ranking teachings of Wegener to improve classification accuracy of classification systems (see Wegener, pg. 1, section 1).
Regarding claim 6, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the set of feature functionals includes between 10 and 20 feature functionals. However, Wegener teaches wherein the set of feature functionals includes between 10 and 20 feature functionals (see Wegener, pg. 3 section 5 after feature extraction, a feature ranking is created for each signal modification and the original audio data with SFS resulting in 11 rankings. The SFS is performed on approximately 106 frames for each signal modification. The Table 4 lists the 13 best features for all signal modifications and the original audio data selected with SFS).
Regarding claim 7, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the set of feature functionals includes 11 feature functionals. However, Wegener teaches wherein the set of feature functionals includes 11 feature functionals (see Wegener, pg. 3 section 5 after feature extraction, a feature ranking is created for each signal modification and the original audio data with SFS resulting in 11 rankings. The SFS is performed on approximately 106 frames for each signal modification. The Table 4 lists the 13 best features for all signal modifications and the original audio data selected with SFS).
Regarding claim 10, Nongpiur and Rachuri teach the method of claim 2, but fail to teach wherein the set of feature functionals includes at least two selected from the group consisting of: maximum of MFCC-1; mean of energy; dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux; entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range; absolute integral of MFCC-5 delta; entropy of zero crossing rate; absolute integral of brightness; entropy of spectral roll off; entropy of brightness; entropy of spectral flux; entropy of spectral variance; entropy of MFCC-7; entropy of MFCC-2 delta; entropy of MFCC-1 delta-delta; entropy of MFCC-5; entropy of energy; entropy of MFCC-3; and entropy of MFCC-20 delta. However, Wegener teaches wherein the set of feature functionals includes at least two selected from the group consisting of: maximum of MFCC-1; mean of energy; dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux; entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range; absolute integral of MFCC-5 delta; entropy of zero crossing rate; absolute integral of brightness; entropy of spectral roll off; entropy of brightness; entropy of spectral flux; entropy of spectral variance; entropy of MFCC-7; entropy of MFCC-2 delta; entropy of MFCC-1 delta-delta; entropy of MFCC-5; entropy of energy; entropy of MFCC-3; and entropy of MFCC-20 delta (see Wegener, pg. 3 section 5 for isolated notes, also some of the lower MFCCs, the first two perceptual adapted loudness coefficients (Ld-1 and Ld-2), the spectral centroids (SC and ASC) and the temporal centroid (TC) have low ranks for all signal modifications and the original signals. The deviation of feature ranks over the SFS rankings corresponding to different signal modifications is relatively low for these mentioned features. Furthermore, the maximum, minimum, mean, median, standard deviation, skewness, and kurtosis of each feature over all classes and for all signal modifications and the original database are extracted to explore their changes. Great variations of the statistics of a feature over the different signal modifications suggest that this feature is highly influenced by this signal modifications and thus not very robust. The features LAT and TC show very small changes of their statistics, so the statistical evaluation supports the results of the robust feature ranking in Table 5. This features could be considered as very robust features. Some of the lower MFCCs and the first two Ld coefficients (Ld-1 and Ld-2), SC, and ASC show some larger differences between the statistics for additive noise, so they could be considered as some less robust features for noise, although they are among the 13 best features of the robust feature ranking, but they seem to be robust against all other signal modifications).
Regarding claim 19, Nongpiur and Rachuri teach the system of claim 15, but fail to teach wherein the set of feature functionals includes at least two selected from the group consisting of: maximum of MFCC-1; mean of energy; dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux; entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range; absolute integral of MFCC-5 delta; entropy of zero crossing rate; absolute integral of brightness; entropy of spectral roll off; entropy of brightness; entropy of spectral flux; entropy of spectral variance; entropy of MFCC-7; entropy of MFCC-2 delta; entropy of MFCC-1 delta-delta; entropy of MFCC-5; entropy of energy; entropy of MFCC-3; and entropy of MFCC-20 delta. However, Wegener teaches wherein the set of feature functionals includes at least two selected from the group consisting of: maximum of MFCC-1; mean of energy; dynamic range of envelope dynamic range; mean of brightness; dynamic range of brightness; median of brightness; entropy of MFCC-4 delta; standard deviation of spectral flux; entropy of MFCC-19; standard deviation of envelope dynamic range; entropy of envelope dynamic range; absolute integral of MFCC-5 delta; entropy of zero crossing rate; absolute integral of brightness; entropy of spectral roll off; entropy of brightness; entropy of spectral flux; entropy of spectral variance; entropy of MFCC-7; entropy of MFCC-2 delta; entropy of MFCC-1 delta-delta; entropy of MFCC-5; entropy of energy; entropy of MFCC-3; and entropy of MFCC-20 delta (see Wegener, pg. 3 section 5 for isolated notes, also some of the lower MFCCs, the first two perceptual adapted loudness coefficients (Ld-1 and Ld-2), the spectral centroids (SC and ASC) and the temporal centroid (TC) have low ranks for all signal modifications and the original signals. The deviation of feature ranks over the SFS rankings corresponding to different signal modifications is relatively low for these mentioned features. Furthermore, the maximum, minimum, mean, median, standard deviation, skewness, and kurtosis of each feature over all classes and for all signal modifications and the original database are extracted to explore their changes. Great variations of the statistics of a feature over the different signal modifications suggest that this feature is highly influenced by this signal modifications and thus not very robust. The features LAT and TC show very small changes of their statistics, so the statistical evaluation supports the results of the robust feature ranking in Table 5. This features could be considered as very robust features. Some of the lower MFCCs and the first two Ld coefficients (Ld-1 and Ld-2), SC, and ASC show some larger differences between the statistics for additive noise, so they could be considered as some less robust features for noise, although they are among the 13 best features of the robust feature ranking, but they seem to be robust against all other signal modifications).
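Nongpiur, Rachuri and Wegener are considered to be analogous to the claimed invention because they relate to audio classifications based on efficient audio features. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement with classification method which permits detection of sounds in a particular environment with the robust feature ranking teachings of Wegener to improve classification accuracy of classification systems (see Wegener, pg. 1, section 1).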
Claims 8, 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of Rachuri et al. (US Patent Application Publication 2016/0162844) further in view of I. McCowan, D. Dean, M. McLaren, R. Vogt and S. Sridharan, "The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2026-2038, Sept. 2011.
Regarding claim 8, Nongpiur and Rachuri teach the method of claim 2, however fail to teach wherein the features include one or more of envelope dynamic range, zero crossing rate, energy, brightness, spectral variance, spectral roll off, spectral flux, at least one MFCC coefficient, a delta of at least one MFCC coefficient, and a delta-delta of at least one MFCC coefficient. However, McCowan teaches wherein the features include one or more of envelope dynamic range, zero crossing rate, energy, brightness, spectral variance, spectral roll off, spectral flux, at least one MFCC coefficient, a delta of at least one MFCC coefficient, and a delta-delta of at least one MFCC coefficient (see McCowan, pg. 2034 section VI. APPLICATION TO SPEAKER RECOGNITION The effectiveness of the proposed MFDP features for voice activity detection was demonstrated in the preceding section, with the fusion results highlighting the complementary information they offer to MFCC features. This section seeks to further validate the proposed phase representation by investigating whether MFDP features are also able to capture speaker discriminative information from the phase domain through their application to the task of speaker recognition. Also McCowan, pg. 2028, pg. 2032, section V, in order to validate the proposed delta-phase spectrum and Mel-Frequency Delta Phase features derived from it, a first set of experiments was conducted applying the features for a simple voice activity, or speech/nonspeech, detection task; interpreted as delta of at least one MFCC coefficient).
Nongpiur, Rachuri and McCowan are considered to be analogous to the claimed invention because they relate to voice activity detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement with classification method which permits detection of sounds in a particular environment with the phase differences of frame step interval teachings of McCowan to improve detecting voice activity in noise, or distinguishing voices (see McCowan, pg. 2036, section I).
Regarding claim 9, Nongpiur, Rachuri and McCowan teach the method of claim 8. McCowan further teaches wherein the set of feature functionals include at least one of the group consisting of: mean, median, standard deviation, absolute integral, minimum, maximum, dynamic range, dominant-frequency, and entropy, determined for each of the features (see McCowan, pg. 2034, section VI, the two feature sets will be examined in the context of a GMM Supervector SVM system. MFCC variants of this system have demonstrated state-of-the-art performance in recent SRE’s. The GMM Supervector SVM system [53] combines robust yet straightforward acoustic modeling in the form of mean-adapted high-order Gaussian mixture models (GMMs) with more recent discriminative machine learning approaches through support vector machine (SVM) classification. Zero and Test-norm score normalization was applied to all scores to reduce the statistical variation observed in scores.  Zero-norm is a speaker-centric technique in which a speaker’s scores are scaled by the mean and standard deviation, obtained when scoring the impostor cohort against the speaker model).
Regarding claim 18, Nongpiur and Rachuri teach the system of claim 15, however fail to teach wherein the features include one or more of envelope dynamic range, zero crossing rate, energy, brightness, spectral variance, spectral roll off, spectral flux, at least one MFCC coefficient, a delta of at least one MFCC coefficient, and a delta-delta of at least one MFCC coefficient; and the set of feature functionals include at least one of the group consisting of: mean, median, standard deviation, absolute integral, minimum, maximum, dynamic range, dominant-frequency, and entropy, determined for each of the features. However, McCowan teaches wherein the features include one or more of envelope dynamic range, zero crossing rate, energy, brightness, spectral variance, spectral roll off, spectral flux, at least one MFCC coefficient, a delta of at least one MFCC coefficient, and a delta-delta of at least one MFCC coefficient (see McCowan, pg. 2034, section VI, APPLICATION TO SPEAKER RECOGNITION: The effectiveness of the proposed MFDP features for voice activity detection was demonstrated in the preceding section, with the fusion results highlighting the complementary information they offer to MFCC features. This section seeks to further validate the proposed phase representation by investigating whether MFDP features are also able to capture speaker discriminative information from the phase domain through their application to the task of speaker recognition. Also McCowan, pg. 2028, pg. 2032, section V, in order to validate the proposed delta-phase spectrum and Mel-Frequency Delta Phase features derived from it, a first set of experiments was conducted applying the features for a simple voice activity, or speech/nonspeech, detection task; interpreted as delta of at least one MFCC coefficient); and the set of feature functionals include at least one of the group consisting of: mean, median, standard deviation, absolute integral, minimum, maximum, dynamic range, dominant-frequency, and entropy, determined for each of the features (see McCowan, pg. 2034, section VI, the two feature sets will be examined in the context of a GMM Supervector SVM system. MFCC variants of this system have demonstrated state-of-the-art performance in recent SRE’s. The GMM Supervector SVM system [53] combines robust yet straightforward acoustic modeling in the form of mean-adapted high-order Gaussian mixture models (GMMs) with more recent discriminative machine learning approaches through support vector machine (SVM) classification. Zero and Test-norm score normalization was applied to all scores to reduce the statistical variation observed in scores. Zero-norm is a speaker-centric technique in which a speaker’s scores are scaled by the mean and standard deviation, obtained when scoring the impostor cohort against the speaker model).
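Nongpiur, Rachuri and McCowan are considered to be analogous to the claimed invention because they relate to voice activity detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur and Rachuri on obtaining accurate, real-time data measurement with classification method which permits detection of sounds in a particular environment with the phase differences of frame step interval teachings of McCowan to improve detecting voice activity in noise, or distinguishing voices (see McCowan, pg. 2036, section I).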
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nongpiur et al. (US Patent Application Publication 2016/0335488) in view of I. McCowan, D. Dean, M. McLaren, R. Vogt and S. Sridharan, "The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2026-2038, Sept. 2011.
Regarding claim 13, Nongpiur teaches the method of claim 1, however fails to teach wherein the determining of the feature functional values further comprises segmenting the extracted features. However, McCowan teaches wherein the determining of the feature functional values further comprises segmenting the extracted features (see McCowan, pg. 2033, section V, Speech and non-speech segmentation decisions were made by comparing the smoothed speech scores to a threshold).
Nongpiur and McCowan are considered to be analogous to the claimed invention because they relate to voice activity detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Nongpiur on obtaining accurate, real-time data measurement which permits detection of sounds in a particular environment with the phase differences of frame step interval teachings of McCowan to improve detecting voice activity in noise, or distinguishing voices (see McCowan, pg. 2036, section I).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Huang, Q., Ge, Z., & Lu, C. (2016). Occupancy estimation in smart buildings using audio-processing techniques. arXiv preprint arXiv:1602.08507, teaches occupancy estimations based on speaker recognition, based on background STE (audio energy) level associated with different size crowds (see Huang, pg. 2, section 3).
Nongpiur (US Patent Application Publication 2015/0380013) teaches a human detection system that is sensitive, accurate, has low computational effort, and is sufficiently flexible for different indoor environments (see Nongpiur, [0007]).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656