DETAILED ACTION

Introduction
1.	This office action is in response to Applicant’s submission filed on 04/04/2019. Claims 1-20 are pending in the application. As such, Claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
3.	The drawings filed on 04/04/2019 have been accepted and considered by the Examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


4.	Claims 1-3, 10-12, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over (a) Baidwan et al., (V. S. Baidwan and S. Gujral, "Comparative Analysis of Prosodic Features and Linear Predictive Coefficients for Speaker Recognition Using Machine Learning Technique," 2014 International Conference on Devices, Circuits and Communications (ICDCCom), 2014, pp. 1-8), in view of (b) Savafi et al., (S. Safavi, H. Gan, I. Mporas and R. Sotudeh, "Fraud Detection in Voice-Based Identity Authentication Applications and Services," 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016, pp. 1074-1081), hereinafter referred to as BAIDWAN and SAVAFI.
	With respect to Claim 1, BAIDWAN discloses:
1. A computer-implemented method comprising: 
segmenting, by the computer, an audio signal from an incoming phone call into a plurality of audio frames (See e.g., “…Voice samples from 100 different speakers (male and female both) were collected using Sony Xperia C Smartphone, which has 1.2 GHz Quad Core processor and 1 GB of RAM. Speakers were asked to speak hello word ten times in normal speed…,” “…pre-processing phase, the voice sample which are recorded with more clarity (noise free) are selected from the whole samples and voice samples which having duration of only 2 second will be selected. Many voice samples have duration of 3 or 4 second, so for these samples special voice cutter is used to obtain a sample which has duration of 2 second…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 
extracting, by the computer, a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5);

generating, by the computer, a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 

generating, by the computer, formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 

generating, by the computer, a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of the audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 
calculating, by the computer, a voice modification score for the audio signal based upon 
comparing the pitch parameter statistic with a normal human speech pitch parameter statistic (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), and 
comparing the residual parameter statistic with a normal human speech residual parameter statistic (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), the [voice modification score] indicating probability of the audio signal containing a modified human speech (See e.g., how generating and comparing capabilities via the Machine Learning (ML) Training and Testing use metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, where “Accuracy:…correct classifications of class x divided by the total number of classifications of class x…Precision…the proportion of…which truly have class x among all those which are classified as class x…Recall…is a measure of the ability of a prediction model to select instances of a certain class from a data set also called as sensitivity and corresponds to the true positive rate…” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); and 
[determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score.]
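For illustration only, and not as a characterization of BAIDWAN's processing or of Applicant's implementation, the segmenting limitation recited above can be pictured with the following minimal sketch. The function name `segment_into_frames` and the frame and hop lengths are assumptions introduced here; neither the claim nor the cited reference specifies them.

```python
def segment_into_frames(samples, frame_len, hop_len):
    """Split a sequence of audio samples into fixed-length frames.

    frame_len and hop_len are hypothetical parameters chosen for this
    sketch; a hop shorter than the frame length yields overlapping frames.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


# Toy signal of 10 samples, 4-sample frames, 2-sample hop.
frames = segment_into_frames(list(range(10)), frame_len=4, hop_len=2)
```

Per-frame parameters such as pitch, formants, and the residual would then be computed on each element of `frames`.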

BAIDWAN does not explicitly disclose, but SAVAFI discloses, [voice modification score] and [determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score] (See e.g., how voice modification scores can be observed in Scores A, B, C in development and evaluation sets in collaboration with Fusion from Train and Test inputs where “…usage of scores from the development and evaluation sets for the training and testing of the hybrid countermeasure architecture…” are provided in Fig. 5 and “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services. Voice-based biometric applications are subject to different types of spoofing attacks. The most accessible and affordable type of spoofing for a voice-based biometrics system is a replay attack. Replay, which is to playback a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. This work presents two architectures for detecting frauds caused by replay attacks in a voice-based biometrics authentication systems. Experimental results confirmed that obtained performances from both methods could further improve by applying a machine learning algorithm for performing fusion at the score level…,” See e.g., SAVAFI Abstract, §§II, III, Fig. 5).
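BAIDWAN and SAVAFI can be considered analogous art because they are from a similar field of endeavor in voice processing and authentication techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI’s techniques comprising, see e.g., fusion model training and testing “…architectures for detecting frauds…in a voice-based biometrics authentication systems…” in order to advantageously permit, see e.g., “…further improve[ment] by applying a machine learning algorithm for performing fusion at the score level…” and as such “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services…,” (See e.g., SAVAFI Abstract, §§II, III, Fig. 5).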

With respect to Claim 2, BAIDWAN in view of SAVAFI discloses:
2. The computer-implemented method of claim 1, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
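As a hedged illustration of what an "average pitch value" and a "pitch consistency" could look like when computed over per-frame fundamental frequency (F0) values, consider the sketch below. Treating consistency as the coefficient of variation of F0 across voiced frames, and marking unvoiced frames with 0, are assumptions made here for illustration; the claim and BAIDWAN do not define either convention.

```python
import statistics


def pitch_statistics(frame_pitches_hz):
    """Summarize per-frame F0 estimates.

    Assumption: a value of 0 marks an unvoiced frame and is skipped.
    Assumption: "consistency" is modeled as standard deviation divided
    by mean (lower means steadier pitch).
    """
    voiced = [p for p in frame_pitches_hz if p > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    mean_f0 = statistics.fmean(voiced)
    std_f0 = statistics.stdev(voiced)
    return {
        "average_pitch_hz": mean_f0,
        "pitch_consistency": std_f0 / mean_f0,
    }


stats = pitch_statistics([120.0, 0.0, 118.0, 122.0])
```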

With respect to Claim 3, BAIDWAN in view of SAVAFI discloses:
3. The computer-implemented method of claim 1, wherein the formant parameters statistics include at least one of average formant values and inter-formant consistency (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
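By way of a non-authoritative sketch, average formant values and one possible reading of "inter-formant consistency" can be computed from per-frame (F1, F2, F3) triples as shown below. Modeling inter-formant consistency as the spread of the F2-F1 and F3-F2 spacings across frames is an assumption introduced here; the record does not define the term.

```python
import statistics


def formant_statistics(frame_formants_hz):
    """Summarize per-frame (F1, F2, F3) triples, all in Hz.

    Assumption: inter-formant consistency is the sample standard
    deviation of the F2-F1 and F3-F2 spacings across frames.
    """
    if len(frame_formants_hz) < 2:
        raise ValueError("need at least two frames")
    f1, f2, f3 = zip(*frame_formants_hz)
    averages = tuple(statistics.fmean(track) for track in (f1, f2, f3))
    spacing_12 = [b - a for a, b in zip(f1, f2)]
    spacing_23 = [c - b for b, c in zip(f2, f3)]
    return {
        "average_formants_hz": averages,
        "inter_formant_consistency": (
            statistics.stdev(spacing_12),
            statistics.stdev(spacing_23),
        ),
    }


formant_stats = formant_statistics(
    [(500.0, 1500.0, 2500.0), (520.0, 1480.0, 2520.0), (480.0, 1520.0, 2480.0)]
)
```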

With respect to Claim 10, BAIDWAN discloses:
10. A system comprising: a non-transitory storage medium storing a plurality of computer program instructions; a processor electrically coupled to the non-transitory storage medium and configured to execute the plurality of computer program instructions to: 
segment an audio signal from an incoming phone call into a plurality of audio frames (See e.g., “…Voice samples from 100 different speakers (male and female both) were collected using Sony Xperia C Smartphone, which has 1.2 GHz Quad Core processor and 1 GB of RAM. Speakers were asked to speak hello word ten times in normal speed…,” “…pre-processing phase, the voice sample which are recorded with more clarity (noise free) are selected from the whole samples and voice samples which having duration of only 2 second will be selected. Many voice samples have duration of 3 or 4 second, so for these samples special voice cutter is used to obtain a sample which has duration of 2 second…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 
extract a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 

generate a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 
generate formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5);
generate a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of the audio frames (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5);
calculate a voice modification score for the audio signal based upon 
comparing the pitch parameter statistic with a normal human speech pitch parameter statistic (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), 
comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), and 
comparing the residual parameter statistic with a normal human speech residual parameter statistic (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5), the [voice modification score] indicating probability of the audio signal containing a modified human speech (See e.g., how generating and comparing capabilities via the Machine Learning (ML) Training and Testing use metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, where “Accuracy:…correct classifications of class x divided by the total number of classifications of class x…Precision…the proportion of…which truly have class x among all those which are classified as class x…Recall…is a measure of the ability of a prediction model to select instances of a certain class from a data set also called as sensitivity and corresponds to the true positive rate…” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); and 
[determine whether the incoming phone call is fraudulent based upon the voice modification score.]
BAIDWAN does not explicitly disclose, but SAVAFI discloses, [voice modification score] and [determine whether the incoming phone call is fraudulent based upon the voice modification score] (See e.g., how voice modification scores can be observed in Scores A, B, C in development and evaluation sets in collaboration with Fusion from Train and Test inputs where “…usage of scores from the development and evaluation sets for the training and testing of the hybrid countermeasure architecture…” are provided in Fig. 5 and “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services. Voice-based biometric applications are subject to different types of spoofing attacks. The most accessible and affordable type of spoofing for a voice-based biometrics system is a replay attack. Replay, which is to playback a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. This work presents two architectures for detecting frauds caused by replay attacks in a voice-based biometrics authentication systems. Experimental results confirmed that obtained performances from both methods could further improve by applying a machine learning algorithm for performing fusion at the score level…,” See e.g., SAVAFI Abstract, §§II, III, Fig. 5).
BAIDWAN and SAVAFI can be considered analogous art because they are from a similar field of endeavor in voice processing and authentication techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI’s techniques comprising, see e.g., fusion model training and testing “…architectures for detecting frauds…in a voice-based biometrics authentication systems…” in order to advantageously permit, see e.g., “…further improve[ment] by applying a machine learning algorithm for performing fusion at the score level…” and as such “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services…,” (See e.g., SAVAFI Abstract, §§II, III, Fig. 5).
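The score-level fusion relied upon in the rationale above can be pictured, purely as a sketch, as a logistic combination of several countermeasure scores followed by a threshold decision. The weights, bias, and threshold below are hypothetical placeholders rather than values taken from SAVAFI, which is quoted as applying a machine learning algorithm to perform the fusion; here the weights are simply fixed by hand.

```python
import math


def fuse_scores(scores, weights, bias=0.0):
    """Combine per-system scores into one value in (0, 1).

    Assumption: a logistic (sigmoid) combination with hand-picked
    weights stands in for a trained fusion model.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have equal length")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def is_flagged(fused_score, threshold=0.5):
    """Flag the call when the fused score exceeds a hypothetical threshold."""
    return fused_score > threshold


# Three scores, loosely mirroring "Scores A, B, C"; the values are made up.
fused = fuse_scores([2.0, 1.0, 3.0], weights=[0.5, 0.5, 0.5])
```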

With respect to Claim 11, BAIDWAN in view of SAVAFI discloses:
11. The system of claim 10, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).

With respect to Claim 12, BAIDWAN in view of SAVAFI discloses:
12. The system of claim 10, wherein the formant parameters statistics include at least one of average formant values and inter-formant consistency (See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).

With respect to Claim 19, BAIDWAN discloses:
19. A computer-implemented method comprising: 
extracting, by a computer, frame level parameters from an audio signal of an incoming phone call based upon a physical model of human speech (See e.g., “…Voice samples from 100 different speakers (male and female both) were collected using Sony Xperia C Smartphone, which has 1.2 GHz Quad Core processor and 1 GB of RAM. Speakers were asked to speak hello word ten times in normal speed…,” “…pre-processing phase, the voice sample which are recorded with more clarity (noise free) are selected from the whole samples and voice samples which having duration of only 2 second will be selected. Many voice samples have duration of 3 or 4 second, so for these samples special voice cutter is used to obtain a sample which has duration of 2 second…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 
(See e.g., “…100 speaker’s 1000 instances are there and same smartphone was used for all recordings. Speech signals were sampled at 22,050 Hz with 16 bits of precision. In this section, from the samples speech signal, prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); 

executing, by the computer, a single-class machine learning model trained on normal human speech recordings on the parameter statistics to generate [a voice modification score] (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating capabilities via the Machine Learning (ML) Training and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted; and how the generating and comparing capabilities via the Machine Learning (ML) Training and Testing use metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, where “Accuracy:…correct classifications of class x divided by the total number of classifications of class x…Precision…the proportion of…which truly have class x among all those which are classified as class x…Recall…is a measure of the ability of a prediction model to select instances of a certain class from a data set also called as sensitivity and corresponds to the true positive rate…” See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5); and [determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score].
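To make the "single-class machine learning model trained on normal human speech recordings" limitation concrete, the sketch below fits a model on normal examples only and scores new parameter statistics by their deviation from that norm. A per-feature z-score model is used here as a simple stand-in; it is an assumption for illustration and is not asserted to be the model of the claim or of either reference. The class name `NormalSpeechModel` and the two-feature rows are likewise hypothetical.

```python
import statistics


class NormalSpeechModel:
    """One-class model: learns only from normal-speech feature rows."""

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))
        self.means = [statistics.fmean(col) for col in columns]
        self.stds = [statistics.stdev(col) for col in columns]
        if any(s == 0 for s in self.stds):
            raise ValueError("a feature has zero variance in training data")
        return self

    def score(self, row):
        """Return the largest absolute z-score; higher means less normal."""
        return max(
            abs(x - m) / s for x, m, s in zip(row, self.means, self.stds)
        )


# Hypothetical rows: (average pitch in Hz, pitch consistency).
model = NormalSpeechModel().fit([(118.0, 0.02), (120.0, 0.03), (122.0, 0.04)])
typical = model.score((120.0, 0.03))
shifted = model.score((180.0, 0.03))
```

In this toy example the pitch-shifted row scores far higher than the typical row, which is the behavior a voice modification score is expected to exhibit.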

BAIDWAN does not explicitly disclose, but SAVAFI discloses, [a voice modification score] and [determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score] (See e.g., how voice modification scores can be observed in Scores A, B, C in development and evaluation sets in collaboration with Fusion from Train and Test inputs where “…usage of scores from the development and evaluation sets for the training and testing of the hybrid countermeasure architecture…” are provided in Fig. 5 and “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services. Voice-based biometric applications are subject to different types of spoofing attacks. The most accessible and affordable type of spoofing for a voice-based biometrics system is a replay attack. Replay, which is to playback a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. This work presents two architectures for detecting frauds caused by replay attacks in a voice-based biometrics authentication systems. Experimental results confirmed that obtained performances from both methods could further improve by applying a machine learning algorithm for performing fusion at the score level…,” See e.g., SAVAFI Abstract, §§II, III, Fig. 5).
BAIDWAN and SAVAFI can be considered analogous art because they are from a similar field of endeavor in voice processing and authentication techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI’s techniques comprising, see e.g., fusion model training and testing “…architectures for detecting frauds…in a voice-based biometrics authentication systems…” in order to advantageously permit, see e.g., “…further improve[ment] by applying a machine learning algorithm for performing fusion at the score level…” and as such “…using voice for authentication could be beneficial in several application areas, including, security, protection, education, call-based and web-based services…,” (See e.g., SAVAFI Abstract, §§II, III, Fig. 5).

With respect to Claim 20, BAIDWAN in view of SAVAFI discloses:
20. The computer-implemented method of claim 19, wherein the physical model is a source-filter model (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
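Linear predictive coding (LPC) is a standard way of realizing a source-filter model: the LPC coefficients describe the vocal-tract filter, and the prediction residual approximates the excitation source. The following sketch, offered for illustration only, computes LPC coefficients with the Levinson-Durbin recursion and then derives the residual. The function names, the prediction order, and the toy input frame are assumptions introduced here and are not taken from BAIDWAN.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(max_lag + 1)
    ]


def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1 and the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(frame, a):
    """Pass the frame through the inverse filter A(z) to get the residual."""
    order = len(a) - 1
    return [
        sum(a[j] * frame[n - j] for j in range(order + 1) if n - j >= 0)
        for n in range(len(frame))
    ]


# Toy frame: impulse response of a one-pole filter with pole at 0.9.
frame = [0.9 ** n for n in range(50)]
coeffs, _ = lpc_coefficients(frame, order=1)
residual = lpc_residual(frame, coeffs)
```

For this toy frame the order-1 predictor recovers a coefficient close to -0.9, and the residual collapses to approximately a single impulse, which is the excitation that produced the frame.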

Claims 4, 5, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over (a) Baidwan et al., (V. S. Baidwan and S. Gujral, "Comparative Analysis of Prosodic Features and Linear Predictive Coefficients for Speaker Recognition Using Machine Learning Technique," 2014 International Conference on Devices, Circuits and Communications (ICDCCom), 2014, pp. 1-8), in view of (b) Savafi et al., (S. Safavi, H. Gan, I. Mporas and R. Sotudeh, "Fraud Detection in Voice-Based Identity Authentication Applications and Services," 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016, pp. 1074-1081), and further in view of (c) Adiga et al., (Adiga, Nagaraj, Banriskhem K. Khonglah, and SR Mahadeva Prasanna. "Improved voicing decision using glottal activity features for statistical parametric speech synthesis." Digital Signal Processing 71 (2017): 131-143), hereinafter referred to as BAIDWAN, SAVAFI, and ADIGA.

With respect to Claim 4, BAIDWAN in view of SAVAFI discloses:
4. The computer-implemented method of claim 1, wherein the residual parameter includes [residual kurtosis] and the residual parameter statistic includes [residual kurtosis consistency] (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
BAIDWAN in view of SAVAFI does not explicitly disclose, but ADIGA discloses, [residual kurtosis] and [residual kurtosis consistency] (See e.g., “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal…,” See e.g., ADIGA §§ 3, 3.2.3).
BAIDWAN, SAVAFI, and ADIGA can be considered analogous art because they are from a similar field of endeavor in voice processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI with ADIGA’s techniques comprising, see e.g., higher-order statistics (HOS) measure techniques and applications comprising “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal[s]…” in order to advantageously be, see e.g., “…used for capturing the asymmetric characteristic[s] of glottal pulse[s]…,” (See e.g., ADIGA §§ 3, 3.2.3).
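As a minimal sketch of the residual kurtosis terms discussed above, kurtosis can be computed per frame as the fourth central moment divided by the squared variance, and a "consistency" figure can then be taken across frames. Using the population standard deviation of the per-frame kurtosis values as the consistency measure is an assumption made here; the record does not fix a definition.

```python
import statistics


def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    return m4 / (m2 * m2)


def residual_kurtosis_consistency(frame_residuals):
    """Assumed measure: spread of per-frame kurtosis across frames."""
    per_frame = [kurtosis(res) for res in frame_residuals]
    return statistics.pstdev(per_frame)


flat = [-1.0, 1.0, -1.0, 1.0]                        # no peaks
peaky = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, -2.0]    # sparse pulses
spread = residual_kurtosis_consistency([flat, peaky])
```

A residual dominated by sparse pulses, such as `peaky`, yields a higher kurtosis than a residual with evenly distributed energy, such as `flat`.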

With respect to Claim 5, BAIDWAN in view of SAVAFI discloses:
5. The computer-implemented method of claim 1, wherein the residual parameter indicates [at least one of glottal closure instances, glottal opening instances, and a model of glottal activity] (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
BAIDWAN in view of SAVAFI does not explicitly disclose, but ADIGA discloses, [at least one of glottal closure instances, glottal opening instances, and a model of glottal activity] (See e.g., “…skewness and kurtosis ratio (SKR) evidence is the higher-order statistics (HOS) measure…” techniques and applications considered to “…represent[s] the glottal activity feature.…,” See e.g., ADIGA §§ 3, 3.2.3).
BAIDWAN, SAVAFI, and ADIGA can be considered analogous art because they are from a similar field of endeavor in voice processing techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI with ADIGA’s techniques comprising, see e.g., higher-order statistics (HOS) measure techniques and applications comprising “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal[s]…” in order to advantageously be, see e.g., “…used for capturing the asymmetric characteristic[s] of glottal pulse[s]…,” as well as “…represent[ing] the glottal activity feature…,” (See e.g., ADIGA §§ 3, 3.2.3).
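The quoted "ratio of skewness to kurtosis" can be illustrated with the sketch below, which computes both moments from a residual segment and takes their plain ratio. This is an assumption-laden reading of the quoted phrase: ADIGA's exact formula, including any normalization or exponents, is not reproduced in this action, and the function name `skewness_kurtosis_ratio` is introduced here for illustration only.

```python
import statistics


def skewness_kurtosis_ratio(values):
    """Plain skewness / kurtosis of a residual segment.

    Assumption: the ratio is taken literally, with no extra
    normalization; the cited reference may define it differently.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    skewness = m3 / m2 ** 1.5
    kurt = m4 / (m2 * m2)
    return skewness / kurt


symmetric = skewness_kurtosis_ratio([-1.0, 1.0, -1.0, 1.0])
asymmetric = skewness_kurtosis_ratio([0.0, 0.0, 0.0, 4.0])
```

A symmetric segment gives a ratio of zero, while a one-sided pulse gives a nonzero ratio, which is consistent with the quoted purpose of capturing the asymmetric characteristic of glottal pulses.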

With respect to Claim 13, BAIDWAN in view of SAVAFI discloses:
13. The system of claim 10, wherein the residual parameter includes [residual kurtosis] and the residual parameter statistic includes [residual kurtosis consistency] (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
BAIDWAN in view of SAVAFI does not explicitly disclose, but ADIGA discloses, [residual kurtosis] and [residual kurtosis consistency] (See e.g., “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal…,” See e.g., ADIGA §§ 3, 3.2.3).
BAIDWAN, SAVAFI, and ADIGA can be considered analogous art because they are from a similar field of endeavor in voice processing techniques and applications.  Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI with ADIGA’s techniques comprising, see e.g., higher-order statistics (HOS) measure techniques and applications comprising “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal[s]…” in order to advantageously be, see e.g., “…used for capturing the asymmetric characteristic[s] of glottal pulse[s] …,” (See e.g., ADIGA §§ 3, 3.2.3).

With respect to Claim 14, BAIDWAN in view of SAVAFI discloses:
14. The system of claim 10, wherein the residual parameter indicates [at least one of glottal closure instances, glottal opening instances, and a model of glottal activity] (See e.g., “…both the prosodic features and LPC are extracted…,” “…prosodic features and LPC Coefficients were extracted…four features are extracted namely fundamental frequencies (F0) and first three formants (F1, F2 and F3)…,” with generating and comparing capabilities via the Machine Learning (ML) Training  and Testing, with metrics comprising accuracy, precision, and recall based on prosodic features and LPC Coefficients extracted, See e.g., BAIDWAN §§2.1-2.3, 3, 3.1-3.6, 5).
BAIDWAN in view of SAVAFI does not explicitly disclose, but ADIGA discloses, [at least one of glottal closure instances, glottal opening instances, and a model of glottal activity] (See e.g., “…skewness and kurtosis ratio (SKR) evidence is the higher-order statistics (HOS) measure…” techniques and applications considered to “…represent[s] the glottal activity feature.…,” See e.g., ADIGA §§ 3, 3.2.3).
BAIDWAN, SAVAFI, and ADIGA can be considered analogous art because they are from a similar field of endeavor in voice processing techniques and applications.  Thus, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of BAIDWAN in view of SAVAFI with ADIGA’s techniques comprising, see e.g., higher-order statistics (HOS) measure techniques and applications comprising “…ratio of skewness to kurtosis computed from integrated linear prediction residual (ILPR) signal[s]…” in order to advantageously be, see e.g., “…used for capturing the asymmetric characteristic[s] of glottal pulse[s] …,” as well as “…represent[ing] the glottal activity feature…,” (See e.g., ADIGA §§ 3, 3.2.3).

Allowable Subject Matter
6.	Claims 6-9 and 15-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
7.       The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  Nemer et al., (E. Nemer, R. Goubran and S. Mahmoud, "Robust voice activity detection using higher-order statistics in the LPC residual domain," in IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 217-231, March 2001), discloses, see e.g., “…a robust algorithm for voice activity detection (VAD) based on newly established properties of the higher order statistics (HOS) of speech. Analytical expressions for the third and fourth-order cumulants of the LPC residual of short-term speech are derived assuming a sinusoidal model. The flat spectral feature of this residual results in distinct characteristics for these cumulants in terms of phase, periodicity and harmonic content and yields closed-form expressions for the skewness and kurtosis…The proposed VAD algorithm combines HOS metrics with second-order measures, such as SNR and LPC prediction error, to classify speech and noise frames. The variance of the HOS estimators is quantified and used to yield a likelihood measure for noise frames. Moreover, a voicing condition for speech frames is derived based on the relation between the skewness and kurtosis of voiced speech...” (See e.g., Nemer et al., Abstract). 
Please see the additional references in form PTO-892 for more details.
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Edgar Guerra-Erazo, whose telephone number is (571) 270-3708.  The examiner can normally be reached M-F, 7:30 a.m.-5:00 p.m. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, 
/EDGAR X GUERRA-ERAZO/            Primary Examiner, Art Unit 2656