DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 07/17/2020. Claims 1-20 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 9/15/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.




Claims 1 and 2 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1 and 2 recite identifying the air traffic control information in the voice activity using an artificial intelligence algorithm, or using the artificial intelligence algorithm to generate an initial transcription.
Regarding claim 1, the limitation of identifying the air traffic control information in the voice activity using an artificial intelligence algorithm, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “by an artificial 
This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a processor or artificial intelligence algorithm to perform both the identifying and generating steps. The processor or artificial intelligence algorithm in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of identifying or generating a text transcript) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of 
Regarding claim 2, the limitation of using the artificial intelligence algorithm to generate an initial transcription, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “by an artificial intelligence algorithm,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “artificial intelligence algorithm” language, “generating” in the context of this claim encompasses the air control operator listening to the voice activity to generate a transcript of the air control information. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using an artificial intelligence algorithm to perform the generating step. The artificial intelligence algorithm in the above step is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating a transcript) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of an artificial intelligence algorithm to perform the generating step amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 2 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Steuble et al. (US Patent Application Publication 2015/0073790).
Regarding claim 1, Steuble teaches a method of communicating air traffic control information, the method comprising using a processor unit to perform the steps of: receiving an audio signal comprising voice activity (see Steuble, [0027] the audio packet receipt module 108 may receive audio packets from the network transceiver 129); identifying the air traffic control information in the voice activity using an artificial intelligence algorithm (see Steuble, [0032] the transcription engine may be tuned with domain specific audio recordings and a domain constrained limited set of words and phrases. As an example, in an air traffic control domain the transcription engine may be tuned with past recordings of air traffic control voice communications and a constrained list of the words and phrases likely to be used in air traffic control voice communications, such as internationally recognized commands and the designations of runways and/or flights for a specific airport); generating a text transcript of the air traffic control information in the voice activity (see Steuble, [0033] in block 612 the speech within the audio data may be transcribed by the tuned transcription engine to generate text (e.g., text data) corresponding to the speech within the audio data.   As another example, the text packet or packets and the audio packet or packets may be sent in near real time, usually within a time delay of one another (e.g., within a few seconds of each other) dependent on the time to accumulate a semantic content and a minor transcription processing delay, to an artificial intelligence machine that reacts to the speech within the text packet or packets. 
In this manner, the artificial intelligence machine may operate as an intelligent agent that reacts to voice communications, by processing the transcribed speech in the text packet or packets which may be more accurate that the artificial intelligence machine itself attempting to process the received audio); and displaying the text transcript of the air traffic control information on a confirmation display (see Steuble, [0033]  in block 614 the text packet or packets and the audio packet or packets may be sent in parallel, for example at the same time or within a specified time, such as one second, of each other. For example, the text packet or packets and the audio packet or packets may be sent in parallel over a network, such as the Internet, to one or more visualization and debriefing asset, such as a computing device having a display and speakers).
Regarding claim 2, Steuble teaches the method of claim 1 further comprising: selecting an audio clip from an audio library (see Steuble, [0031]  FIG. 5 illustrates an example system 500 enabling third party equipment 502 to generate and submit audio packets 208, for example to a network; interpreted as audio library);  playing the audio clip for a user (see Steuble, [0033] In this manner, the visualization and debriefing asset may receive the text packet or packets and the audio packet or packets and may use the text packet or packets to display a textual representation of the speech in the audio data recovered from the text packet or packets and/or audibly play out an audio representation of the speech in the audio data recovered from the audio packet or packets); using the artificial intelligence algorithm to generate an initial transcription (see Steuble, [0033] As another example, the text packet or packets and the audio packet or packets may be sent in near real time, usually within a time delay of one another (e.g., within a few seconds of each other) dependent on the time to accumulate a semantic content and a minor transcription processing delay, to an artificial intelligence machine that reacts to the speech within the text packet or packets); receiving an updated transcript from the user ( see Steuble, [0034] In determination block 616 it may be determined whether additional tuning of the transcription engine is needed and/or available. 
For example, when an operator is present and reviewing the transcription an indication of an error and/or a correction in the transcription input by the operator may indicate additional tuning is needed); and using the updated transcript to retrain the artificial intelligence algorithm (see Steuble, [0034] In response to determining that additional tuning is available/needed (i.e., determination block 616="Yes"), in block 618 additional tuning may be applied to the transcription engine and the method 600 may return to block 604 and transcribe audio packets with the retuned transcription engine).
Claims 4, 8, 12, 14 and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Xu et al. (US Patent Application Publication 2005/0071156).
Regarding claim 4, Xu teaches a method for detecting voice activity in an audio signal, comprising: determining a power spectrum of the audio signal (see Xu, [0020] The noise spectrum estimation mechanism 120 may take the preprocessed signal such as the DFT of the input audio signal 107 as input to compute the signal power spectrum (P.sub.y 115 ) and to estimate the noise power spectrum (P.sub.n 125) of the input audio signal ); comparing the power spectrum of the audio signal and a power spectrum of noise to form a comparison (see Xu, [0022]  For example, when speech is recorded, the background sound from the recording environment of the speech may be considered to be noise. The recorded audio signal in this case may then be a compound signal containing both speech and noise. The energy of this compound signal corresponds to the signal power spectrum. The noise power spectrum P.sub.n 125 may be estimated based on the signal power spectrum P.sub.y 115 computed based on the input audio signal 105); identifying portions of the audio signal that comprise speech based on the comparison between the power spectrum of the audio signal and the power spectrum of the noise (see Xu, [0022] The continuously derived dynamic over-subtraction factors may then be fed to the spectral subtraction mechanism 140 where such over-subtraction factors are used in spectral subtraction to produce a subtracted signal 145 that has a lower energy); and forming speech segments comprising the portions of the audio signal that comprise speech (see Xu, [0022] further details related to the spectral subtraction mechanism 140 are described with reference to FIG. 6. To generate an enhanced audio signal 155, the inverse DFT mechanism 150 may then transform the subtracted signal 145 to produce a signal that may have lower noise).
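For illustration only, the four steps recited in claim 4 (determining a power spectrum, comparing it with a noise power spectrum, identifying speech portions, and forming speech segments) can be sketched as below. This is a minimal sketch and is not taken from Xu or from the application. The function names, the fixed non-overlapping frames, and the `ratio_threshold` parameter are all hypothetical choices made for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |DFT|^2 for the non-negative frequency bins of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def detect_speech_segments(signal, frame_len, noise_power, ratio_threshold=4.0):
    """Mark a frame as speech when its total power exceeds the total noise
    power by ratio_threshold (a hypothetical tuning value), then merge
    consecutive speech frames into (start, end) sample ranges."""
    noise_total = sum(noise_power)
    segments = []
    start = None
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame_power = sum(power_spectrum(signal[i:i + frame_len]))
        is_speech = frame_power > ratio_threshold * noise_total
        if is_speech and start is None:
            start = i                      # a speech segment begins here
        elif not is_speech and start is not None:
            segments.append((start, i))    # the segment ended at this frame
            start = None
    if start is not None:
        segments.append((start, i + frame_len))
    return segments
```

In this sketch the noise power spectrum would typically be measured on a stretch of audio known to contain no speech; how the noise estimate is actually obtained is outside what the example assumes.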
Regarding claim 8, Xu teaches the method of claim 4, wherein determining the power spectrum of the audio signal comprises determining the power spectrum for each of a plurality of analysis windows comprising a time segment of the audio signal (see Xu, [0023], [0024] FIG. 2(a) depicts an exemplary functional block diagram of the preprocessing mechanism 110, according to an embodiment of the inventions. The exemplary preprocessing mechanism 110 comprises a signal frame generation mechanism 210 and a DFT mechanism 240. The frame generation mechanism 210 may first divide the input audio signal 105 into equal length frames as units for further computation. Each of such frames may typically include, for example, 200 samples per frame and there may be 100 frames per second. The granularity of the division may be determined according to computation requirement or application needs. To reduce the analysis effect near the boundary of each frame, a Hamming window can optionally be applied to each frame).
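As a hedged illustration of the framing step quoted from Xu, the sketch below divides a signal into equal-length frames and applies a Hamming window to each. The frame length of 200 samples comes from the quoted passage. The hop of 80 samples is an assumption: it yields 100 frames per second only if the sampling rate is 8 kHz, which the quoted passage does not state. The function names are hypothetical.

```python
import math


def hamming(n):
    """Standard Hamming window coefficients for a frame of n samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(signal, frame_len=200, hop=80):
    """Cut the signal into overlapping, equal-length, windowed frames.

    hop=80 is an assumed value (100 frames per second at 8 kHz)."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each returned frame corresponds to one analysis window; a power spectrum would then be computed per frame.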
Regarding claim 12, Xu teaches the method of claim 4, wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises comparing a slope of a plot of the power spectrum of the audio signal and a slope of a plot of the power spectrum of the noise (see Xu, [0022] The continuously derived dynamic over-subtraction factors may then be fed to the spectral subtraction mechanism 140 where such over-subtraction factors are used in spectral subtraction to produce a subtracted signal 145 that has a lower energy. Further details related to the spectral subtraction mechanism 140 are described with reference to FIG. 6.  See Xu, [0050] with estimated signal energy, and noise energy at each frame for each subband frequency, and the over-subtraction factor at each frame, a subtraction amount for each frequency at each frame can be calculated, at 740, using, for example, the formula described herein. The computed subtraction amount may then be used to subtract, at 745, from the original signal energy to produce a reduced energy spectrum. The reduced signal power spectrum and the phase information of the original input audio signal are then used to perform, at 750, an inverse DFT operation to generate an enhanced audio signal which may subsequently used for further processing or usage at 755; power spectrum subtraction is interpreted as the comparing of the slope of the power spectrum of audio signal and slope of power spectrum of noise).
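The spectral subtraction operation described in the quoted passages of Xu can be illustrated with the sketch below. The formula Xu refers to is not reproduced in this action, so the example uses a commonly seen over-subtraction form with a spectral floor as an assumption; `over_subtraction` and `floor` are hypothetical parameter names and values, and Xu's dynamically derived factors are replaced here by a single constant.

```python
def spectral_subtract(signal_power, noise_power, over_subtraction=2.0, floor=0.01):
    """Subtract a scaled noise power estimate from each frequency bin.

    A bin is never reduced below floor * its original power, so the
    result stays non-negative (an assumed safeguard, not Xu's formula)."""
    reduced = []
    for p_y, p_n in zip(signal_power, noise_power):
        remaining = p_y - over_subtraction * p_n
        reduced.append(max(remaining, floor * p_y))
    return reduced
```

The output is a reduced power spectrum per bin, which in Xu's description is then combined with the original phase and passed through an inverse DFT.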
Regarding claim 14, Xu teaches an apparatus for detecting voice activity in an audio signal (see Xu[0054] FIG. 10 depicts a different framework 1000, in which spectral subtraction based audio enhancement is embedded in audio signal processing, according to an embodiment of the present invention), comprising: a receiver configured to receive the audio signal (see Xu, [0020] The noise spectrum estimation mechanism 120 may take the preprocessed signal such as the DFT of the input audio signal 107 as input to compute the signal power spectrum (P.sub.y 115 ) and to estimate the noise power spectrum (P.sub.n 125) of the input audio signal); a voice activity detector configured to identify portions of the audio signal that comprise speech based on a comparison between a power spectrum of the audio signal and a power spectrum of noise (see Xu, [0022]  For example, when speech is recorded, the background sound from the recording environment of the speech may be considered to be noise. The recorded audio signal in this case may then be a compound signal containing both speech and noise. The energy of this compound signal corresponds to the signal power spectrum. The noise power spectrum P.sub.n 125 may be estimated based on the signal power spectrum P.sub.y 115 computed based on the input audio signal 105); and a segmenter configured to form speech segments comprising the portions of the audio signal that comprise speech (see Xu, [0018] the dynamic spectral subtraction based audio enhancer 100 receives an input audio signal 105 from an external source and produces an enhanced audio signal 155 as its output.  [0053] FIG. 9 illustrates different exemplary types of audio processing that may utilize the enhanced audio signal 155. Possible audio signal processing 910 may include, but is not limited to, recognition 920, playback 930, . . . , or segmentation 940. Speech recognition tasks 920 may include speech recognition 950, . . . , and speaker recognition 960. 
Speech based segmentation 940 may include, for example, speaker based segmentation 970, and acoustic based audio segmentation 980).
Regarding claim 17, Xu teaches the apparatus of claim 14, wherein the voice activity detector is configured to define a plurality of analysis windows in the audio signal, wherein each of the plurality of analysis windows comprises a time segment of the audio signal, and to determine the power spectrum of each of the plurality of analysis windows in the audio signal (see Xu, [0029] FIG. 3 depicts an exemplary functional block diagram of the noise spectrum estimation mechanism 120, according to at least one embodiment of the inventions. The noise power spectrum estimation mechanism 120 may include a signal power spectrum estimator 310 and a noise power spectrum estimator 330. It may also optionally include a signal power spectrum filter 320 which is responsible for smoothing the computed signal power spectrum prior to estimating the noise spectrum; the signal frame generation and DFT mechanisms are described in FIGS. 2(a) and 2(b)).

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Steuble et al. (US Patent Application Publication 2015/0073790) in view of Xu et al. (US Patent Application Publication 2005/0071156).
Regarding claim 3, Steuble teaches the method of claim 1 further comprising detecting the voice activity in the audio signal by: identifying the air traffic control information in the speech segments using the artificial intelligence algorithm (see Steuble, [0032] in an embodiment, the transcription engine may be tuned with domain specific audio recordings and a domain constrained limited set of words and phrases. As an example, in an air traffic control domain the transcription engine may be tuned with past recordings of air traffic control voice communications and a constrained list of the words and phrases likely to be used in air traffic control voice communications, such as internationally recognized commands and the designations of runways and/or flights for a specific airport; transcription engine interpreted as artificial intelligence algorithm) but fails to teach comparing a power spectrum of the audio signal and a power spectrum of noise to form a comparison; identifying portions of the audio signal that comprise speech based on the comparison between the power spectrum of the audio signal and the power spectrum of the noise; 45Docket No. 19-2067-US-NP forming speech segments comprising the portions of the audio signal that comprise speech.  
However, Xu teaches comparing a power spectrum of the audio signal and a power spectrum of noise to form a comparison (see Xu, [0020] The noise spectrum estimation mechanism 120 may take the preprocessed signal such as the DFT of the input audio signal 107 as input to compute the signal power spectrum (P.sub.y 115 ) and to estimate the noise power spectrum (P.sub.n 125) of the input audio signal); identifying portions of the audio signal that comprise speech based on the comparison between the power spectrum of the audio signal and the power spectrum of the noise (see Xu, [0022] The continuously derived dynamic over-subtraction factors may then be fed to the spectral subtraction mechanism 140 where such over-subtraction factors are used in spectral subtraction to produce a subtracted signal 145 that has a lower energy); and forming speech segments comprising the portions of the audio signal that comprise speech (see Xu, [0022] Further details related to the spectral subtraction mechanism 140 are described with reference to FIG. 6. To generate an enhanced audio signal 155, the inverse DFT mechanism 150 may then transform the subtracted signal 145 to produce a signal that may have lower noise).
Steuble and Xu are both considered to be analogous to the claimed invention because both relate to audio signal enhancement. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Steuble, which enable a transcription of voice communications to be provided in parallel with an audio recording of the voice communications, with the subtraction of noise taught by Xu (see Xu, [0004]).
Claims 5, 6, 7, 13, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US Patent Application Publication 2005/0071156) in view of Steuble et al. (US Patent Application Publication 2015/0073790).
Regarding claim 5, Xu teaches the method of claim 4, but fails to teach wherein the audio signal comprises air traffic control radio communications.  However Steuble teaches wherein the audio signal comprises air traffic control radio communications (see Steuble, [0032] in an embodiment, the transcription engine may be tuned with domain specific audio recordings and a domain constrained limited set of words and phrases. As an example, in an air traffic control domain the transcription engine may be tuned with past recordings of air traffic control voice communications and a constrained list of the words and phrases likely to be used in air traffic control voice communications, such as internationally recognized commands and the designations of runways and/or flights for a specific airport).
Xu and Steuble are both considered to be analogous to the claimed invention because both relate to audio signal enhancement. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu, which produce an enhanced audio signal with less apparent noise, with Steuble's teaching of providing a transcription of voice communications in parallel with an audio recording of the voice communications, in order to improve textual search of voice communications for key words (see Steuble, [0004]).
	 Regarding claim 6, Xu teaches the method of claim 5 but fails to teach identifying air traffic control information in the speech segments using an artificial intelligence algorithm; generating a text transcript of the air traffic control information in the speech segments; and displaying the text transcript of the air traffic control information on a confirmation display. However Steuble teaches identifying air traffic control information in the speech segments using an artificial intelligence algorithm (see Steuble, [0032] in an embodiment, the transcription engine may be tuned with domain specific audio recordings and a domain constrained limited set of words and phrases. As an example, in an air traffic control domain the transcription engine may be tuned with past recordings of air traffic control voice communications and a constrained list of the words and phrases likely to be used in air traffic control voice communications, such as internationally recognized commands and the designations of runways and/or flights for a specific airport; transcription engine interpreted as artificial intelligence algorithm ); generating a text transcript of the air traffic control information in the speech segments (see Steuble, [0033] in block 612 the speech within the audio data may be transcribed by the tuned transcription engine to generate text (e.g., text data) corresponding to the speech within the audio data.   As another example, the text packet or packets and the audio packet or packets may be sent in near real time, usually within a time delay of one another (e.g., within a few seconds of each other) dependent on the time to accumulate a semantic content and a minor transcription processing delay, to an artificial intelligence machine that reacts to the speech within the text packet or packets. 
In this manner, the artificial intelligence machine may operate as an intelligent agent that reacts to voice communications, by processing the transcribed speech in the text packet or packets which may be more accurate that the artificial intelligence machine itself attempting to process the received audio); and displaying the text transcript of the air traffic control information on a confirmation display (see Steuble, [0033]  in block 614 the text packet or packets and the audio packet or packets may be sent in parallel, for example at the same time or within a specified time, such as one second, of each other. For example, the text packet or packets and the audio packet or packets may be sent in parallel over a network, such as the Internet, to one or more visualization and debriefing asset, such as a computing device having a display and speakers).
Regarding claim 7, Xu teaches the method of claim 4, but fails to teach wherein the audio signal is selected from the group of audio signals consisting of live audio signals and recorded audio signals. However, Steuble teaches wherein the audio signal is selected from the group of audio signals consisting of live audio signals and recorded audio signals (see Steuble, [0029] FIG. 2 illustrates an embodiment system 200 enabled to generate text packets from audio packets in real time or near real time; see Steuble, [0032] FIG. 6 illustrates an embodiment method 600 for providing audio packets of a recorded voice communication and text packs of the transcription of the recorded voice communication in parallel; real time/near real time is interpreted as live).
Regarding claim 13, Xu teaches the method of claim 4, but fails to teach playing the audio clip for a user; using an artificial intelligence algorithm to generate an initial transcription from the speech segments; receiving an updated transcript from the user; and using the updated transcript to retrain the artificial intelligence algorithm. However, Steuble teaches wherein the audio signal is an audio clip selected from an audio library and further comprising: playing the audio clip for a user (see Steuble, [0033] In this manner, the visualization and debriefing asset may receive the text packet or packets and the audio packet or packets and may use the text packet or packets to display a textual representation of the speech in the audio data recovered from the text packet or packets and/or audibly play out an audio representation of the speech in the audio data recovered from the audio packet or packets); using the artificial intelligence algorithm to generate an initial transcription from the speech segments (see Steuble, [0033] As another example, the text packet or packets and the audio packet or packets may be sent in near real time, usually within a time delay of one another (e.g., within a few seconds of each other) dependent on the time to accumulate a semantic content and a minor transcription processing delay, to an artificial intelligence machine that reacts to the speech within the text packet or packets); receiving an updated transcript from the user (see Steuble, [0034] In determination block 616 it may be determined whether additional tuning of the transcription engine is needed and/or available. For example, when an operator is present and reviewing the transcription an indication of an error and/or a correction in the transcription input by the operator may indicate additional tuning is needed); and using the updated transcript to retrain the artificial intelligence algorithm (see Steuble, [0034] In response to determining that additional tuning is available/needed (i.e., determination block 616="Yes"), in block 618 additional tuning may be applied to the transcription engine and the method 600 may return to block 604 and transcribe audio packets with the retuned transcription engine).
Regarding claim 15, Xu teaches the apparatus of claim 14, but fails to teach wherein the audio signal comprises air traffic control radio communications.  However Steuble teaches wherein the audio signal comprises air traffic control radio communications (see Steuble, [0032] in an embodiment, the transcription engine may be tuned with domain specific audio recordings and a domain constrained limited set of words and phrases. As an example, in an air traffic control domain the transcription engine may be tuned with past recordings of air traffic control voice communications and a constrained list of the words and phrases likely to be used in air traffic control voice communications, such as internationally recognized commands and the designations of runways and/or flights for a specific airport).
Xu and Steuble are both considered to be analogous to the claimed invention because both relate to audio signal enhancement. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu, which produce an enhanced audio signal with less apparent noise, with Steuble's teaching of providing a transcription of voice communications in parallel with an audio recording of the voice communications, in order to improve textual search of voice communications for key words (see Steuble, [0004]).
Regarding claim 16, Xu teaches the apparatus of claim 14, but fails to teach wherein the audio signal is selected from the group of audio signals consisting of live audio signals and recorded audio signals. However, Steuble teaches wherein the audio signal is selected from the group of audio signals consisting of live audio signals and recorded audio signals (see Steuble, [0029] FIG. 2 illustrates an embodiment system 200 enabled to generate text packets from audio packets in real time or near real time; see Steuble, [0032] FIG. 6 illustrates an embodiment method 600 for providing audio packets of a recorded voice communication and text packs of the transcription of the recorded voice communication in parallel; real time/near real time is interpreted as live).
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US Patent Application Publication 2005/0071156) in view of Ramirez et al., "A New Kullback-Leibler VAD for Speech Recognition in Noise", IEEE Signal Processing Letters, Vol 11, No 2, February 2004 (as cited in the IDS submitted on 9/15/2020).
Regarding claim 9, Xu teaches the method of claim 4, but fails to teach wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises comparing the power spectrum of the audio signal and a power spectrum of Gaussian noise. However Ramirez teaches wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises comparing the power spectrum of the audio signal and a power spectrum of Gaussian noise (see Ramirez, pg. 266, section III, the proposed VAD works in the Mel-scaled energy domain and assumes a Gaussian model for the logarithmic filter bank energy (FBE) distributions of speech and noise in each band).
	Xu and Ramirez are both considered to be analogous to the claimed invention because both relate to noise suppression methodologies in speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu to produce an enhanced audio signal with less apparent noise with the VAD based on Kullback–Leibler (KL) divergence measure teachings of Ramirez to produce sustained improvements in speech/nonspeech hit rates (see Ramirez, pg. 266, section I).
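For illustration only, the kind of comparison relied upon from Ramirez (a Gaussian model for the log filter bank energies of speech and noise in each band, compared with a Kullback-Leibler measure) can be sketched as below. This is a minimal sketch under stated assumptions and is not the algorithm of Ramirez or of the claims: the function names `gaussian_kl`, `band_stats`, and `kl_vad_score`, the use of a symmetric divergence summed over bands, and the variance floor are all hypothetical choices made for the example.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(P || Q) for two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def band_stats(values, var_floor=1e-12):
    """Mean and (floored) variance of one band's log energies."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, var_floor)  # floor is an assumption, avoids log(0)


def kl_vad_score(frame_window, noise_window):
    """Sum over bands of the symmetric KL divergence between the Gaussian
    fitted to the current window and the Gaussian fitted to noise frames.

    Both arguments are lists of frames; each frame is a list of log
    filter bank energies, one per band. A larger score suggests the
    window differs more from the noise model (assumed decision rule).
    """
    num_bands = len(noise_window[0])
    score = 0.0
    for b in range(num_bands):
        mu_x, var_x = band_stats([f[b] for f in frame_window])
        mu_n, var_n = band_stats([f[b] for f in noise_window])
        score += gaussian_kl(mu_x, var_x, mu_n, var_n)
        score += gaussian_kl(mu_n, var_n, mu_x, var_x)
    return score
```

In such a sketch, a window would be labeled speech when `kl_vad_score` exceeds a threshold; the threshold itself is not specified here and would have to be tuned.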
Regarding claim 18, Xu teaches the apparatus of claim 14, but fails to teach wherein the voice activity detector is configured to compare the power spectrum of the audio signal and a power spectrum of Gaussian noise. However, Ramirez teaches wherein the voice activity detector is configured to compare the power spectrum of the audio signal and a power spectrum of Gaussian noise (see Ramirez, pg. 266, section III, the proposed VAD works in the Mel-scaled energy domain and assumes a Gaussian model for the logarithmic filter bank energy (FBE) distributions of speech and noise in each band).
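	Xu and Ramirez are both considered to be analogous to the claimed invention because both relate to noise suppression methodologies in speech recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu to produce an enhanced audio signal with less apparent noise with the VAD based on Kullback–Leibler (KL) divergence measure teachings of Ramirez to produce sustained improvements in speech/nonspeech hit rates (see Ramirez, pg. 266, section I).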
Claims 10, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US Patent Application Publication 2005/0071156) in view of V. Stahl, A. Fischer and R. Bippus, "Quantile based noise estimation for spectral subtraction and Wiener filtering," 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, pp. 1875-1878 vol.3.
Regarding claim 10, Xu teaches the method of claim 4, however fails to teach wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises using a quantile-quantile plot comparison to compare the power spectrum of the audio signal and the power spectrum of the noise. However, Stahl teaches wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises using a quantile-quantile plot comparison to compare the power spectrum of the audio signal and the power spectrum of the noise (see Stahl, pg. 1876, section 4.2 The idea is to estimate the noise energy in each frequency band by temporal quantiles in the power spectral domain. Pg. 1877, Sec.5 teaches hence we apply spectral subtraction and Wiener filtering for the noise elimination. The FIR Wiener filter is defined as the linear filter which minimizes the mean square error in the time domain. Spectral subtraction relies on the fact that the power spectrum of the sum of two independent random signals is the sum of the power spectra. The noise elimination rule of spectral subtraction is therefore simply to subtract the power spectrum of the estimated noise from the power spectrum of the observed signal).
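Xu and Stahl are both considered to be analogous to the claimed invention because both relate to elimination of additive noise from speech signal in audio processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu to produce an enhanced audio signal with less apparent noise with the noise estimated as a temporal quantile in the power spectral domain teachings of Stahl to eliminate noise from the input signal (see Stahl, pg. 1875, section 1).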
Regarding claim 11, Xu teaches the method of claim 4, however fails to teach wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises making a mean squared error comparison of a plot of the power spectrum of the audio signal and a plot of the power spectrum of the noise. However Stahl teaches wherein comparing the power spectrum of the audio signal and the power spectrum of the noise comprises making a mean squared error comparison of a plot of the power spectrum of the audio signal and a plot of the power spectrum of the noise (see Stahl, pg. 1877, Sec.5 teaches hence we apply spectral subtraction and Wiener filtering for the noise elimination. The FIR Wiener filter is defined as the linear filter which minimizes the mean square error in the time domain. Spectral subtraction relies on the fact that the power spectrum of the sum of two independent random signals is the sum of the power spectra. The noise elimination rule of spectral subtraction is therefore simply to subtract the power spectrum of the estimated noise from the power spectrum of the observed signal).
Regarding claim 19, Xu teaches the apparatus of claim 14, wherein the voice activity detector is configured to: compare a slope of a plot of the power spectrum of the audio signal and a slope of a plot of the power spectrum of the noise (see Xu, [0022] The continuously derived dynamic over-subtraction factors may then be fed to the spectral subtraction mechanism 140 where such over-subtraction factors are used in spectral subtraction to produce a subtracted signal 145 that has a lower energy. Further details related to the spectral subtraction mechanism 140 are described with reference to FIG. 6.  See Xu, [0050] with estimated signal energy, and noise energy at each frame for each subband frequency, and the over-subtraction factor at each frame, a subtraction amount for each frequency at each frame can be calculated, at 740, using, for example, the formula described herein. The computed subtraction amount may then be used to subtract, at 745, from the original signal energy to produce a reduced energy spectrum. The reduced signal power spectrum and the phase information of the original input audio signal are then used to perform, at 750, an inverse DFT operation to generate an enhanced audio signal which may subsequently used for further processing or usage at 755; power spectrum subtraction is interpreted as the comparing of the slope of the power spectrum of audio signal and slope of power spectrum of noise). However, Xu fails to teach wherein the voice activity detector is configured to use a quantile-quantile plot comparison to compare the power spectrum of the audio signal and the power spectrum of the noise, or make a mean squared error comparison of a plot of the power spectrum of the audio signal and a plot of the power spectrum of the noise. Stahl teaches using a quantile-quantile plot comparison to compare the power spectrum of the audio signal and the power spectrum of the noise (see Stahl, pg. 1876, section 4.2 the idea is to estimate the noise energy in each frequency band by temporal quantiles in the power spectral domain. Pg. 1877, Sec.5 teaches hence we apply spectral subtraction and Wiener filtering for the noise elimination. The FIR Wiener filter is defined as the linear filter which minimizes the mean square error in the time domain. Spectral subtraction relies on the fact that the power spectrum of the sum of two independent random signals is the sum of the power spectra. The noise elimination rule of spectral subtraction is therefore simply to subtract the power spectrum of the estimated noise from the power spectrum of the observed signal), or making a mean squared error comparison of a plot of the power spectrum of the audio signal and a plot of the power spectrum of the noise (see Stahl, pg. 1877, Sec.5 teaches hence we apply spectral subtraction and Wiener filtering for the noise elimination. The FIR Wiener filter is defined as the linear filter which minimizes the mean square error in the time domain. Spectral subtraction relies on the fact that the power spectrum of the sum of two independent random signals is the sum of the power spectra. The noise elimination rule of spectral subtraction is therefore simply to subtract the power spectrum of the estimated noise from the power spectrum of the observed signal).
Xu and Stahl are both considered to be analogous to the claimed invention because both relate to elimination of additive noise from speech signal in audio processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu to produce an enhanced audio signal with less apparent noise with the noise estimated as a temporal quantile in the power spectral domain teachings of Stahl to eliminate noise from the input signal (see Stahl, pg. 1875, section 1).
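For illustration only, the two steps quoted from Stahl (estimating the noise energy in each frequency band as a temporal quantile in the power spectral domain, then subtracting the estimated noise power spectrum from the observed power spectrum) can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation of Stahl or Xu: the function names, the default quantile `q=0.5`, and the zero floor applied after subtraction are hypothetical choices made for the example.

```python
def quantile_noise_estimate(power_frames, q=0.5):
    """Estimate noise power per band as the q-th temporal quantile.

    power_frames is a list of frames; each frame is a list of power
    values, one per frequency band. The value of q is an assumption.
    """
    num_bands = len(power_frames[0])
    estimate = []
    for b in range(num_bands):
        values = sorted(frame[b] for frame in power_frames)
        index = min(int(q * len(values)), len(values) - 1)
        estimate.append(values[index])
    return estimate


def spectral_subtract(power_frame, noise_power, floor=0.0):
    """Subtract the estimated noise power from one observed frame,
    clamping each band at `floor` so power never goes negative."""
    return [max(p - n, floor) for p, n in zip(power_frame, noise_power)]


# Usage: five frames, two bands; one loud frame in each band.
frames = [[1.0, 2.0], [1.0, 2.0], [9.0, 2.0], [1.0, 10.0], [1.0, 2.0]]
noise = quantile_noise_estimate(frames)        # per-band median
cleaned = spectral_subtract(frames[2], noise)  # frame with a loud band 0
```

The quantile is taken over time separately in each band, so occasional high-energy speech frames do not raise the noise estimate in this sketch.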
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US Patent Application Publication 2005/0071156) in view of Katagiri (US Patent Application Publication 2012/0253813).
Regarding claim 20, Xu teaches the apparatus of claim 14, however fails to teach wherein the segmenter is configured to form the speech segments using criteria selected from the group of criteria consisting of segment length and amount of discontinuities in the portions of the audio signal that comprise speech.  However Katagiri teaches wherein the segmenter is configured to form the speech segments using criteria selected from the group of criteria consisting of segment length and amount of discontinuities in the portions of the audio signal that comprise speech (see Katagiri, [0059] FIG. 8 shows spectral entropy E1 that is calculated from the input signal S2 when the spectrum operation is not performed, and spectral entropy E2 that is calculated from the input signal S3 after the spectrum operation. In the spectral entropy E1, the spectral entropy value randomly changes and a difference in the spectral entropy values is not found between the speech segment and the non-speech segment. In contrast to this, in the spectral entropy E2, a difference in the spectral entropy values occurs between speech segments (I1 to I3) and non-speech segments (other than the speech segments I1 to I3). The determination portion 105 can accurately determine the speech segment I1, the speech segment I2 and the speech segment I3 based on the spectral entropy E2).
Xu and Katagiri are both considered to be analogous to the claimed invention because both relate to determination of speech from speech signal in audio processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Xu to produce an enhanced audio signal with less apparent noise with the power spectrum analysis teachings of Katagiri to determine speech segments in a non-stationary noise environment (see Katagiri [0008]).
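For illustration only, determining speech segments from per-frame spectral entropy, as in the passage quoted from Katagiri, can be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce Katagiri's spectrum operation or determination portion 105: the function names, the use of natural-log Shannon entropy over the normalized power spectrum, and the rule that frames with entropy below a threshold are treated as speech are hypothetical choices made for the example.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (natural log) of a power spectrum normalized to
    sum to one. A flat spectrum gives the maximum, log(number of bins)."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        if p > 0:
            prob = p / total
            entropy -= prob * math.log(prob)
    return entropy


def speech_segments(entropies, threshold):
    """Group consecutive frames whose entropy is below `threshold` into
    (start, end) frame-index pairs, end exclusive. Treating low entropy
    as speech is an assumption of this sketch."""
    segments = []
    start = None
    for i, value in enumerate(entropies):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(entropies)))
    return segments
```

The resulting index pairs give each segment's length and the gaps between segments directly, which is the kind of information a segmenter could use as criteria.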


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Kopys et al. (US Patent Application Publication 2018/0350351) teaches an MFCC on a neural network accelerator. The MFCC technique is separated out into several discrete sub-operations, each formed in a part of the hardware accelerator. The sub-operations may include windowing, pre-processing, pre-emphasis, Hanning window, DFT, power spectrum or logarithm spectrum, triangle filter, liftering, high-pass filtering, and merging feature vectors. These functions are used to build an acoustic model. The output scores come from the acoustic model (see Kopys, [0039]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more 





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656