Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 10/7/2019.  These drawings are accepted.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 6-11, and 14-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lopatka et al (US Publication No. 20190043489).
Claim 1, Lopatka et al discloses
Preamble: A method for recognizing at least one of a non-verbal sound event and a scene in an audio signal (Fig. 1 shows a classifier of an audio signal or acoustic input signal. Label 150,170 as non-verbal sound event and a scene. Paragraph 21 discloses classification of the acoustic signal into classes of scenes. Paragraph 22 discloses classification of acoustic signal into classes of non-verbal sound event.) comprising a sequence of frames of audio data (Fig. 1, label spectral frames.), the method comprising:
for each frame of the sequence (label 140,160 performs classification for each frame.):
processing the frame of the audio data to extract multiple acoustic features for the frame of audio data (Fig. 1, label 140,160 processes the spectral frames. Fig. 3, label 330 outputs acoustic features for the spectral frames of the audio data. Fig. 4, label 450 outputs the acoustic features for the spectral frames of the audio data.); and
classifying the acoustic features to classify the frame by determining, for each of a set of sound classes, a score that the frame represents the sound class (Paragraph 34 discloses the classifier outputs event scores, “which indicate the probability that an audio frame belongs to a given event class with an associated label. A variety of event classes are possible include classes that represent target event …” );
processing the sound class scores for multiple frames of the sequence of frames to generate, for each frame, a sound class decision for each frame (Fig. 2, label 230 performs processing of the outputs from the DNN classifier, shown in Fig. 5, to determine the class. Paragraph 38 discloses that Fig. 7 shows the raw output scores vs. time (frame) for each of 8 labeled events. This indicates that the output of label 230 is for each frame.); and
processing the sound class decisions for the sequence of frames to recognize the at least one of a non-verbal sound event and a scene (Paragraph 1 discloses processing the sound class decisions output by label 230, for example, to trigger a desired action based on the event. This indicates that the at least one non-verbal event and a scene (classes as indicated in paragraphs 21-22) are recognized.).
Claim 2, Lopatka et al discloses processing the sound class scores includes applying a temporal structure constraint to the sound class scores to generate the sound class decision. (Paragraph 40 discloses a smoothing constant applied to the scores in order to tune the backend processing circuit to react to longer events.)
Claim 3, Lopatka et al discloses classifying the acoustic features comprises classifying the frame of audio data using a set of first classifiers (Fig. 2, label 220 for both impulsive event detection and continuous event.) and wherein applying the temporal structure constraint comprises processing the sound class scores using a second classifier (Fig. 2, label 230 as the second classifier.).
Claim 6, Lopatka et al discloses the set of first classifiers comprises a set of neural network classifiers (Fig. 2, label DNN classifier, 220).
Claim 7, Lopatka et al discloses wherein processing the frame of audio data to extract the acoustic features for the frame of audio data comprises determining a feature vector defining the acoustic features for the frame of audio data (Fig. 3, label impulse features; Fig. 4, label MFCCs, 450; Paragraphs 31, 32).
Claim 8, Lopatka et al discloses wherein the frame of audio data comprises time domain audio data for a time window (Fig. 1, label spectral frames, Paragraph 20 discloses STFT), and wherein processing the frame of audio data to extract the acoustic features for the frame of audio data comprises transforming the frame of audio data into frequency domain audio data (Paragraph 20 discloses STFT.).  
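For illustration only, the time-window and frequency-domain limitations discussed for claim 8 can be sketched as follows. This is a minimal sketch of a generic short-time Fourier analysis and is not taken from Lopatka et al; the function names, the Hann window, and the frame and hop sizes are assumptions chosen for the example.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split time-domain samples into overlapping frames (time windows)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Apply a Hann window, then return DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

# Hypothetical input: a sine wave completing 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
frames = frame_signal(signal, frame_len=32, hop=16)
spectra = [magnitude_spectrum(f) for f in frames]
```

Each entry of `spectra` is the frequency-domain representation of one time window; for this synthetic input the largest magnitude falls in bin 4 of every frame.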
Claim 9, Lopatka et al discloses wherein processing the frame of audio data to extract multiple acoustic features for the frame of audio data comprises processing the 
Claim 10, Lopatka et al discloses wherein prior to said classifying the acoustic features to classify the frame (Fig. 1, label DNN classifier.), 
the method comprises concatenating the multiple acoustic features for the frame of audio data with multiple acoustic features for an adjacent frame of audio data in the sequence (Fig. 4, label 450 is the output of concatenation of label 430, Paragraph 32).  
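For illustration only, concatenating a frame's features with those of adjacent frames before classification, as recited in claim 10, can be sketched as follows. This is a generic context-stacking sketch, not the specific operation of Fig. 4 of Lopatka et al; the function name `stack_adjacent`, the `context` parameter, and the edge-padding behavior are assumptions.

```python
def stack_adjacent(features, context=1):
    """Concatenate each frame's feature vector with its neighbours' vectors.

    features: list of per-frame feature vectors (lists of floats).
    context:  number of frames taken on each side of the current frame.
    Frames at the edges reuse the nearest valid frame (edge padding).
    """
    n = len(features)
    stacked = []
    for i in range(n):
        vec = []
        for j in range(i - context, i + context + 1):
            vec.extend(features[min(max(j, 0), n - 1)])
        stacked.append(vec)
    return stacked

# Hypothetical two-dimensional features for three frames.
per_frame = [[1, 2], [3, 4], [5, 6]]
stacked = stack_adjacent(per_frame, context=1)
```

With `context=1`, each stacked vector is three times the length of a single-frame vector, so the classifier input for a frame carries information from the preceding and following frames.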
Claim 11, Lopatka et al discloses further comprising adjusting the sound class scores for multiple frames of the sequence of frames based on one or more of: knowledge about one or more of the sound classes; and knowledge about an environment in which the audio data was captured (Fig. 6, label target scores and non-target scores are related to the target events, according to the label associated with the score (paragraph 34). The adjusting of the sound class scores performed at labels 640, 650 is based on knowledge of the one or more sound classes, such as which classes are target and which are not.).
Claim 14, Lopatka et al discloses wherein processing the sound class scores for multiple frames of the sequence of frames to generate, for each frame, a sound class decision for each frame comprises: filtering the sound class scores for the multiple frames to generate a smoothed score for each frame (Fig. 6, label smoothing circuit performs smoothing or filtering of the difference between the scores, label 640.); and
comparing each smooth score to a threshold to determine a sound class decision for each frame (Fig. 6, label 660 compares the smoothed scores to a threshold to determine the event detections.).  
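For illustration only, the filter-then-threshold sequence discussed for claim 14 can be sketched as follows. The sketch assumes a simple exponential smoothing filter with a hypothetical smoothing constant `alpha` and a hypothetical `threshold`; it does not reproduce the smoothing circuit of Fig. 6 of Lopatka et al.

```python
def smooth_scores(scores, alpha=0.5):
    """Exponentially smooth a sequence of per-frame scores.

    A larger alpha weights past frames more heavily, so the output
    reacts more slowly and favours longer events.
    """
    smoothed = []
    prev = scores[0] if scores else 0.0
    for s in scores:
        prev = alpha * prev + (1 - alpha) * s
        smoothed.append(prev)
    return smoothed

def decide(smoothed, threshold=0.5):
    """Compare each smoothed score to a threshold: one decision per frame."""
    return [s > threshold for s in smoothed]

# Hypothetical raw scores with an isolated spike at the second frame.
raw = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
smoothed = smooth_scores(raw, alpha=0.5)
decisions = decide(smoothed, threshold=0.5)
```

In this example the isolated spike at the second frame does not produce a positive decision, while the sustained run of high scores at the end does.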

Claim 16, Lopatka et al discloses A non-transitory data carrier carrying processor control code which when running on a device causes the device to perform the method of claim 1.  (paragraph 65. Please see claim 1.)
Claim 17, Lopatka et al discloses A computer system configured to implement the method of claim 1.  (paragraph 62-65. Please see claim 1.) 
Claim 18, Lopatka et al discloses A consumer electronic device comprising the computer system of claim 17.  (Paragraph 62-65, Please see claim 1,17.)
Claim 19, Lopatka et al discloses 
Preamble: A system for recognizing at least one of a non-verbal sound event and a scene in an audio signal (Fig. 1 shows a classifier of an audio signal or acoustic input signal. Label 150,170 as non-verbal sound event and a scene. Paragraph 21 discloses classification of the acoustic signal into classes of scenes. Paragraph 22 discloses classification of acoustic signal into classes of non-verbal sound event.) comprising a sequence of frames of audio data (Fig. 1, label spectral frames.), the system comprising 

wherein the system is configured to: 
for each frame of the sequence (label 140,160 performs classification for each frame.): 
process the frame of audio data to extract multiple acoustic features for the frame of audio data (Fig. 1, label 140,160 processes the spectral frames. Fig. 3, label 330 outputs acoustic features for the spectral frames of the audio data. Fig. 4, label 450 outputs the acoustic features for the spectral frames of the audio data.); and 
classify the acoustic features to classify the frame by determining, for each of a set of sound classes, a score that the frame represents the sound class (Paragraph 34 discloses the classifier outputs event scores, “which indicate the probability that an audio frame belongs to a given event class with an associated label. A variety of event classes are possible include classes that represent target event …” ); 
process the sound class scores for multiple frames of the sequence of frames to generate, for each frame, a sound class decision for each frame (Fig. 2, label 230 performs processing of the outputs from the DNN classifier, shown in Fig. 5, to determine the class. Paragraph 38 discloses that Fig. 7 shows the raw output scores vs. time (frame) for each of 8 labeled events. This indicates that the output of label 230 is for each frame.); and
process the class decisions for the sequence of frames to recognize the at least one of a non-verbal sound event and scene (Paragraph 1 discloses processing the sound class decisions output by label 230, for example, to trigger a desired action based on the event. This indicates that the at least one non-verbal event and a scene (classes as indicated in paragraphs 21-22) are recognized.).
Claim 20, Lopatka et al discloses
Preamble: A sound recognition device for recognizing at least one of a non-verbal sound event and scene in an audio signal (Fig. 1 shows a classifier of an audio signal or acoustic input signal. Label 150,170 as non-verbal sound event and a scene. Paragraph 21 discloses classification of the acoustic signal into classes of scenes. Paragraph 22 discloses classification of acoustic signal into classes of non-verbal sound event.) comprising a sequence of frames of audio data (Fig. 1, label spectral frames.), the sound recognition device comprising: 
a microphone (paragraph 52) to capture the audio data (Fig. 1, label acoustic input); and 
a processor (Paragraph 52,53) configured to: 
receive the audio data from the microphone (Fig. 1, label acoustic input, Paragraph 52); and
for each frame of the sequence (label 140,160 performs classification for each frame.): 
process the frame of audio data to extract multiple acoustic features for the frame of audio data (Fig. 1, label 140,160 processes the spectral frames. Fig. 3, label 330 outputs acoustic features for the spectral frames of the audio data. Fig. 4, label 450 outputs the acoustic features for the spectral frames of the audio data.); and 
classify the acoustic features to classify the frame by determining, for each of a set of sound classes, a score that the frame represents the sound class (Paragraph 34 discloses the classifier outputs event scores, “which indicate the probability that an audio frame belongs to a given event class with an associated label. A variety of event classes are possible include classes that represent target event …” );
process the sound class scores for multiple frames of the sequence of frames to generate, for each frame, a sound class decision for each frame (Fig. 2, label 230 performs processing of the outputs from the DNN classifier, shown in Fig. 5, to determine the class. Paragraph 38 discloses that Fig. 7 shows the raw output scores vs. time (frame) for each of 8 labeled events. This indicates that the output of label 230 is for each frame.); and
process the class decisions for the sequence of frames to recognize the at least one of a non-verbal sound event and scene (Paragraph 1 discloses processing the sound class decisions output by label 230, for example, to trigger a desired action based on the event. This indicates that the at least one non-verbal event and a scene (classes as indicated in paragraphs 21-22) are recognized.).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Lopatka et al (US Publication No. 20190043489) in view of McLoughlin et al (Title: Continuous robust sound event classification using time-frequency features and deep learning).
Claim 12, Lopatka et al discloses wherein processing the sound class scores for multiple frames of the sequence of frames to generate, for each frame, a sound class decision for each frame (Fig. 2, label 230 performs processing of the outputs from the DNN classifier, shown in Fig. 5, to determine the class. Paragraph 38 discloses that Fig. 7 shows the raw output scores vs. time (frame) for each of 8 labeled events. This indicates that the output of label 230 is for each frame.) comprises using features or MFCCs across more than one frame (Figs. 6 and 8 show the event scores processed, wherein such event scores are generated by processing MFCCs or features in the DNN, Fig. 1, label DNN classifier.), but fails to disclose that this processing includes using an optimal path search algorithm.
	McLoughlin et al discloses performing MFCC-HMM using the Viterbi algorithm. (Section MFCC-HMM discloses the use of Viterbi to determine features and Section SIF with SVM, DNN and CNN performs classification.) It would have been obvious to one skilled in the art before the effective filing date of the application to modify the DNN classifier of Lopatka et al with the DNN using the Viterbi algorithm as disclosed by McLoughlin et al so as to improve the performance of the classifier.
	Claim 13, McLoughlin et al discloses the optimal path search algorithm is a Viterbi algorithm (Section MFCC-HMM discloses using Viterbi algorithm.).
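For illustration only, an optimal path search of the Viterbi type applied to per-frame class scores can be sketched as follows. This is a textbook log-domain Viterbi decoder, not the MFCC-HMM configuration of McLoughlin et al; the function name, the two-state example, and all probability values are assumptions.

```python
import math

def viterbi(log_scores, log_trans, log_prior):
    """Return the most likely state sequence.

    log_scores[t][s]: log score of state s at frame t.
    log_trans[p][s]:  log probability of moving from state p to state s.
    log_prior[s]:     log probability of starting in state s.
    """
    n_states = len(log_prior)
    best = [log_prior[s] + log_scores[0][s] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in log_scores[1:]:
        step, new_best = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: best[p] + log_trans[p][s])
            step.append(prev)
            new_best.append(best[prev] + log_trans[prev][s] + frame[s])
        back.append(step)
        best = new_best
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-class scores; the third frame alone favours class 1.
probs = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
log_scores = [[math.log(p) for p in row] for row in probs]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_prior = [math.log(0.5), math.log(0.5)]
path = viterbi(log_scores, log_trans, log_prior)
```

A frame-by-frame maximum would label the third frame as class 1, whereas the path search keeps it in class 0 because switching classes twice is penalized by the transition probabilities.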

Allowable Subject Matter
Claims 4 and 5 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LINDA WONG/Primary Examiner, Art Unit 2655