Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4, 6-14, and 16-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Loui et al. (U.S. Patent Application Pub. No. 2014/0056432, hereinafter “Loui”).
In regard to claim 1, Loui discloses a computer-implemented method (Fig. 2) comprising: 
generating, by a computer, a plurality of audio frames partitioned from a plurality of audio signals (audio features 220 are determined for audio frames from input audio signals 200, paragraphs [0035-0036]); 
generating, by the computer, a plurality of clusters, each cluster comprising one or more audio frames having similar features and associated with a type of sound (frame level features are clustered together to generate clip-level features, the clip-level features each associated with a semantic concept, paragraphs [0037-0038]); 
generating, by the computer, a plurality of incoming audio frames partitioned from an incoming audio signal (during concept detection, frame-level features for each audio frame are determined, paragraph [0039]); and 
detecting, by the computer, the type of sound in each incoming audio frame having the similar features of at least one cluster (frame-level probabilities for the semantic concept detection values are determined, paragraph [0039]).
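
For illustration only, the four steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of Loui's disclosure or of the applicant's implementation: the function names, the two toy features (mean energy and zero-crossing rate), and the pre-computed cluster centroids are all hypothetical, and the clustering step itself is assumed to have already produced one labelled centroid per type of sound.

```python
from math import dist


def partition_frames(signal, frame_len, hop):
    """Split a 1-D sample sequence into fixed-length frames taken every `hop` samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame):
    """Toy feature vector (hypothetical): mean energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(frame) - 1))


def detect_sound_type(frame, centroids):
    """Return the label of the cluster centroid nearest to the frame's features."""
    feats = frame_features(frame)
    return min(centroids, key=lambda label: dist(feats, centroids[label]))


# Hypothetical centroids, assumed to come from an earlier clustering step.
centroids = {"silence": (0.0, 0.0), "tone": (1.0, 1.0)}

# Incoming signal: eight silent samples followed by eight alternating samples.
incoming = [0.0] * 8 + [1.0, -1.0] * 4
frames = partition_frames(incoming, frame_len=8, hop=8)
labels = [detect_sound_type(f, centroids) for f in frames]
```

In this sketch each incoming frame is assigned the type of sound of the cluster whose centroid is closest in feature space, which is one simple reading of "having the similar features of at least one cluster".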

In regard to claim 2, Loui discloses for each of the audio frames partitioned from the plurality of audio signals: 
generating, by the computer, a feature vector for the particular audio frame based upon a set of one or more features of the particular audio frame (audio features are determined for each frame for use in, e.g., Support Vector Machine classifiers, paragraphs [0035-0036] and [0038]).

In regard to claim 3, Loui discloses generating each of the clusters comprises: 
determining, by the computer, the type of sound associated with the particular cluster based upon a feature vector generated for each of the audio frames of the particular cluster (frame-level features determine semantic concept detection values, using Support Vector Machine classifiers, paragraphs [0038-0039]).

In regard to claim 4, Loui discloses generating each of the clusters comprises: 
determining, by the computer, the type of sound associated with the particular cluster using one or more supervised classifiers trained to classify one or more types of sound (semantic concept detection values are determined using classifiers trained with labelled training data, paragraphs [0040] and [0042]).

In regard to claim 6, Loui discloses training, by the computer, a supervised classifier according to label indicating one or more types of sound associated with a particular cluster (Fig. 4, training labels 415 are used to train the model, paragraph [0042]).

In regard to claim 7, Loui discloses detecting the type of sound in each of the incoming audio frames comprises: 
generating, by the computer, a feature vector for the particular incoming audio frame based upon a set of one or more features of the particular incoming audio frame (aggregated frame level features are clustered for use in a Support Vector Machine classifier, paragraphs [0037-0038]).

In regard to claim 8, Loui discloses detecting the type of sound in the incoming audio frames comprises: 
applying, by the computer, one or more supervised classifiers on one or more feature vectors generated for each of the incoming audio frames, wherein each of the supervised classifiers is trained to classify one or more types of sound (Support Vector Machine classifiers are applied where each classifier is trained to classify a semantic concept indicating a type of sound, paragraphs [0038-0039]).

In regard to claim 9, Loui discloses generating, by the computer, a first output audio score by applying a first audio event detection component to a first type of feature of one or more incoming audio frames (Fig. 5, features extracted from an incoming audio signal are analyzed by a set of semantic concept detectors determining the probability of occurrence of a first semantic concept, paragraphs [0050-0051]); 
generating, by the computer, a second output audio score by applying a second audio event detection component on a second type of feature of the one or more incoming audio frames (features extracted from an incoming audio signal are analyzed by a set of semantic concept detectors determining the probability of occurrence of a second semantic concept, paragraphs [0050-0051]); and 
generating, by the computer, a fused score for the one or more incoming audio frames based upon the first output audio score and the second output audio score, wherein the computer detects the type of sound in the one or more incoming audio frames based upon the fused score (the preliminary semantic concept detection values are fused by calculating the product values of all possible assignments to generate final updated semantic concept detection values, paragraphs [0053-0054]).
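
For illustration only, the score fusion recited in claim 9 can be sketched as a simple product of two detector outputs. This is a simplified sketch under stated assumptions: it is not the full computation over all possible assignments described in Loui, and the function names and the 0.5 decision threshold are hypothetical.

```python
def fuse_scores(first_score, second_score):
    """Product-rule fusion of two detector probabilities, each assumed to lie in [0, 1]."""
    return first_score * second_score


def sound_detected(first_score, second_score, threshold=0.5):
    """Report a detection when the fused score meets the (hypothetical) threshold."""
    return fuse_scores(first_score, second_score) >= threshold
```

With a product rule, a frame is reported as containing the type of sound only when both detection components score it highly; a low score from either component pulls the fused score down.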

In regard to claim 10, Loui discloses a type of feature includes at least one of:
Mel-Frequency Cepstral Coefficients, Perceptual Linear Prediction, and Relative Spectral Transform-Perceptual Linear Prediction (audio features include MFCCs among others, paragraph [0036]).

In regard to claim 11, Loui discloses a system (Fig. 1) comprising: 
a non-transitory storage medium configured to store a plurality of audio signals (data storage system 140, paragraph [0028]); and a processor (data processing system 110, paragraph [0027]) configured to: 
generate a plurality of audio frames partitioned from a plurality of audio signals (audio features 220 are determined for audio frames from input audio signals 200, paragraphs [0035-0036]); 
generate a plurality of clusters, each cluster comprising one or more audio frames having similar features and associated with a type of sound (frame level features are clustered together to generate clip-level features, the clip-level features each associated with a semantic concept, paragraphs [0037-0038]); 
generate a plurality of incoming audio frames partitioned from an incoming audio signal (during concept detection, frame-level features for each audio frame are determined, paragraph [0039]); and 
detect the type of sound in each incoming audio frame having the similar features of at least one cluster (frame-level probabilities for the semantic concept detection values are determined, paragraph [0039]).

In regard to claim 12, Loui discloses for each of the audio frames partitioned from the plurality of audio signals: 
generate a feature vector for the particular audio frame based upon a set of one or more features of the particular audio frame (audio features are determined for each frame for use in, e.g., Support Vector Machine classifiers, paragraphs [0035-0036] and [0038]).

In regard to claim 13, Loui discloses the processor is configured to:
determine the type of sound associated with the particular cluster based upon a feature vector generated for each of the audio frames of the particular cluster (frame-level features determine semantic concept detection values, using Support Vector Machine classifiers, paragraphs [0038-0039]).

In regard to claim 14, Loui discloses the processor is configured to:
determine the type of sound associated with the particular cluster using one or more supervised classifiers trained to classify one or more types of sound (semantic concept detection values are determined using classifiers trained with labelled training data, paragraphs [0040] and [0042]).

In regard to claim 16, Loui discloses the processor is configured to:
train a supervised classifier according to label indicating one or more types of sound associated with a particular cluster (Fig. 4, training labels 415 are used to train the model, paragraph [0042]).

In regard to claim 17, Loui discloses the processor is configured to:
generate a feature vector for the particular incoming audio frame based upon a set of one or more features of the particular incoming audio frame (aggregated frame level features are clustered for use in a Support Vector Machine classifier, paragraphs [0037-0038]).

In regard to claim 18, Loui discloses the processor is configured to:
apply one or more supervised classifiers on one or more feature vectors generated for each of the incoming audio frames, wherein each of the supervised classifiers is trained to classify one or more types of sound (Support Vector Machine classifiers are applied where each classifier is trained to classify a semantic concept indicating a type of sound, paragraphs [0038-0039]).

In regard to claim 19, Loui discloses the processor is configured to:
generate a first output audio score by applying a first audio event detection component to a first type of feature of one or more incoming audio frames (Fig. 5, features extracted from an incoming audio signal are analyzed by a set of semantic concept detectors determining the probability of occurrence of a first semantic concept, paragraphs [0050-0051]); 
generate a second output audio score by applying a second audio event detection component on a second type of feature of the one or more incoming audio frames (features extracted from an incoming audio signal are analyzed by a set of semantic concept detectors determining the probability of occurrence of a second semantic concept, paragraphs [0050-0051]); and 
generate a fused score for the one or more incoming audio frames based upon the first output audio score and the second output audio score, wherein the computer detects the type of sound in the one or more incoming audio frames based upon the fused score (the preliminary semantic concept detection values are fused by calculating the product values of all possible assignments to generate final updated semantic concept detection values, paragraphs [0053-0054]).

In regard to claim 20, Loui discloses a type of feature includes at least one of:
Mel-Frequency Cepstral Coefficients, Perceptual Linear Prediction, and Relative Spectral Transform-Perceptual Linear Prediction (audio features include MFCCs among others, paragraph [0036]).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Loui, in view of Lim et al. (U.S. Patent Application Pub. No. 2016/0210988, hereinafter “Lim”).
In regard to claims 5 and 15, Loui does not disclose the one or more supervised classifiers include a supervised multi-class classifier trained to classify a plurality of types of sound.
Lim discloses a method/system for classifying sounds (see Abstract), wherein one or more supervised classifiers include a supervised multi-class classifier trained to classify a plurality of types of sound (a sound source classification unit classifies sounds using a multi-class Support Vector Machine (SVM), paragraphs [0060-0062]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to use a supervised multi-class classifier as the one or more supervised classifiers, because when sound classification is conducted with such a classifier “there is an advantage of providing a user with reliability of the classified sound source”, as taught by Lim (paragraph [0064]).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached Mon-Thurs 9AM-3PM (Part time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





BLA 10/24/22
/BRIAN L ALBERTALLI/Primary Examiner, Art Unit 2656