DETAILED ACTION
This communication is in response to the Application and Preliminary Amendment filed on 06/25/2019 and 08/16/2019. Claims 1-15 are pending and have been examined, with claims 1, 8, and 12 being independent. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/25/2019 and 08/28/2020 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Specification
The abstract of the disclosure is objected to because of the following informalities:
The abstract (Item 57) is not submitted by itself on a separate sheet.  
Appropriate correction is required.
Applicant is reminded of the proper language and format for an abstract of the disclosure.
A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should 
If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives. 
Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the 
The disclosure is objected to because of the following informalities: 
Regarding Paragraph 48, “publically” is misspelled and should be changed to “publicly.”
Appropriate correction is required.
Claim Objections
Claims 13-15 are objected to because of the following informalities: 
Regarding Claims 13-15, the inconsistent terminology “including” is used instead of “further including.”
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 6, and 8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Eronen et al. (Foreign Patent Publication No. WO 2016/102737 A1), hereinafter Eronen.
Regarding Claim 1, Eronen discloses an audio signal classifier comprising:
a feature extractor (Eronen, Figure 2 and 3, Page 9, Lines 33-34: controller 202 acts as a feature extractor 30) to extract metadata from an audio signal (Eronen, Figure 6, audio signal 50; Figure 4, Steps s4.16 and s4.17, Page 29, Line 30: storing [as extracting] an indication that the tag 37 applies as part of metadata for the audio track), the metadata defining a plurality of features of the audio signal (Figure 4, Step s4.1: Extract acoustic features s14.10), the feature extractor to generate a feature vector including selected features of the audio signal (Eronen, Figure 4, Step s4.4 and s4.12: Define feature vector; Page 23, Lines 17-18: controller 202 defines a single "feature vector" for each set of selected features 31 or selected combination of features 31), the selected features including a duration of the audio signal (Eronen, Page 2, Lines 13-16: Acoustic features that may be extracted include…a duration of a musical track), and each selected feature having a feature value (Eronen, Figure 4, Step s4.4, Page 23, Line 26: calculate a respective probability [as feature value] for each feature vector); and
a machine learning model trained to classify the audio signal as one of a plurality of audio signal classes based on the feature vector (Eronen, Figure 3, Page 17, Line 25-26: the first classifiers 33 are support vector machine (SVM) [as machine learning model] classifiers), the machine learning model to generate a plurality of class values based on the feature values (Eronen, Page 24, Lines 1-2: different classifications [as class value] may be based on respective selections of features from the available features 31 extracted in step s4.2), each class value corresponding to one of the plurality of audio signal classes, the plurality of class values 

Regarding Claim 6, Eronen discloses all of Claim 1 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising:
the plurality of audio signal classes comprising a voice class, a music class, and a movie class. (Eronen, Page 2, Lines 4-7: metadata may indicate at least one of the following characteristics of the audio data: a musical instrument included in the audio data, the presence or absence of vocals and/or a vocalist gender [as a voice class], presence or absence of music [as movie class] and a musical genre [as music class]; Page 8, Lines 33-35: As well as audio content, the terminals 104 may be capable of streaming or downloading video content [as movie class] over the network 102 and presenting the video content using the speaker and a display)


A non-transitory computer-readable storage medium comprising computer-executable instructions, executable by at least one processor to: (Eronen, Page 6, Lines 10-14: non-transitory tangible computer program product includes computer readable instructions which, when executed by a processing arrangement, causes the processing arrangement to determine whether audio data matches an audio track in a catalogue of audio tracks)

Regarding Claim 12, method claim 12 is similar to system claim 1 in scope and content, and therefore, is similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the 


Claims 2, 3, 7, 9, 10, 13, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Eronen, in view of Jeremic et al. (Non-Patent Literature, “Music Classification by genre using neural networks,” 6 January 2013, Pages 1-10, Neuroph and SourceForge), hereinafter Jeremic.
Regarding Claim 2, Eronen discloses an audio signal classifier further comprising:
a model (Page 20, Lines 21-23: New SVM classifiers [as a model] can be added by collecting new training data and training the new classifiers. Since the SVM classifiers 33 are binary, new classifiers can be added alongside existing classifiers) trained with a plurality of modeled audio frames each representing a different sound of a plurality of sounds (Figure 5, Steps s.5.3 and s5.10: Frame blocking leads to output features [as modeled audio frames]), the model to generate a plurality of class values based on audio frames of the audio signal (Eronen, Page 24, Lines 1-2: different classifications [as class value] may be based on respective selections of features from the available features 31 extracted in step s4.2), the plurality of class values together indicating the class of the audio signal (Eronen, Page 25, Lines 32-34: first classifications may include probabilities p_instr that a particular instrument is included in the audio track [as class] and p_genr probabilities that the audio track belongs to a particular genre [as class]), the class of the audio signal to select audio presets to adjust audio output of loudspeakers. (Eronen, Page 10, Line 12-16: Such tags 37 may be stored by the tagging module 
Eronen does not disclose:
A deep learning model
However, Jeremic teaches an audio signal classifier comprising:
a deep learning model (Jeremic, Page 1, Introduction to the problem: training the neural network [as deep learning model] that uses music songs data set. Data set contains features [as plurality of modeled audio frames each representing a different sound of a plurality of sounds] from symbolic songs (MP3, in this case) and uses them to classify the recordings by genre [as class of the audio signal])
Eronen and Jeremic are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier comprising an SVM classifier model) and Jeremic (directed to a deep learning model) and arrived at an audio signal classifier comprising a deep learning model. One of ordinary skill in the art would have been motivated to make such a combination because artificial neural networks have found profound success in the area of pattern recognition, can be trained to discern the criteria used to classify, and can do so in a generalized manner by repeatedly showing a neural network inputs classified into groups (Jeremic, Page 1, Introduction).

Regarding Claim 3, Eronen in view of Jeremic discloses all of Claim 2 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising:
the feature extractor to generate a robustness value to indicate whether the extracted metadata is valid or invalid (Eronen, Page 32, Lines 14-18: The controller 202 then searches for tracks in the catalogue with matching metadata (step s14.8). In some embodiments, fuzzy string matching, for example using the Levenshtein distance [as robustness value to indicate whether the extracted metadata is valid or invalid], can be used to allow for minor differences in the textual metadata.)
a reliability evaluator to generate a reliability value to indicate whether the plurality of class values generated by the machine learning model is reliable or unreliable. (Eronen, Page 21, Lines 10-12: An accuracy [as reliability value] of the first classification based on the initial feature selection is determined. Further features are then added to the feature selection to determine whether or not the accuracy of the first classification is improved by their inclusion [as whether the plurality of class values generated by the machine learning model is reliable or unreliable]; Page 21, Lines 26-27: After a feature is added to the feature selection, the accuracy of the first classification is reassessed)
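By way of illustration only, the fuzzy string matching that Eronen is cited for above can be sketched with a Levenshtein (edit) distance. The function names and the `max_distance` threshold below are hypothetical and are not taken from Eronen; the sketch only shows how a distance score could tolerate minor differences in textual metadata.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def metadata_matches(query: str, candidate: str, max_distance: int = 2) -> bool:
    # max_distance is an assumed tolerance, not a value disclosed in Eronen.
    return levenshtein(query.lower(), candidate.lower()) <= max_distance
```

Under this sketch, a small distance would be treated as a match between metadata strings, while a large distance would be treated as a mismatch.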

Regarding Claim 7, Eronen discloses all of Claim 1 limitations above, with the exception of the following:
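the machine learning model comprising a neural network including: 
a plurality of input neurons, each input neuron corresponding to a different one of the selected features of the feature vector; and
a plurality of output neurons, each output neuron providing a class value corresponding to a different one of the plurality of audio classes. 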
However, Jeremic teaches an audio signal classifier further comprising:
the machine learning model comprising a neural network including: (Jeremic, Page 1, Introduction to the problem: training the neural network [as machine learning model comprising a neural network] that uses music songs data set.)
a plurality of input neurons, each input neuron corresponding to a different one of the selected features of the feature vector; and (Jeremic, Page 5, Step 4.5: creation of new network which will have 4 input neurons, 20 hidden and 4 output neurons; Page 5, Step 3.5: Features of music songs which we will take, would be just only those for which we believe they have a decisive influence on the classification, actually those are tempo, root mean square (RMS) amplitude, dynamic range and number of digital errors [corresponding to a different one of the selected features])
a plurality of output neurons, each output neuron providing a class value corresponding to a different one of the plurality of audio classes. (Jeremic, Page 5, Step 4.5: creation of new network which will have 4 input neurons, 20 hidden and 4 output neurons; Page 1, Introduction to the problem: train neural network to classify this 4 type of genre [as class value] and to discover which observed features has impact on classification. Each instance has one of 4 possible classes: classic, rock, jazz or folk)
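For illustration only, the network topology cited from Jeremic (4 input neurons, 20 hidden neurons, 4 output neurons, one per genre) can be sketched as a forward pass in plain Python. The random weights, the tanh hidden activation, the softmax output, and the placeholder feature values are all assumptions; Jeremic is cited above only for the topology, the four input features, and the four classes.

```python
import math
import random

GENRES = ["classic", "rock", "jazz", "folk"]  # the four classes cited from Jeremic


def make_layer(n_in, n_out, rng):
    # One weight row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs, activation):
    # zip stops at len(inputs), so row[-1] (the bias) is added separately.
    return [activation(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in layer]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, hidden, output):
    hidden_out = forward(hidden, features, math.tanh)
    scores = forward(output, hidden_out, lambda v: v)
    return softmax(scores)  # one class value per output neuron


rng = random.Random(0)            # untrained, randomly initialized weights
hidden = make_layer(4, 20, rng)   # 4 inputs -> 20 hidden neurons
output = make_layer(20, 4, rng)   # 20 hidden -> 4 output neurons

# Placeholder, pre-normalized values for tempo, RMS amplitude,
# dynamic range, and number of digital errors.
features = [0.6, 0.3, 0.8, 0.1]
class_values = classify(features, hidden, output)
predicted = GENRES[class_values.index(max(class_values))]
```

Each input neuron receives one selected feature and each output neuron yields one class value; because the weights here are untrained, the predicted genre carries no meaning.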


Regarding Claims 9 and 13, CRM claim 9 and method claim 13 are similar to system claim 2 in scope and content, and therefore, are similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 2. 

Regarding Claims 10 and 14, CRM claim 10 and method claim 14 are similar to system claim 3 in scope and content, and therefore, are similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 3. 

Regarding Claim 15, Eronen in view of Jeremic discloses all of Claim 14 limitations above. Furthermore, Eronen discloses a method of classifying audio signals further comprising:

only the first plurality of class values when the robustness value indicates that the extracted metadata is valid (Eronen, Page 32, Lines 14-18: The controller 202 then searches for tracks in the catalogue with matching metadata [as valid metadata] (step s14.8). In some embodiments, fuzzy string matching, for example using the Levenshtein distance [as robustness value], can be used to allow for minor differences in the textual metadata.) and when the reliability value indicates that the first plurality of class values is reliable; (Eronen, Page 21, Lines 10-12: An accuracy [as reliability value] of the first classification based on the initial feature selection is determined. Further features are then added to the feature selection to determine whether or not the accuracy of the first classification is improved by their inclusion [as whether the plurality of class values generated by the machine learning model is reliable]; Figure 12 shows electric guitar with audio tracks above class average accuracy threshold [as indicates that the first plurality of class values is reliable])
only the second plurality of class values when the robustness value indicates that the extracted metadata is invalid; and (unmapped due to method claim being a conditional limitation requiring at least one condition being met)
the first plurality of class values and the second plurality of class values when the robustness value indicates that the extracted metadata is valid and the reliability value indicates that the first plurality of class values is unreliable. (unmapped due to method claim being a conditional limitation requiring at least one condition being met)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Eronen, in view of Jeremic, in further view of DiMaria et al. (US PGPub No. US 20160196270 A1), hereinafter DiMaria, and in further view of Jeong (US PGPub No. 20070294716 A1), hereinafter Jeong.
Regarding Claim 5, Eronen discloses all of Claim 1 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising:
the feature vector, in addition to the duration of the audio signal, including the selected features of a bit-depth (Eronen, Page 2, Lines 13-16: Acoustic features that may be extracted include…a duration of a musical track; Page 10, Line 24: resolution of 16 bits [as bit depth])
Eronen does not disclose:
a sample rate, a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio.
However, Jeremic teaches:
a sample rate (Jeremic, Page 1, Introduction to the problem: attributes are sampling rate)
Eronen, in view of Jeremic, does not disclose:
a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio
However, DiMaria teaches:
an audio channel count and a presence or absence of object-based or channel-based audio (DiMaria, Paragraph 83: a number of channels (e.g., 2 audio channels or 5.1 audio channels) [which also indicates the presence of a channel-based audio])
Eronen, in view of Jeremic, and in further view of DiMaria, does not disclose:
a presence or absence of video data
However, Jeong teaches:
a presence or absence of video data (Jeong, Figure 5, Step S580: Detect video event; Paragraph 104: determined whether a video event has occurred in the video data)
Eronen and Jeremic are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth) and Jeremic (directed to a sample rate), and arrived at an audio signal classifier further comprising selected features of a bit-depth and a sample rate. One of ordinary skill in the art would have been motivated to make such a combination because artificial neural networks have found profound success in the area of pattern recognition, can be trained to discern the criteria used to classify, and can do so in a generalized manner by repeatedly showing a neural network inputs classified into groups (Jeremic, Page 1, Introduction). 
Eronen, Jeremic, and DiMaria are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth), Jeremic (directed to a sample rate), and DiMaria (directed to an audio channel count and a presence or absence of object-based or channel-based audio), and arrived at an audio signal classifier further comprising selected features of sample rate, a bit-depth, an audio channel count, and a presence or absence of object-based or channel-based audio. One of 
Eronen, Jeremic, DiMaria, and Jeong are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth), Jeremic (directed to a sample rate), DiMaria (directed to an audio channel count and a presence or absence of object-based or channel-based audio), and Jeong (directed to a presence or absence of video data) and arrived at an audio signal classifier further comprising selected features of sample rate, a bit-depth, a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio. One of ordinary skill in the art would have been motivated to make such a combination because 
detecting an event in real time in sports video data using previously received data by way of training and detected information in real time without having to use information of the entire stream when generating an online model, which may thereby improve processing speed (Jeong, Paragraph 121). 
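For illustration only, the selected features recited in Claim 5 (duration, bit-depth, sample rate, presence or absence of video data, audio channel count, and object-based or channel-based audio) can be sketched as a fixed-order feature vector. The `AudioMetadata` type, its field names, the ordering, and the 0/1 encoding of the presence flags are assumptions made for this sketch and are not taken from the application or from any cited reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioMetadata:
    # Hypothetical container for metadata extracted from an audio signal.
    duration_s: float
    bit_depth: int
    sample_rate_hz: int
    has_video: bool
    channel_count: int
    object_based: bool  # True: object-based audio, False: channel-based audio


def to_feature_vector(meta: AudioMetadata) -> List[float]:
    # Each position holds the feature value of one selected feature.
    return [
        float(meta.duration_s),
        float(meta.bit_depth),
        float(meta.sample_rate_hz),
        1.0 if meta.has_video else 0.0,
        float(meta.channel_count),
        1.0 if meta.object_based else 0.0,
    ]


# Example: a 215-second, 16-bit, 44.1 kHz stereo track with no video.
vector = to_feature_vector(AudioMetadata(215.0, 16, 44100, False, 2, False))
```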

Allowable Subject Matter
Claims 4 and 11 would be allowable if rewritten to include all of the limitations of the base claim and any intervening claims.  

an output decision model to determine a class of the audio signal from: (Eronen, Figure 6, audio signal 50; Figure 3, Page 17, Line 25-26: the first classifiers 33 are support vector machine (SVM) classifiers [as output decision model])
only the plurality of class values generated by the machine learning model when the robustness value indicates that the extracted metadata is valid and when the reliability value indicates that the plurality of class values generated by the machine learning model is reliable;
However, Eronen in view of Jeremic does not disclose:
only the plurality of class values generated by the deep learning model when the robustness value indicates that the extracted metadata is invalid; and
the plurality of class values generated by the machine learning model and the plurality of class values generated by the deep learning model when the robustness value indicates that the extracted metadata is valid and the reliability value indicates that the plurality of class values generated by the machine learning model is unreliable.
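For illustration only, the three-branch selection recited for the output decision model can be sketched as follows. The function name, the dictionary return shape, and the example values are hypothetical; the limitations above specify only which class values are considered in each case, not how they are combined into a final class.

```python
from typing import Dict, List


def select_class_values(
    metadata_valid: bool,
    ml_reliable: bool,
    ml_values: List[float],
    dl_values: List[float],
) -> Dict[str, List[float]]:
    """Return the class values the output decision would consider."""
    if not metadata_valid:
        # Invalid metadata: only the deep learning model's class values.
        return {"deep_learning": dl_values}
    if ml_reliable:
        # Valid metadata and reliable result: only the machine learning values.
        return {"machine_learning": ml_values}
    # Valid metadata but unreliable result: both sets of class values.
    return {"machine_learning": ml_values, "deep_learning": dl_values}


ml = [0.7, 0.2, 0.1]  # hypothetical class values (e.g., voice, music, movie)
dl = [0.5, 0.4, 0.1]
chosen = select_class_values(metadata_valid=True, ml_reliable=False,
                             ml_values=ml, dl_values=dl)
```

In this sketch the reliability value is not consulted when the metadata is invalid, which mirrors the wording of the second limitation.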

Regarding Claim 11, CRM claim 11 is similar to system claim 4 in scope and content, and therefore, is similarly allowable under the same rationale as applied above with respect to the functions depicted by system claim 4. 

Hence, none of the prior art of record teaches or makes obvious the combination of limitations as presently recited in claims 4 and 11. 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Guoyon et al. (US PGPub No. US 20170140743 A1) teaches a model trainer 425 may modify the music model parameters according to any statistical, optimization, or machine learning algorithm such as gradient tree boosting, regression, logistic regression, or neural network training (Guoyon, Paragraph 65).
Zhang et al. (US 10460231 B2) teaches training the neural network by back-propagation; one image quality attribute may include…bit depth (Zhang, Paragraph 11).
Zhang2 et al. (US PGPub No. US 20070255755 A1) teaches video categorization may be formed as an ensemble of binary categorization problems with one SVM classifier for each category (Zhang2, Paragraph 42).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANUP CHANDORA whose telephone number is (571)272-4202.  The examiner can normally be reached on Full-time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Dorvil Richemond can be reached on (571) 272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/ANUP CHANDORA/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658