DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 1 – 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant argues that Eronen does not teach that the audio signal is classified as one audio signal class; using the one audio signal class of the audio signal to select audio presets to adjust audio output of loudspeakers when playing back the audio signal (Amendment, pages 10 – 12).

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 6, 8, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Eronen et al. (Foreign Patent Publication No. WO 2016/102737 A1) in view of Lu et al. (US PAP 2016/0078879).
Regarding Claim 1, Eronen discloses an audio signal classifier comprising:
a feature extractor (Eronen, Figure 2 and 3, Page 9, Lines 33-34: controller 202 acts as a feature extractor 30) to extract metadata from an audio signal (Eronen, Figure 6, audio signal 50; Figure 4, Steps s4.16 and s4.17, Page 29, Line 30: storing [as extracting] an indication that the tag 37 applies as part of metadata for the audio track), the metadata defining a plurality of features of the audio signal (Figure 4, Step s4.1: Extract acoustic features $14.10), the feature extractor to generate a feature vector including selected features of the audio signal (Eronen, Figure 4, Step s4.4 and s4.12: Define feature vector; Page 23, Lines 17-18: controller 202 defines a single "feature vector" for each set of selected features 31 or selected combination of features 31), the selected features including a duration of the audio signal (Eronen, Page 2, Lines 13-16: Acoustic features that may be extracted include...a duration of a musical track), and each selected feature having a feature value (Eronen, Figure 4, Step s4.4, Page 23, Line 26: calculate a respective probability [as feature value] for each feature vector); and
a machine learning model trained to classify the audio signal as one of a plurality of audio signal classes based on the feature vector (Eronen, Figure 3, Page 17, Lines 25-26: the first classifiers 33 are support vector machine (SVM) [as machine learning model] classifiers), the machine learning model to generate a plurality of class values based on the feature values (Eronen, Page 24, Lines 1-2: different classifications [as class value] may be based on respective selections of features from the available features 31 extracted in step s4.2), each class value corresponding to one of the plurality of audio signal classes, the plurality of class values together indicating the class of the audio signal (Eronen, Page 25, Lines 32-34: first classifications may include probabilities p_instr that a particular instrument is included in the audio track [as class] and p_genr probabilities that the audio track belongs to a particular genre [as class]).
However, Eronen does not specifically teach that the audio signal is classified as one audio signal class; using the one audio signal class of the audio signal to select audio presets to adjust audio output of loudspeakers when playing back the audio signal.
Lu et al. disclose that the audio classifier 200 is for classifying an audio signal into at least one audio type in real time. It automatically identifies the audio types of the content on playback (paragraph 108).  Besides adjusting the equalization level, other aspects of an equalizer may be adjusted based on the confidence value(s) of the audio type(s) of an audio signal. For example, the adjusting operation 1104 may be configured to assign an equalization level and/or equalization profile and/or spectral balance preset to each audio type (paragraph 367).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adjust content output based on audio presets as taught by Lu et al. in Eronen, because that would help improve the experience of the audience (paragraph 109).
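For illustration only, the combination relied upon above (class values produced by a classifier, with the resulting single class used to pick a playback preset) can be sketched as follows. The preset names, gain values, and function names are hypothetical assumptions chosen for the sketch; they are not taken from Eronen or Lu et al.

```python
# Hypothetical preset table. The class names follow Claim 6; the gain
# values are illustrative assumptions, not values from any cited reference.
PRESETS = {
    "voice": {"bass_db": -2.0, "mid_db": 3.0, "treble_db": 1.0},
    "music": {"bass_db": 3.0, "mid_db": 0.0, "treble_db": 2.0},
    "movie": {"bass_db": 4.0, "mid_db": 1.0, "treble_db": 0.0},
}


def select_class(class_values):
    """Return the single class whose class value is largest."""
    return max(class_values, key=class_values.get)


def select_preset(class_values):
    """Map the selected class to its equalization preset."""
    return PRESETS[select_class(class_values)]


# Example class values, one per audio signal class.
example_values = {"voice": 0.1, "music": 0.7, "movie": 0.2}
chosen_class = select_class(example_values)
chosen_preset = select_preset(example_values)
```

Taking the largest class value is only one possible way of reducing a plurality of class values to one class; the claim language does not require it.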

Regarding Claim 6, Eronen discloses all of Claim 1 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising the plurality of audio signal classes comprising a voice class, a music class, and a movie class (Eronen, Page 2, Lines 4-7: metadata may indicate at least one of the following characteristics of the audio data: a musical instrument included in the audio data, the presence or absence of vocals and/or a vocalist gender [as a voice class], presence or absence of music [as movie class] and a musical genre [as music class]; Page 8, Lines 33-35: As well as audio content, the terminals 104 may be capable of streaming or downloading video content [as movie class] over the network 102 and presenting the video content using the speaker and a display).

Regarding Claim 8, CRM claim 8 is similar to system claim 1 in scope and content, and therefore, is similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 1. Furthermore, Eronen discloses a non-transitory computer-readable storage medium comprising computer-executable instructions, executable by at least one processor (Eronen, Page 6, Lines 10-14: non-transitory tangible computer program product includes computer readable instructions which, when executed by a processing arrangement, causes the processing arrangement to determine whether audio data matches an audio track in a catalogue of audio tracks).

Regarding Claim 12, method claim 12 is similar to system claim 1 in scope and content, and therefore, is similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 1.

Claims 2, 3, 7, 9, 10, 13, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Eronen, in view of Lu et al. (US PAP 2016/0078879); and further in view of Jeremic et al. (Non-Patent Literature, “Music Classification by genre using neural networks,” 6 January 2013, Pages 1-10, Neuroph and SourceForge), hereinafter Jeremic.
Regarding Claim 2, Eronen in view of Lu et al. further discloses an audio signal classifier further comprising: a model (Eronen, Page 20, Lines 21-23: New SVM classifiers [as a model] can be added by collecting new training data and training the new classifiers. Since the SVM classifiers 33 are binary, new classifiers can be added alongside existing classifiers) trained with a plurality of modeled audio frames each representing a different sound of a plurality of sounds (Eronen, Figure 5, Steps s5.3 and s5.10: Frame blocking leads to output features [as modeled audio frames]), the model to generate a plurality of class values based on audio frames of the audio signal (Eronen, Page 24, Lines 1-2: different classifications [as class value] may be based on respective selections of features from the available features 31 extracted in step s4.2), the plurality of class values together indicating the one audio signal class of the audio signal (Eronen, Page 25, Lines 32-34: first classifications may include probabilities p_instr that a particular instrument is included in the audio track [as class] and p_genr probabilities that the audio track belongs to a particular genre [as class]);
the one audio signal class of the audio signal used to select audio presets to adjust audio output of loudspeakers when playing back the audio signal (Lu et al., paragraphs 108 and 367: “adjusting the equalization level, other aspects of an equalizer may be adjusted based on the confidence value(s) of the audio type(s) of an audio signal. For example, the adjusting operation 1104 may be configured to assign an equalization level and/or equalization profile and/or spectral balance preset to each audio type”).
However, Eronen does not specifically teach a deep learning model.
Jeremic teaches an audio signal classifier comprising: a deep learning model (Jeremic, Page 1, Introduction to the problem: training the neural network [as deep learning model] that uses music songs data set. Data set contains features [as plurality of modeled audio frames each representing a different sound of a plurality of sounds] from symbolic songs (MP3, in this case) and uses them to classify the recordings by genre [as class of the audio signal]).
Eronen and Jeremic are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier comprising an SVM classifier model) and Jeremic (directed to a deep learning model) and arrived at an audio signal classifier comprising a deep learning model. One of ordinary skill in the art would have been motivated to make such a combination because artificial neural networks have found profound success in the area of pattern recognition, can be trained to discern the criteria used to classify, and can do so in a generalized manner by repeatedly showing a neural network inputs classified into groups (Jeremic, Page 1, Introduction).

Regarding Claim 3, Eronen in view of Jeremic discloses all of Claim 2 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising the feature extractor to generate a robustness value to indicate whether the extracted metadata is valid or invalid (Eronen, Page 32, Lines 14-18: The controller 202 then searches for tracks in the catalogue with matching metadata (step s14.8). In some embodiments, fuzzy string matching, for example using the Levenshtein distance [as robustness value to indicate whether the extracted metadata is valid or invalid], can be used to allow for minor differences in the textual metadata.);
a reliability evaluator to generate a reliability value to indicate whether the plurality of class values generated by the machine learning model is reliable or unreliable. (Eronen, Page 21, Lines 10-12: An accuracy [as reliability value] of the first classification based on the initial feature selection is determined. Further features are then added to the feature selection to determine whether or not the accuracy of the first classification is improved by their inclusion [as whether the plurality of class values generated by the machine learning model is reliable or unreliable]; Page 21, Lines 26-27: After a feature is added to the feature selection, the accuracy of the first classification is reassessed)
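For illustration only, the Levenshtein distance cited from Eronen for fuzzy string matching of textual metadata can be computed as sketched below. The function names and the distance threshold of 2 are hypothetical assumptions for the sketch; Eronen does not specify an implementation or a threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Treat two metadata strings as matching if their distance is small.
    The threshold is an assumption made for this sketch."""
    return levenshtein(a.lower(), b.lower()) <= max_distance
```

Under this sketch, a small distance tolerates minor spelling differences in the textual metadata, while a large distance is treated as a non-match.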

Regarding Claim 7, Eronen discloses all of Claim 1 limitations above, with the exception of the following: the machine learning model comprising a neural network including a plurality of input neurons, each input neuron corresponding to a different one of the selected features of the feature vector; and a plurality of output neurons, each output neuron providing a class value corresponding to a different one of the plurality of audio classes.
However, Jeremic teaches an audio signal classifier further comprising the machine learning model comprising a neural network including: (Jeremic, Page 1, Introduction to the problem: training the neural network [as machine learning model comprising a neural network] that uses music songs data set.);
a plurality of input neurons, each input neuron corresponding to a different one of the selected features of the feature vector; and (Jeremic, Page 5, Step 4.5: creation of new network which will have 4 input neurons, 20 hidden and 4 output neurons; Page 5, Step 3.5: Features of music songs which we will take, would be just only those for which we believe they have a decisive influence on the classification, actually those are tempo, root mean square (RMS) amplitude, dynamic range and number of digital errors [corresponding to a different one of the selected features]);
a plurality of output neurons, each output neuron providing a class value corresponding to a different one of the plurality of audio classes. (Jeremic, Page 5, Step 4.5: creation of new network which will have 4 input neurons, 20 hidden and 4 output neurons; Page 1, Introduction to the problem: train neural network to classify this 4 type of genre [as class value] and to discover which observed features has impact on classification. Each instance has one of 4 possible classes: classic, rock, jazz or folk).
Eronen and Jeremic are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier comprising a machine learning model) and Jeremic (directed to a model comprising a neural network) and arrived at an audio signal classifier comprising a machine learning model comprising a neural network. One of ordinary skill in the art would have been motivated to make such a combination because artificial neural networks have found profound success in the area of pattern recognition, can be trained to discern the criteria used to classify, and can do so in a generalized manner by repeatedly showing a neural network inputs classified into groups (Jeremic, Page 1, Introduction).
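For illustration only, the network topology cited from Jeremic (4 input neurons, 20 hidden neurons, 4 output neurons, one output per genre) can be sketched as a forward pass. The weights below are random and untrained, and the tanh and softmax activations, the example feature values, and all function names are assumptions made for the sketch; they are not taken from Jeremic.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a fully connected layer (untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Weighted sum of the inputs for every neuron in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Normalize raw outputs into class values that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def forward(x, hidden, output):
    """One input neuron per feature, one output neuron per class."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))


rng = random.Random(0)            # fixed seed so the sketch is repeatable
hidden = make_layer(4, 20, rng)   # 4 inputs -> 20 hidden neurons
output = make_layer(20, 4, rng)   # 20 hidden -> 4 output neurons
GENRES = ["classic", "rock", "jazz", "folk"]

# Hypothetical normalized feature vector: tempo, RMS amplitude,
# dynamic range, number of digital errors.
class_values = forward([0.5, 0.2, 0.8, 0.1], hidden, output)
predicted = GENRES[class_values.index(max(class_values))]
```

Each entry of class_values corresponds to a different genre, mirroring the mapping of each output neuron to a different audio class.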

Regarding Claims 9 and 13, CRM claim 9 and method claim 13 are similar to system claim 2 in scope and content, and therefore, are similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 2.
Regarding Claims 10 and 14, CRM claim 10 and method claim 14 are similar to system claim 3 in scope and content, and therefore, are similarly rejected under the same rationale as applied above with respect to the functions depicted by system claim 3.

Regarding Claim 15, Eronen in view of Jeremic discloses all of Claim 14 limitations above. Furthermore, Eronen discloses a method of classifying audio signals further comprising determining a class of the audio signal from: (Eronen, Figure 6, audio signal 50; Figure 3, Page 17, Lines 25-26: the first classifiers 33 are support vector machine (SVM) classifiers);
only the first plurality of class values when the robustness value indicates that the extracted metadata is valid (Eronen, Page 32, Lines 14-18: The controller 202 then searches for tracks in the catalogue with matching metadata [as valid metadata] (step s14.8). In some embodiments, fuzzy string matching, for example using the Levenshtein distance [as robustness value], can be used to allow for minor differences in the textual metadata.) and when the reliability value indicates that the first plurality of class values is reliable (Eronen, Page 21, Lines 10-12: An accuracy [as reliability value] of the first classification based on the initial feature selection is determined. Further features are then added to the feature selection to determine whether or not the accuracy of the first classification is improved by their inclusion [as whether the plurality of class values generated by the machine learning model is reliable]; Figure 12 shows electric guitar with audio tracks above class average accuracy threshold [as indicates that the first plurality of class values is reliable]);
only the second plurality of class values when the robustness value indicates that the extracted metadata is invalid; and (unmapped due to method claim being a conditional limitation requiring at least one condition being met);
the first plurality of class values and the second plurality of class values when the robustness value indicates that the extracted metadata is valid and the reliability value indicates that the first plurality of class values is unreliable. (unmapped due to method claim being a conditional limitation requiring at least one condition being met).

6.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Eronen, in view of Jeremic, in further view of DiMaria et al. (US PGPub No. US 20160196270 A1), hereinafter DiMaria, and in further view of Jeong (US PGPub No. 20070294716 A1), hereinafter Jeong.
Regarding Claim 5, Eronen discloses all of Claim 1 limitations above. Furthermore, Eronen discloses an audio signal classifier further comprising the feature vector, in addition to the duration of the audio signal, including the selected features of a bit-depth (Eronen, Page 2, Lines 13-16: Acoustic features that may be extracted include...a duration of a musical track; Page 10, Line 24, resolution of 16 bits [as bit depth]).
Eronen does not disclose a sample rate, a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio.
However, Jeremic teaches a sample rate (Jeremic, Page 1, Introduction to the problem: attributes are sampling rate).
Eronen, in view of Jeremic, does not disclose a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio.
However, DiMaria teaches an audio channel count and a presence or absence of object-based or channel-based audio (DiMaria, Paragraph 83: a number of channels (e.g., 2 audio channels or 5.1 audio channels) [which also indicates the presence of channel-based audio]).
Eronen, in view of Jeremic, and in further view of DiMaria, does not disclose a presence or absence of video data.
However, Jeong teaches a presence or absence of video data (Jeong, Figure 5, Step S580: Detect video event; Paragraph 104: determined whether a video event has occurred in the video data).
Eronen and Jeremic are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth) and Jeremic (directed to a sample rate), and arrived at an audio signal classifier further comprising selected features of a bit-depth and a sample rate. One of ordinary skill in the art would have been motivated to make such a combination because artificial neural networks have found profound success in the area of pattern recognition, can be trained to discern the criteria used to classify, and can do so in a generalized manner by repeatedly showing a neural network inputs classified into groups (Jeremic, Page 1, Introduction).
Eronen, Jeremic, and DiMaria are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth), Jeremic (directed to a sample rate), and DiMaria (directed to an audio channel count and a presence or absence of object-based or channel-based audio), and arrived at an audio signal classifier further comprising selected features of sample rate, a bit-depth, an audio channel count, and a presence or absence of object-based or channel-based audio. One of ordinary skill in the art would have been motivated to make such a combination because methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in providing media services and providing enhanced media experiences to users (DiMaria, Paragraph 88).
Eronen, Jeremic, DiMaria, and Jeong are considered to be analogous to the claimed invention because they are in the same field of audio classification. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Eronen (directed to an audio signal classifier further comprising selected features of a bit-depth), Jeremic (directed to a sample rate), DiMaria (directed to an audio channel count and a presence or absence of object-based or channel-based audio), and Jeong (directed to a presence or absence of video data) and arrived at an audio signal classifier further comprising selected features of sample rate, a bit-depth, a presence or absence of video data, an audio channel count, and a presence or absence of object-based or channel-based audio. One of ordinary skill in the art would have been motivated to make such a combination because doing so allows an event to be detected in real time in sports video data using previously received data by way of training and information detected in real time, without having to use information of the entire stream when generating an online model, which may thereby improve processing speed (Jeong, Paragraph 121).

Allowable Subject Matter
7.	Claims 4 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:
As to claims 4 and 11, the prior art made of record does not teach or suggest an output decision model to determine one audio signal class of the audio signal from only the plurality of class values generated by the machine learning model when the robustness value indicates that the extracted metadata is valid and when the reliability value indicates that the plurality of class values generated by the machine learning model is reliable; only the plurality of class values generated by the deep learning model when the robustness value indicates that the extracted metadata is invalid; and the plurality of class values generated by the machine learning model and the plurality of class values generated by the deep learning model when the robustness value indicates that the extracted metadata is valid and the reliability value indicates that the plurality of class values generated by the machine learning model is unreliable.
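For illustration only, the three-branch output decision recited in claims 4 and 11 can be sketched as follows. The function and parameter names are hypothetical, and averaging the two sets of class values in the third branch is an assumption made for the sketch; the claim language recited above does not state how the two pluralities of class values are combined.

```python
def decide_class(ml_values, dl_values, metadata_valid, ml_reliable):
    """Select one audio signal class from the class values of a machine
    learning model (ml_values) and a deep learning model (dl_values)."""
    if not metadata_valid:
        # Metadata invalid: use only the deep learning model's class values.
        scores = dl_values
    elif ml_reliable:
        # Metadata valid and ML output reliable: use only the ML class values.
        scores = ml_values
    else:
        # Metadata valid but ML output unreliable: use both sets of values.
        # Averaging is an assumption of this sketch, not a claimed feature.
        scores = {c: (ml_values[c] + dl_values[c]) / 2 for c in ml_values}
    return max(scores, key=scores.get)


# Hypothetical class values for the three classes of Claim 6.
ml = {"voice": 0.6, "music": 0.3, "movie": 0.1}
dl = {"voice": 0.1, "music": 0.2, "movie": 0.7}
```

With these example values, the first branch follows the machine learning model, the second follows the deep learning model, and the third blends the two.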

Conclusion
8.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571)272-4247. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEONARD SAINT-CYR/           Primary Examiner, Art Unit 2658