DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 01 July 2022 in reference to application 16/905,810.  Claims 1-20 are pending and have been examined.

Response to Amendment
The amendment filed 01 January 2022 has been accepted and considered in this office action.  Claims 1, 3, 11, 13, 18 and 20 have been amended.

Response to Arguments
Applicant's arguments in the Remarks have been fully considered but they are not persuasive.

Applicant argues, see Remarks pages 8-9, that the prior art fails to teach the limitations of claim 1. The examiner respectfully disagrees. Applicant argues that because Bryan can use several models to predict quality metrics, it cannot read on the trained learning model as claimed. However, nothing in the claims limits the trained learning model to a single model, such as a single neural network. Nothing in the claim language prevents the claimed learning model from comprising multiple models as taught by Bryan. Additionally, Bryan appears to contemplate using a single model as well at 0095, where Bryan refers to one or more models. Applicant further argues that the prior art does not teach the geometric mean now claimed. However, the Examiner relied upon Olson to teach this limitation in the previous rejection, and Applicant has given no indication of why Olson does not teach the limitation. For these reasons, the Examiner believes the combination of Bryan and Olson teaches the limitations as laid out below.

Claim Rejections - 35 USC § 102
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim(s) 20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bryan (US PAP 2021/0125629).

Consider claim 20, Bryan teaches A system (abstract) comprising: 
one or more memories storing instructions (0176, RAM); and 
one or more processors coupled to the one or more memories that (0176, processors), when executing the instructions, perform the steps of: 
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim(s) 1-5, 7-11, 15, 17, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bryan (US PAP 2021/0125629) in view of Olson (Geometric Mean Technique).

Consider claim 1, Bryan teaches A computer-implemented method for estimating perceived audio quality (abstract, figure 7), the method comprising: 
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).
Bryan does not specifically teach computing the first audio quality score based on a geometric mean based on the first plurality of predicted labels.
In the same field of score aggregation, Olson teaches computing the first audio quality score based on a geometric mean based on the first plurality of predicted labels (pp 69-71, geometric mean computing combining multiple labels of scores).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a geometric mean as taught by Olson in the system of Bryan in order to allow comparison of scores that may be made up of multiple sub-scores (Olson pp 69).

Consider claim 2, Bryan teaches the computer-implemented method of claim 1, wherein computing the first audio quality score comprises: 
computing an aggregate score based on the first plurality of predicted labels (0060-63, determining overall acoustic quality score); and 
performing one or more scaling operations on the aggregate score to generate the first audio quality score (0060-63, normalizing scores to scale from 0-100).

Consider claim 3, Bryan and Olson teach the computer-implemented method of claim 1, wherein computing the first audio quality score comprises: 
computing the geometric mean of the first plurality of predicted labels (Olson pp 69-71, geometric mean computing combining multiple labels of scores); and 
scaling the geometric mean to an Absolute Category Rating scale (Olson pp 71, additively normalizing the values).
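
For illustration only, the two steps recited in claim 3 can be sketched as follows. This sketch is not taken from Bryan, Olson, or Applicant's specification. The function names, the sample label values, the assumption that the predicted labels lie in (0, 1], and the linear mapping onto a five-point (1 to 5) Absolute Category Rating scale are all hypothetical choices made for the example.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def to_acr(score, lo=0.0, hi=1.0):
    """Linearly map a score in [lo, hi] onto a 1-5 scale (assumed mapping)."""
    clipped = min(max(score, lo), hi)
    return 1.0 + 4.0 * (clipped - lo) / (hi - lo)

# Hypothetical predicted labels, one per quality metric, each in (0, 1].
predicted_labels = [0.9, 0.4, 0.6]

aggregate = geometric_mean(predicted_labels)  # cube root of 0.216, about 0.6
quality_score = to_acr(aggregate)             # about 3.4 on the 1-5 scale
```

Because the geometric mean multiplies the labels before taking the root, a single low metric value lowers the aggregate more than it would under an arithmetic mean.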

Consider claim 4, Bryan teaches the computer-implemented method of claim 1, wherein the first audio clip is derived from a reference audio clip using an audio algorithm (0064, audio recording under test, 0065, retesting), and wherein the first audio quality score indicates a perceptual impact associated with the audio algorithm (0062-64, scores of audio recording, 0095, some measures may be perceptual such as perceived loudness).

Consider claim 5, Bryan teaches the computer-implemented method of claim 1, wherein generating the first plurality of predicted labels comprises inputting the first set of feature values into the trained multitask learning model that, in response, outputs the first plurality of predicted labels (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories, 0104, training).

Consider claim 7, Bryan teaches the computer-implemented method of claim 1, wherein a first audio feature included in the set of audio features is associated with a first psycho-acoustic principle but not a second psycho-acoustic principle (0112, Mel frequency based on perceptual frequency response of human hearing, but not the presence of voice), and a second audio feature included in the set of audio features is associated with the second psycho-acoustic principle but not the first psycho-acoustic principle (0113-14, VAD features, based on perception of speech but not the perceptual frequency response of human hearing).

Consider claim 8, Bryan teaches the computer-implemented method of claim 1, wherein computing the first set of feature values for the set of audio features comprises: 
computing a second set of feature values for a set of source features based on the first audio clip and a first reference clip (0112, FFT, 0063-64, retesting original and new audio clip); and 
computing the first set of feature values for the set of audio features based on the second set of feature values for the set of source features and at least one scaling parameter associated with the trained multitask learning model (0112, mel scaling FFT to get mel spectrum).

Consider claim 9, Bryan teaches the computer-implemented method of claim 1, wherein the first audio clip includes at least one of dialogue, a sound effect, and background music (0037, speech and non-speech sounds).

Consider claim 10, Bryan teaches the computer-implemented method of claim 1, further comprising: 
computing a second set of feature values for the set of audio features based on a second audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD, 0064-65, retesting a new recording for example); 
generating a second plurality of predicted labels via the trained multitask learning model based on the second set of feature values (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a second audio quality score for the second audio clip based on the second plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Consider claim 11, Bryan teaches One or more non-transitory computer readable media (0187, non-transitory media) including instructions that, when executed by one or more processors (0185), cause the one or more processors to estimate perceived audio quality (abstract) by performing the steps of: 
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).
Bryan does not specifically teach computing the first audio quality score based on a geometric mean based on the first plurality of predicted labels.
In the same field of score aggregation, Olson teaches computing the first audio quality score based on a geometric mean based on the first plurality of predicted labels (pp 69-71, geometric mean computing combining multiple labels of scores).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a geometric mean as taught by Olson in the system of Bryan in order to allow comparison of scores that may be made up of multiple sub-scores (Olson pp 69).

Consider claim 15, Bryan teaches the one or more non-transitory computer readable media of claim 11, wherein causing the trained multitask learning model to generate the first plurality of predicted labels comprises inputting the first set of feature values into the trained multitask learning model (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories, 0104, training).

Claim 17 contains similar limitations as claim 7 and is therefore rejected for the same reasons.

Claim 19 contains similar limitations as claim 9 and is therefore rejected for the same reasons.

Claims 6 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Olson as applied to claims 1 and 11 above, and further in view of Dong et al. (An Attention Enhanced Multi-Task Model for Objective Speech Assessment in Real-World Environments).

Consider claim 6, Bryan and Olson teach the computer-implemented method of claim 1, but do not specifically teach wherein a first predicted label included in the first plurality of predicted labels comprises an estimated value of a scaled Hearing-Aid Audio Quality Index, a scaled Perceptual Evaluation of Audio Quality, a scaled Perception Model Quality, or a scaled Virtual Speech Quality Objective Listener Audio metric.
In the same field of speech quality estimation, Dong teaches wherein a first predicted label included in the first plurality of predicted labels comprises an estimated value of a scaled Hearing-Aid Audio Quality Index, a scaled Perceptual Evaluation of Audio Quality, a scaled Perception Model Quality, or a scaled Virtual Speech Quality Objective Listener Audio metric (Introduction, determining Perceptual Evaluation of Speech Quality using neural network).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use PESQ as taught by Dong as one of the quality measures in Bryan and Olson in order to provide a well-known standard measure of audio quality (Dong Abstract).

Claim 16 contains similar limitations as claim 6 and is therefore rejected for the same reasons.

Claim 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Olson as applied to claim 11 above, and further in view of Wilkinghoff et al. (Robust Speaker Identification by Fusing Classification Scores with a Neural Network).

Consider claim 12, Bryan and Olson teach the one or more non-transitory computer readable media of claim 11, but do not specifically teach wherein computing the first audio quality score comprises inputting one or more predicted labels included in the first plurality of predicted labels into a machine learning model that, in response, generates the first audio quality score.
In the same field of using neural networks for classification, Wilkinghoff teaches inputting one or more predicted labels included in the first plurality of predicted labels into a machine learning model that, in response, generates the first audio quality score (Figure 2, Section 3, section 3.3, fusing classification scores using a neural network to generate a final score).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a neural network to generate the final score as taught by Wilkinghoff in the system of Bryan and Olson in order to increase the accuracy of the final scoring (Wilkinghoff abstract).

Claim 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Olson as applied to claim 11 above, and further in view of Sharma et al. (Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks).

Consider claim 14, Bryan and Olson teach the one or more non-transitory computer readable media of claim 11, but do not specifically teach wherein the first audio clip includes at least one artifact associated with an audio algorithm, and wherein the first audio quality score indicates a quality versus processing efficiency tradeoff associated with the audio algorithm.
In the same field of audio evaluation, Sharma teaches wherein the first audio clip includes at least one artifact associated with an audio algorithm, and wherein the first audio quality score indicates a quality versus processing efficiency tradeoff associated with the audio algorithm (Introduction, estimating quality of audio that has been subjected to audio coding algorithms).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to test the effects of coding as taught by Sharma in the system of Bryan and Olson in order to provide an accurate evaluation of whether the audio quality output by a coding algorithm is acceptable to human perception (Sharma Abstract).

Claim 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Olson as applied to claim 11 above, and further in view of Zhang et al. (Unsupervised Learning in Cross-Corpus Acoustic Emotion Recognition).

Consider claim 18, Bryan teaches the one or more non-transitory computer readable media of claim 11, wherein computing the first set of feature values for the set of audio features comprises: 
computing a second set of feature values for a set of source features based on the first audio clip and a first reference clip (0112, FFT, 0063-64, retesting original and new audio clip); and 
computing the first set of feature values for the set of audio features based on the second set of feature values for the set of source features and at least one scaling parameter associated with the trained multitask learning model (0112, mel scaling FFT to get mel spectrum).
Bryan and Olson do not specifically teach performing at least one min-max scaling operation on the first feature value based on at least one scaling parameter associated with the trained multitask learning model.
In the same field of applying acoustic features to machine learning models, Zhang teaches performing at least one min-max scaling operation on the first feature value based on at least one scaling parameter associated with the trained multitask learning model (Section IV A, min-max normalization).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use min-max scaling as taught by Zhang in the system of Bryan and Olson to mitigate the effects of different speakers (Zhang Section IV A).
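
For illustration only, a min-max scaling operation of the kind recited in claim 18 can be sketched as follows. This sketch is not taken from Zhang, Bryan, or Applicant's specification. The function name, the sample values, and the assumption that the minimum and maximum were stored as scaling parameters when the model was trained are hypothetical.

```python
def min_max_scale(value, feature_min, feature_max):
    """Map value from [feature_min, feature_max] onto [0, 1]."""
    if feature_max <= feature_min:
        raise ValueError("feature_max must be greater than feature_min")
    return (value - feature_min) / (feature_max - feature_min)

# Hypothetical scaling parameters, assumed to have been saved at training time.
FEATURE_MIN = 10.0
FEATURE_MAX = 50.0

scaled = min_max_scale(30.0, FEATURE_MIN, FEATURE_MAX)  # midpoint maps to 0.5
```

Reusing the stored minimum and maximum at inference time keeps new feature values on the same scale the model saw during training.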

Allowable Subject Matter
Claim 13 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter:  
Consider claim 13, Bryan and Olson teach the one or more non-transitory computer readable media of claim 11. However, the prior art of record does not teach or fairly suggest the limitations of “wherein computing the first audio quality score comprises determining a plurality of weights associated with the first plurality of predicted labels based on one or more types of audio content included in the first audio clip” when combined with each and every other limitation of the claim. Therefore, claim 13 contains allowable subject matter.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655