DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 18 June 2020 in reference to 16/905,810.  Claims 1-20 are pending and have been examined.

Claim Objections
Claim 18 is objected to because of the following informalities: Line 3 should read “computing a second feature value…” instead of “computing a first feature value…”. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4, 5, 7-11, 15, 17, 19, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bryan (US PAP 2021/0125629).

Consider claim 1, Bryan teaches a computer-implemented method for estimating perceived audio quality (abstract, figure 7), the method comprising:
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Consider claim 2, Bryan teaches the computer-implemented method of claim 1, wherein computing the first audio quality score comprises: 
computing an aggregate score based on the first plurality of predicted labels (0060-63, determining overall acoustic quality score); and 
performing one or more scaling operations on the aggregate score to generate the first audio quality score (0060-63, normalizing scores to scale from 0-100).
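
For illustration only, the two steps recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not Bryan's implementation and not the applicant's: the function name `aggregate_and_scale`, the use of an arithmetic mean as the aggregate, and the assumed input range of 0 to 1 for each predicted label are all hypothetical. Only the 0-100 output scale comes from the cited passage.

```python
from statistics import fmean


def aggregate_and_scale(labels, lo=0.0, hi=1.0):
    """Combine per-metric predictions and rescale the result to 0-100.

    Assumptions (not taken from the cited reference): the aggregate is an
    arithmetic mean, and every label lies in the range [lo, hi].
    """
    if not labels:
        raise ValueError("need at least one predicted label")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")

    # Step 1: aggregate score based on the plurality of predicted labels.
    aggregate = fmean(labels)

    # Step 2: scaling operation, a linear map of [lo, hi] onto [0, 100],
    # clamped so out-of-range inputs cannot leave the target scale.
    scaled = 100.0 * (aggregate - lo) / (hi - lo)
    return max(0.0, min(100.0, scaled))
```

Calling `aggregate_and_scale([0.5, 0.75, 1.0])` averages the three labels to 0.75 and maps that value to 75 on the 0-100 scale.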

Consider claim 4, Bryan teaches the computer-implemented method of claim 1, wherein the first audio clip is derived from a reference audio clip using an audio algorithm (0064, audio recording under test, 0065, retesting), and wherein the first audio 

Consider claim 5, Bryan teaches the computer-implemented method of claim 1, wherein generating the first plurality of predicted labels comprises inputting the first set of feature values into the trained multitask learning model that, in response, outputs the first plurality of predicted labels (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories, 0104, training).

Consider claim 7, Bryan teaches the computer-implemented method of claim 1, wherein a first audio feature included in the set of audio features is associated with a first psycho-acoustic principle but not a second psycho-acoustic principle (0112, Mel frequency based on perceptual frequency response of human hearing, but not the presence of voice), and a second audio feature included in the set of audio features is associated with the second psycho-acoustic principle but not the first psycho-acoustic principle (0113-14, VAD features, based on perception of speech but not the perceptual frequency response of human hearing).

Consider claim 8, Bryan teaches the computer-implemented method of claim 1, wherein computing the first set of feature values for the set of audio features comprises:
computing a second set of feature values for a set of source features based on the first audio clip and a first reference clip (0112, FFT, 0063-64, retesting original and new audio clip); and
computing the first set of feature values for the set of audio features based on the second set of feature values for the set of source features and at least one scaling parameter associated with the trained multitask learning model (0112, mel scaling FFT to get mel spectrum).

Consider claim 9, Bryan teaches the computer-implemented method of claim 1, wherein the first audio clip includes at least one of dialogue, a sound effect, and background music (0037, speech and non-speech sounds).

Consider claim 10, Bryan teaches the computer-implemented method of claim 1, further comprising: 
computing a second set of feature values for the set of audio features based on a second audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD, 0064-65, retesting a new recording for example);
generating a second plurality of predicted labels via the trained multitask learning model based on the second set of feature values (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a second audio quality score for the second audio clip based on the second plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Consider claim 11, Bryan teaches one or more non-transitory computer readable media (0187, non-transitory media) including instructions that, when executed by one or more processors (0185), cause the one or more processors to estimate perceived audio quality (abstract) by performing the steps of:
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Consider claim 15, Bryan teaches the one or more non-transitory computer readable media of claim 11, wherein causing the trained multitask learning model to generate the first plurality of predicted labels comprises inputting the first set of feature values into the trained multitask learning model (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories, 0104, training).

Claim 17 contains similar limitations as claim 7 and is therefore rejected for the same reasons.

Claim 19 contains similar limitations as claim 9 and is therefore rejected for the same reasons.

Consider claim 20, Bryan teaches a system (abstract) comprising:
one or more memories storing instructions (0176, RAM); and
one or more processors coupled to the one or more memories that (0176, processors), when executing the instructions, perform the steps of:
computing a first set of feature values for a set of audio features based on a first audio clip (0112, 0125-26, extracting features, such as mel-spectrogram and VAD); 
generating a first plurality of predicted labels via a trained multitask learning model based on the first set of feature values, wherein the first plurality of predicted labels specifies metric values for a plurality of metrics that are relevant to audio quality (0128-29, 0057-60, using deep learning models to determine audio metrics in different categories); and 
computing a first audio quality score for the first audio clip based on the first plurality of predicted labels (0062, overall score may be generated using different quality metrics).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Olson (Geometric Mean Technique).

Consider claim 3, Bryan teaches the computer-implemented method of claim 1, but does not specifically teach wherein computing the first audio quality score comprises: 
computing a geometric mean based on the first plurality of predicted labels; and 
scaling the geometric mean to an Absolute Category Rating scale.
In the same field of score aggregation, Olson teaches wherein computing the first audio quality score comprises:
computing a geometric mean based on the first plurality of predicted labels (pp 69-71, geometric mean computing combining multiple labels of scores); and 
scaling the geometric mean to an Absolute Category Rating scale (pp 71, additively normalizing the values).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a geometric mean as taught by Olson in the system of Bryan in order to allow comparison of scores that may be made up of multiple sub scores (Olson pp 69).
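
For illustration only, the two limitations of claim 3 (a geometric mean of the predicted labels, then scaling to an Absolute Category Rating scale) can be sketched as follows. This is a hedged sketch, not the method of Olson, Bryan, or the applicant: the function name `geometric_mean_to_acr`, the assumption that each label is already normalized to the interval (0, 1], and the linear map onto the 1-5 ACR range are all hypothetical choices made for the example.

```python
import math


def geometric_mean_to_acr(labels):
    """Geometric mean of normalized labels, mapped onto the 1-5 ACR scale.

    Assumption (hypothetical): every label lies in (0, 1], where 1 is best.
    """
    if not labels:
        raise ValueError("need at least one predicted label")
    if any(x <= 0.0 or x > 1.0 for x in labels):
        raise ValueError("labels are assumed to lie in (0, 1]")

    # Geometric mean computed in log space to avoid underflow when many
    # small values are multiplied together.
    log_mean = sum(math.log(x) for x in labels) / len(labels)
    gmean = math.exp(log_mean)

    # Linear map of (0, 1] onto the ACR range, where 1 = bad, 5 = excellent.
    return 1.0 + 4.0 * gmean
```

With labels `[0.25, 1.0]` the geometric mean is 0.5, which this mapping places at 3 on the ACR scale. Unlike an arithmetic mean, a single very low label pulls the geometric mean down sharply, which is one reason the two aggregates are not interchangeable.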

Consider claim 13, Bryan teaches the one or more non-transitory computer readable media of claim 11, but does not specifically teach wherein computing the first audio quality score comprises computing a weighted arithmetic mean based on the first plurality of predicted labels.
In the same field of score aggregation, Olson teaches wherein computing the first audio quality score comprises computing a weighted arithmetic mean based on the first plurality of predicted labels (pp 69-71, geometric mean computing combining multiple labels of scores, pp 71, additively normalizing the values).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a geometric mean as taught by Olson in the system of Bryan in order to allow comparison of scores that may be made up of multiple sub scores (Olson pp 69).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Dong et al. (An Attention Enhanced Multi-Task Model for Objective Speech Assessment in Real-World Environments).

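Consider claim 6, Bryan teaches the computer-implemented method of claim 1, but does not specifically teach wherein a first predicted label included in the first plurality of predicted labels comprises an estimated value of a scaled Hearing-Aid Audio Quality Index, a scaled Perceptual Evaluation of Audio Quality, a scaled Perception Model Quality, or a scaled Virtual Speech Quality Objective Listener Audio metric.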
In the same field of speech quality estimation, Dong teaches wherein a first predicted label included in the first plurality of predicted labels comprises an estimated value of a scaled Hearing-Aid Audio Quality Index, a scaled Perceptual Evaluation of Audio Quality, a scaled Perception Model Quality, or a scaled Virtual Speech Quality Objective Listener Audio metric (Introduction, determining Perceptual Evaluation of Speech Quality using neural network).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use PESQ as taught by Dong as one of the quality measures in Bryan in order to provide a well-known standard measure of audio quality. (Dong Abstract).

Claim 16 contains similar limitations as claim 6 and is therefore rejected for the same reasons.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Wilkinghoff et al. (Robust Speaker Identification by Fusing Classification Scores with a Neural Network).

Consider claim 12, Bryan teaches the one or more non-transitory computer readable media of claim 11, but does not specifically teach wherein computing the first audio quality score comprises inputting one or more predicted labels included in the first plurality of predicted labels into a machine learning model that, in response, generates the first audio quality score.
In the same field of using neural networks for classification, Wilkinghoff teaches inputting one or more predicted labels included in the first plurality of predicted labels into a machine learning model that, in response, generates the first audio quality score (Figure 2, Section 3, section 3.3, fusing classification scores using a neural network to generate a final score).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use neural network to generate the final score as taught by Wilkinghoff in the system of Bryan in order to increase the accuracy of the final scoring (Wilkinghoff abstract).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Sharma et al. (Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks).

Consider claim 14, Bryan teaches the one or more non-transitory computer readable media of claim 11, but does not specifically teach wherein the first audio clip includes at least one artifact associated with an audio algorithm, and wherein the first audio quality score indicates a quality versus processing efficiency tradeoff associated with the audio algorithm.
In the same field of audio evaluation, Sharma teaches wherein the first audio clip includes at least one artifact associated with an audio algorithm, and wherein the first 
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to test the effects of coding as taught by Sharma in the system of Bryan in order to provide an accurate evaluation of if the audio quality output by a coding algorithm is acceptable to human perception (Sharma Abstract).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Bryan in view of Zhang et al. (Unsupervised Learning in Cross-Corpus Acoustic Emotion Recognition).

Consider claim 18, Bryan teaches the one or more non-transitory computer readable media of claim 11, wherein computing the first set of feature values for the set of audio features comprises: 
computing a second set of feature values for a set of source features based on the first audio clip and a first reference clip (0112, FFT, 0063-64, retesting original and new audio clip); and 
computing the first set of feature values for the set of audio features based on the second set of feature values for the set of source features and at least one scaling parameter associated with the trained multitask learning model (0112, mel scaling FFT to get mel spectrum).

In the same field of applying acoustic features to machine learning models, Zhang teaches performing at least one min-max scaling operation on the first feature value based on at least one scaling parameter associated with the trained multitask learning model (Section IV A, min-max normalization).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use min-max scaling as taught by Zhang in the system of Bryan to mitigate the effects of different speakers (Zhang, Section IV A).
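
For illustration only, a min-max scaling operation driven by scaling parameters associated with a trained model can be sketched as follows. This is a hedged sketch, not Zhang's normalization procedure or the applicant's: the function name `min_max_scale`, the idea that the minimum and maximum were stored when the model was trained, and the clamping of out-of-range values are all assumptions made for the example.

```python
def min_max_scale(value, feat_min, feat_max):
    """Scale one feature value to [0, 1] using stored min/max parameters.

    Assumption (hypothetical): feat_min and feat_max were recorded from the
    training data and are reused unchanged at inference time.
    """
    if feat_max <= feat_min:
        raise ValueError("feat_max must be greater than feat_min")

    # Standard min-max normalization: (x - min) / (max - min).
    scaled = (value - feat_min) / (feat_max - feat_min)

    # Clamp, since a new clip can fall outside the range seen in training.
    return max(0.0, min(1.0, scaled))
```

For example, with stored parameters `feat_min=10.0` and `feat_max=20.0`, a raw feature value of 15.0 scales to 0.5, and a value of 25.0 is clamped to 1.0.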

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chae (US PAP 2021/0217403) also combines audio quality metrics into an overall score.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655