DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
In paragraph 0019, line 6, “these types perceptual quality models” should read “these types of perceptual quality models”.
In paragraph 0032, lines 7-8, “audio distortions are artifacts” should read “audio distortions and artifacts”.
In paragraph 0045, line 8, “For explanatory purposes only.” should read “For explanatory purposes only,”.
In paragraph 0104, line 3, “training engine computes sets” should read “training engine sets”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


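Claims 8, 9, 10, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.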
Regarding claim 8, claim 8 depends from claim 1, and the claim 8 limitation “wherein training the multitask learning model comprises” is indefinite because it is not clear how the limitation further limits the subject matter of claim 1.  Claim 1 includes the term “training a multitask learning model” in the preamble, “a computer-implemented method for training a multitask learning model to assess perceived audio quality, the method comprising”, and includes the term “training the multitask learning model” in the claim element “and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model”.  Therefore, it is not clear if the claim 8 limitation “wherein training the multitask learning model comprises” further limits or replaces the training steps of claim 1, or if the limitation modifies the training claim element.  Using the term “further comprises” instead of “comprises” would make it clear that the claim element is further limiting subject matter of claim 1.  For examination purposes, the limitation will be interpreted to further limit the claim 1 element “and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model”.
Regarding claim 9, claim 9 is rejected as it depends from claim 8, and thus recites the limitations of claim 8, and does not resolve the indefinite language from claim 8.
Regarding claim 10, claim 10 depends from claim 1, and the claim 10 limitation “wherein training the multitask learning model comprises” is indefinite because it is not clear how the limitation further limits the subject matter of claim 1.
Regarding claim 19, claim 19 depends from claim 11, and the claim 19 limitation “wherein training the multitask learning model comprises” is indefinite because it is not clear how the limitation further limits the subject matter of claim 11.  Claim 11 includes the term “train a multitask learning model” in the preamble, “one or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to train a multitask learning model to assess perceived audio quality by performing the steps of”, and includes the term “training the multitask learning model” in the claim element “and training the multitask learning model based on the first set of feature values and the first plurality of pseudo 
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 8 – 11, 14, and 19 – 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sharma et al. (“Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks”), hereinafter Sharma.
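Regarding claim 1, Sharma teaches a computer-implemented method for training a multitask learning model to assess perceived audio quality (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."), the method comprising: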
computing a first plurality of pseudo labels based on a first audio clip and a plurality of models, wherein the first plurality of pseudo labels specifies metric values for a plurality of metrics that are relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, line 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.);
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Training the multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for the plurality of metrics (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Regarding claim 8, as best understood based on the 35 U.S.C. 112(b) issues identified above, Sharma teaches the computer-implemented method as claimed in claim 1, wherein training the multitask learning model comprises:
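inputting the first set of feature values into the multitask learning model that, in response, outputs a first plurality of predicted labels (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; The feature values read on the MFCC and MDCC features and the predicted labels read on the VAD and POLQA estimations.);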
computing a loss based on the first plurality of predicted labels and the first plurality of pseudo labels (Section IIIC, lines 1-3, "In the following, Pe and Pt are the estimated and true POLQA scores, respectively and the error in estimating a sample is defined as E(n) = P(n)e - P(n)t."; The predicted labels read on the estimated POLQA scores, the pseudo labels read on the true POLQA scores, and the loss reads on the error in estimating a sample.);
and performing one or more optimization operations on the multitask learning model based on the loss (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Performing optimization operations on the multitask learning model based on the loss reads on training the recurrent neural network using the error metric.).
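For illustration only, the loss computation and optimization operations mapped above can be sketched as follows. The sketch assumes a linear two-output model standing in for Sharma's RNN, synthetic data, and plain gradient descent; the names `combined_rmse` and `train_step` are hypothetical and appear in neither the claims nor Sharma. The loss is the sum of the per-task RMSE values, in the manner of the quoted (RMSEPOLQA+RMSEMVP) error metric.

```python
import numpy as np

def combined_rmse(pred, target):
    """Sum of per-task RMSE values; each column is one task (e.g. POLQA, MVP)."""
    per_task = np.sqrt(np.mean((pred - target) ** 2, axis=0))
    return float(per_task.sum())

def train_step(W, X, Y, lr=0.05, eps=1e-12):
    """One gradient-descent step on the summed-RMSE loss for pred = X @ W.

    Assumption: a linear model is used here only to keep the sketch short.
    """
    err = X @ W - Y                                    # shape (n, tasks)
    rmse = np.sqrt(np.mean(err ** 2, axis=0)) + eps    # shape (tasks,)
    grad = X.T @ err / (X.shape[0] * rmse)             # d(sum of RMSE)/dW
    return W - lr * grad

# Synthetic stand-ins for feature values (X) and pseudo labels (Y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = X @ rng.normal(size=(4, 2))

W = np.zeros((4, 2))
loss_before = combined_rmse(X @ W, Y)
for _ in range(200):
    W = train_step(W, X, Y)
loss_after = combined_rmse(X @ W, Y)
```

In this sketch the predicted labels are `X @ W`, the pseudo labels are `Y`, and each call to `train_step` is one optimization operation based on the loss.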
Regarding claim 9, as best understood based on the 35 U.S.C. 112(b) issues identified above, Sharma teaches the computer-implemented method as claimed in claim 8, wherein training the multitask learning model comprises computing a mean squared error between the first plurality of predicted labels and the first plurality of pseudo labels (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; The mean squared error reads on the RMSEPOLQA and the RMSEMVP.).
Regarding claim 10, as best understood based on the 35 U.S.C. 112(b) issues identified above, Sharma teaches the computer-implemented method as claimed in claim 1, wherein training the multitask learning model comprises executing a multitask learning algorithm on the multitask learning model based on the first set of feature values, the first plurality of pseudo labels, a second set of feature values associated with a second audio clip, and a second plurality of feature values associated with the second audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 1-4, "The training data for the proposed and baseline methods is based on the clean 100 hrs training partition of the Librispeech corpus, which is derived from audiobooks read by a large number of speakers."; The feature values read on the MFCC and MDCC features, the predicted labels read on the VAD and POLQA estimations, and the second audio clip reads on the 100 hours training partition.).
Regarding claim 11, Sharma teaches one or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to train a multitask learning model to assess perceived audio quality (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."; Section I, lines 59-61, "Another novel contribution of this work is the joint estimation of voice activity and POLQA using a multi-task RNN.") by performing the steps of:
causing a plurality of models to generate a plurality of output values based on a first audio clip and computing a plurality of pseudo labels based on the plurality of output values, wherein the plurality of pseudo labels specifies metric values for a plurality of metrics that are relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, line 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.);
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and training the multitask learning model based on the first set of feature values and the plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Training the multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for the plurality of metrics (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Regarding claim 14, Sharma teaches the one or more non-transitory computer readable media as claimed in claim 11, wherein the multitask learning model comprises at least one of a neural network, a decision tree, or a random forest (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."; Section I, lines 59-61, "Another novel contribution of this work is the joint estimation of voice activity and POLQA using a multi-task RNN.").
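Regarding claim 19, as best understood based on the 35 U.S.C. 112(b) issues identified above, Sharma teaches the one or more non-transitory computer readable media as claimed in claim 11, wherein training the multitask learning model comprises: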
inputting the first set of feature values into the multitask learning model that, in response, outputs a first plurality of predicted labels (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; The feature values read on the MFCC and MDCC features and the predicted labels read on the VAD and POLQA estimations.);
computing a loss based on the first plurality of predicted labels and the plurality of pseudo labels (Section IIIC, lines 1-3, "In the following, Pe and Pt are the estimated and true POLQA scores, respectively and the error in estimating a sample is defined as E(n) = P(n)e - P(n)t."; The predicted labels read on the estimated POLQA scores, the pseudo labels read on the true POLQA scores, and the loss reads on the error in estimating a sample.);
and performing one or more optimization operations on the multitask learning model based on the loss (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Performing optimization operations on the multitask learning model based on the loss reads on training the recurrent neural network using the error metric.).
Regarding claim 20, Sharma teaches:
computing a first plurality of pseudo labels based on a first audio clip and a plurality of models, wherein the first plurality of pseudo labels specifies metric values for a plurality of metrics that are relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, line 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.);
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and executing at least one multitask learning algorithm based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Generating a trained multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for the plurality of metrics (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Güzelarslan (US Patent Application Publication No. 2020/0402530).
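Regarding claim 2, Sharma discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein a first model included in the plurality of models comprises a Hearing-Aid Audio Quality Index expert system, a Perceptual Evaluation of Audio Quality expert system, a Perception Model Quality expert system, or a Virtual Speech Quality Objective Listener Audio expert system.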
Güzelarslan teaches: wherein a first model included in the plurality of models comprises a Hearing-Aid Audio Quality Index expert system, a Perceptual Evaluation of Audio Quality expert system, a Perception Model Quality expert system, or a Virtual Speech Quality Objective Listener Audio expert system (Paragraph 0004, lines 1-3, "A corresponding international standard for the perceptual quality measurement of wide-band audio signals is PEAQ (perceptual evaluation of audio quality)."; Paragraph 0079, lines 1-5, "The output score value according to different ITU measurement methods, such as for example the PESQ, PEAQ and POLQA value are then to compared in order to produce a signal for supervised learning of the neural network 10.").  Güzelarslan teaches using the Perceptual Evaluation of Audio Quality (PEAQ) measurement to train a neural network to provide an audio quality score to measure audio quality in an objective manner (Paragraph 0054, lines 1-8, "As can be seen by comparison to FIG. 1 according to this aspect the invention proposes to use a neural network 10, alternatively or in addition to the human/subjective test for producing for example the LQS score parameter.  In other words, according to the invention, the output signal (feedback signal) of the system 2 under test is used as the input signal for a neural network 10, thus producing the LQS score in an objective manner.").
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Francombe et al. (US Patent No. 9,635,483), hereinafter Francombe.
Regarding claim 3, Sharma discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate.
Francombe teaches: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate (Column 25, lines 49-51, "the cross correlation is used to produce the Perceptual Similarity Measure (PSM) which is taken to be a measure of the audio quality"; Column 51, lines 58-61, "The resulting PSMs are then mapped to the OPS, TPS, IPS, and APS predictions by a non-linear function (one hidden layer feed forward neural network) trained on listening test results.").  Francombe teaches using the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality to determine the 
Sharma and Francombe are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Francombe to use the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality.  Doing so would allow for determining the amount of audio interference from nearby audio sources and providing parameter changes for modifying the audio to reduce the interference.
Regarding claim 13, Sharma discloses the one or more non-transitory computer readable media as claimed in claim 11, but does not specifically disclose: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate.

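Sharma and Francombe are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Francombe to use the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality.  Doing so would allow for determining the amount of audio interference from nearby audio sources and providing parameter changes for modifying the audio to reduce the interference.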
Claims 4, 6, 15, and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Bryan (US Patent Application Publication No. 2021/0125629).
Regarding claim 4, Sharma discloses the computer-implemented method as claimed in claim 1, wherein computing the first set of feature values comprises: computing a plurality of source feature values for a first source feature based on a plurality of audio clips that includes the first audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 18-19, "The resulting training set contains nearly 100 hrs of processed data."; The feature values read on the MFCC and MDCC feature and the audio clip reads on the 100 hours of processed data).
Sharma does not specifically disclose: computing at least one scaling parameter based on the plurality of source feature values; and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source features values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values.
Bryan teaches:
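computing at least one scaling parameter based on the plurality of source feature values (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).");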
and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source features values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values (Paragraph 0026, lines 5-9, "For deep learning models, in some embodiments, the acoustic improvement system generates a neural network with architecture customized to efficiently determine corresponding acoustic quality metrics."; Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score.").
Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Sharma and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Bryan to scale audio quality measurements.  Doing so would allow different audio quality measurements to be compared.
Regarding claim 6, Sharma in view of Bryan discloses the computer-implemented method as claimed in claim 4, further comprising computing an overall quality score for a second audio clip based on the trained multitask learning model and the at least one scaling parameter (Bryan, Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score."; Bryan, Paragraph 0082, lines 1-5, "To illustrate, FIG. 4B shows the acoustic improvement system capturing a second audio recording 414b, analyzing it, determining updated acoustic quality metrics 212, as well as calculating a new overall acoustic quality score, each of which is previously described.").
Bryan teaches calculating an overall audio quality score from scaled audio quality measurements from a trained neural network to determine ways to improve the evaluated audio (Paragraph 0024, lines 3-10, "Based on the audio input, the acoustic improvement system can determine multiple acoustic quality metrics corresponding to different acoustic quality categories by utilizing multiple acoustic quality measurement models. In addition, the acoustic improvement system can utilize the acoustic quality metrics to determine key actionable acoustic improvement suggestions from a larger set of actionable acoustic improvement suggestions.").
Sharma and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of 
Regarding claim 15, Sharma discloses the one or more non-transitory computer readable media as claimed in claim 11, wherein computing the first set of feature values comprises: computing a plurality of source feature values for a first source feature based on a plurality of audio clips that includes the first audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 18-19, "The resulting training set contains nearly 100 hrs of processed data."; The feature values read on the MFCC and MDCC feature and the audio clip reads on the 100 hours of processed data).
Sharma does not specifically disclose: computing at least one scaling parameter based on the plurality of source feature values; and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source features values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values.
Bryan teaches:
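computing at least one scaling parameter based on the plurality of source feature values (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).");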
and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source features values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values (Paragraph 0026, lines 5-9, "For deep learning models, in some embodiments, the acoustic improvement system generates a neural network with architecture customized to efficiently determine corresponding acoustic quality metrics."; Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score.").
Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Sharma and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Bryan to scale audio quality measurements.  Doing so would allow different audio quality measurements to be compared.
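For illustration only, the claimed relationship between source feature values, a scaling parameter, and a resulting feature value can be sketched as follows. The sketch assumes the scaling parameters are a per-feature mean and standard deviation, in line with the standardization quoted from Sharma ("zero mean and unit standard deviation"); it is not asserted to be the method of Bryan or of the application, and the names `fit_scaling` and `apply_scaling` are hypothetical.

```python
import numpy as np

def fit_scaling(source_values):
    """Compute scaling parameters (mean, std) per feature over all audio clips.

    source_values: array-like of shape (clips, features).
    """
    source_values = np.asarray(source_values, dtype=float)
    mean = source_values.mean(axis=0)
    std = source_values.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # constant features are left unscaled
    return mean, std

def apply_scaling(values, mean, std):
    """Map source feature values to feature values using the scaling parameters."""
    return (np.asarray(values, dtype=float) - mean) / std

# Two clips, two source features (made-up numbers).
source = [[1.0, 10.0], [3.0, 10.0]]
mean, std = fit_scaling(source)
scaled = apply_scaling(source, mean, std)
```

The same `mean` and `std` could later be applied to the feature values of a second audio clip, so that values seen at inference share the scale used in training.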

Bryan teaches calculating an overall audio quality score from scaled audio quality measurements from a trained neural network to determine ways to improve the evaluated audio (Paragraph 0024, lines 3-10, "Based on the audio input, the acoustic improvement system can determine multiple acoustic quality metrics corresponding to different acoustic quality categories by utilizing multiple acoustic quality measurement models. In addition, the acoustic improvement system can utilize the acoustic quality metrics to determine key actionable acoustic improvement suggestions from a larger set of actionable acoustic improvement suggestions.").
Sharma and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Bryan to calculate an overall audio quality score from scaled 
Regarding claim 18, Sharma discloses the one or more non-transitory computer readable media as claimed in claim 11, but does not specifically disclose: wherein computing a first pseudo label included in the plurality of pseudo labels comprises scaling a first output value included in the plurality of output values based on a theoretical range of a first output of a first model included in the plurality of models to generate the first pseudo label.
Bryan teaches: wherein computing a first pseudo label included in the plurality of pseudo labels comprises scaling a first output value included in the plurality of output values based on a theoretical range of a first output of a first model included in the plurality of models to generate the first pseudo label (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).").  Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Sharma and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a .
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Bryan, and further in view of Teague (US Patent Application Publication No. 2020/0349467).
Regarding claim 5, Sharma in view of Bryan discloses the computer-implemented method as claimed in claim 4, but does not specifically disclose: wherein computing the at least one scaling parameter comprises: setting a first scaling parameter equal to a minimum source feature value included in the plurality of source feature values; and setting a second scaling parameter equal to a maximum source feature value included in the plurality of source feature values.
Teague teaches: wherein computing the at least one scaling parameter comprises: setting a first scaling parameter equal to a minimum source feature value included in the plurality of source feature values; and setting a second scaling parameter equal to a maximum source feature value included in the plurality of source feature values (Paragraph 0041, lines 4-6, "As another example, min-max scaling may be applied based on the minimum and maximum values of the train set.").  Teague teaches scaling machine learning training data based on minimum and maximum values in the training set to allow for a consistent data format between different training sets (Paragraph 0041, lines 25-32, "Additional data may be collected after training and this later acquired data may be processed in a manner similar to the initial training data 
Sharma, Bryan, and Teague are considered to be analogous to the claimed invention because they are in the same field of training a machine learning model.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Bryan and further in view of Teague to scale machine learning training data based on minimum and maximum values in the training set.  Doing so would provide for a consistent data format between different training sets.
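For illustration only, min-max scaling based on the minimum and maximum values of a training set, as referenced in the cited Teague passage, can be sketched as follows. The function names and the sample values are hypothetical; the sketch assumes plain numeric feature values and is not asserted to be the implementation of Teague or of the application.

```python
def fit_min_max(train_values):
    """Return the two scaling parameters: the minimum and maximum of the training set."""
    if not train_values:
        raise ValueError("train_values must not be empty")
    return min(train_values), max(train_values)


def apply_min_max(value, minimum, maximum):
    """Scale a value using previously computed parameters (training set maps to [0, 1])."""
    if maximum == minimum:
        # Degenerate case: every training value was identical.
        return 0.0
    return (value - minimum) / (maximum - minimum)


# Assumed sample data: the parameters are computed once from the training values.
minimum, maximum = fit_min_max([2.0, 4.0, 10.0])
scaled_train = apply_min_max(4.0, minimum, maximum)
# Later-acquired data reuses the same parameters, so it may fall outside [0, 1].
scaled_later = apply_min_max(12.0, minimum, maximum)
```

Reusing the stored minimum and maximum for later-acquired data is what keeps the data format consistent between the original training set and subsequent sets.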
Regarding claim 16, Sharma in view of Bryan discloses the one or more non-transitory computer readable media as claimed in claim 15, but does not specifically disclose: wherein computing the first feature value comprises performing at least one min-max scaling operation on the first source feature value.
Teague teaches: wherein computing the first feature value comprises performing at least one min-max scaling operation on the first source feature value (Paragraph 0041, lines 4-6, "As another example, min-max scaling may be applied based on the minimum and maximum values of the train set.").  Teague teaches scaling machine learning training data based on minimum and maximum values in the training set to allow for a consistent data format between different training sets (Paragraph 0041, lines 25-32, "Additional data may be collected after training and this later acquired data may be processed in a manner similar to the initial training data sets to provide a constant 
Sharma, Bryan, and Teague are considered to be analogous to the claimed invention because they are in the same field of training a machine learning model.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Bryan and further in view of Teague to scale machine learning training data based on minimum and maximum values in the training set.  Doing so would provide for a consistent data format between different training sets.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Güzelarslan and Bryan.
Regarding claim 7, Sharma discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein computing a first pseudo label included in the first plurality of pseudo labels comprises: inputting the first audio clip and a first reference audio clip into a first model included in the plurality of models that, in response, outputs a first metric value for a first metric included in the plurality of metrics; and scaling the first metric value based on a theoretical range of the first metric to generate the first pseudo label.
Güzelarslan teaches: wherein computing a first pseudo label included in the first plurality of pseudo labels comprises: inputting the first audio clip and a first reference audio clip into a first model included in the plurality of models that, in response, outputs 
Sharma and Güzelarslan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have 
Sharma in view of Güzelarslan does not specifically disclose scaling the first metric value based on a theoretical range of the first metric to generate the first pseudo label.
Bryan teaches scaling the first metric value based on a theoretical range of the first metric to generate the first pseudo label (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).").  Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Sharma, Güzelarslan, and Bryan are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Güzelarslan and further in view of Bryan to scale audio .
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Suzuki (US Patent Application Publication No. 2019/0385480).
Regarding claim 12, Sharma teaches the one or more non-transitory computer readable media as claimed in claim 11, but does not specifically disclose: wherein a first model included in the plurality of models comprises a perceptual quality model that is trained based on subjective scores assigned by human listeners.
Suzuki teaches: wherein a first model included in the plurality of models comprises a perceptual quality model that is trained based on subjective scores assigned by human listeners (Abstract, lines 2-6, "A machine learning engine may be trained using audio file variables from a plurality of audio files and human generated scores for a comprehensibility, accentedness and intelligibility for each audio file.").  Suzuki teaches using human scoring to train a machine learning model to measure audio quality to allow the trained model to predict human ratings (Paragraph 0072, lines 8-14, "This score-generation algorithm “learns” how to score pronunciation like expert human judges by examining expert ratings and maximizing predictions of those ratings by iterative combinations of features and weights. The scoring model may be the method for selecting features from the speech recognition process and applying them to predict human ratings.").
Sharma and Suzuki are considered to be analogous to the claimed invention because they are in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of .
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Wu et al. (US Patent Application Publication No. 2021/0350820) is a co-pending application that teaches estimating perceived audio quality using a trained multitask learning model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, 



/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657