DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed July 1, 2022 has been entered.  Claims 1 – 20 remain pending in the application.  Applicant’s amendments to the Specification and Claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed April 1, 2022.  Applicant’s amendments to claims 8 and 19 have overcome the previously set forth 35 U.S.C. 112(b) rejections of claims 8 – 9 and 19.  Claim 10 has not been amended to overcome the previously set forth 35 U.S.C. 112(b) rejection; thus, that rejection is maintained below.
Response to Arguments
Applicant’s arguments filed July 1, 2022 with respect to claims 1 – 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Objections
Claim 1 is objected to because of the following informalities:
In line 5, “metricvalue” should read “metric value”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 10, claim 10 depends from claim 1, and the claim 10 limitation “wherein training the multitask learning model comprises” is indefinite because it is not clear how the limitation further limits the subject matter of claim 1.  Claim 1 includes the term “training a multitask learning model” in the preamble, “a computer-implemented method for training a multitask learning model to assess perceived audio quality, the method comprising”, and includes the term “training the multitask learning model” in the claim element “and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model”.  Therefore, it is not clear if the claim 10 limitation “wherein training the multitask learning model comprises” further limits or replaces the training steps of claim 1, or if the limitation modifies the training claim element.  Using the term “further comprises” instead of “comprises” would make it clear that the claim element is further limiting subject matter of claim 1.  For examination purposes, the limitation will be interpreted to further limit the claim 1 element “and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model”.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8 – 11, 14, and 18 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (“Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks”), hereinafter Sharma, in view of Kang et al. ("Machine Learning: Data Pre‐processing”), hereinafter Kang.
Regarding claim 1, Sharma discloses a computer-implemented method for training a multitask learning model to assess perceived audio quality (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."; Section I, lines 59-61, "Another novel contribution of this work is the joint estimation of voice activity and POLQA using a multi-task RNN."; Training a multitask learning model reads on training a multi-task recurrent neural network (RNN) and perceived audio quality reads on the Perceptual Objective Listening Quality Analysis (POLQA) score.), the method comprising:
computing a first plurality of pseudo labels based on a first audio clip and a plurality of models, wherein each pseudo label included in the first plurality of pseudo labels specifies a metric value that is relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, lines 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.),
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Training the multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores.),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for a plurality of metrics computed by the plurality of models (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Sharma does not specifically disclose: a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models.
Kang teaches:
a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models (Chapter 5.2, lines 12-15, "The Min–Max normalization method scales the values of feature X of a dataset according to its minimum and maximum values. That is, the method converts a value x of the feature X to x̂ in the range [low, high] by using: x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin)"; Xmax − Xmin reads on the theoretical range, where the theoretical maximum value and the theoretical minimum value are used in the calculation.).
Kang teaches normalizing data values by scaling the values based on range in order to prevent bias to large-scale data (Chapter 5.2, lines 1-5, "Supervised/unsupervised machine learning algorithms have been widely used for the development of data-driven anomaly detection, diagnosis, and prognosis methods. Additionally, the use of high-dimensional data is indispensable for PHM of complex electronics. However, if each of the dimensions is not normalized to a similar level, the output of the machine learning algorithms can be biased to some of the large-scale data.").
Sharma and Kang are considered to be analogous to the claimed invention because they are in the same field of machine learning models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma to incorporate the teachings of Kang to normalize data values by scaling the values based on range.  Doing so would allow for preventing bias to large-scale data.
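For illustration only, the Min-Max normalization formula quoted from Kang above, x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin), can be sketched as follows.  The function name min_max_scale and the range of 1.0 to 5.0 are assumptions chosen for this example; they are not taken from Sharma, Kang, or the application, and the sketch is not a statement of how any reference implements the scaling.

```python
def min_max_scale(x, x_min, x_max, low=0.0, high=1.0):
    """Map x from [x_min, x_max] into [low, high] using the Min-Max formula."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    return low + (high - low) * (x - x_min) / (x_max - x_min)


# Hypothetical output range of a quality model (assumed for illustration only).
ASSUMED_MIN, ASSUMED_MAX = 1.0, 5.0

# A raw score of 3.0 sits halfway through the assumed range.
print(min_max_scale(3.0, ASSUMED_MIN, ASSUMED_MAX))  # prints 0.5
```

When Xmin and Xmax are fixed bounds of the model output rather than values observed in a dataset, every scaled value falls within [low, high] regardless of which clips are scored.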
Regarding claim 8, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1.  Sharma further discloses: wherein training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels comprises:
inputting the first set of feature values into the multitask learning model that, in response, outputs a first plurality of predicted labels (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; The feature values read on the MFCC and MDCC features and the predicted labels read on the VAD and POLQA estimations.);
computing a loss based on the first plurality of predicted labels and the first plurality of pseudo labels (Section IIIC, lines 1-3, "In the following, Pe and Pt are the estimated and true POLQA scores, respectively and the error in estimating a sample is defined as E(n) = P(n)e - P(n)t."; The predicted labels read on the estimated POLQA scores, the pseudo labels read on the true POLQA scores, and the loss reads on the error in estimating a sample.);
and performing one or more optimization operations on the multitask learning model based on the loss (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Performing optimization operations on the multitask learning model based on the loss reads on training the recurrent neural network using the error metric.).
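For illustration only, the three recited steps (inputting feature values to obtain predicted labels, computing a loss against the pseudo labels, and performing an optimization operation based on the loss) can be sketched with a two-output linear model trained by gradient descent.  This is a deliberately simplified stand-in, not the recurrent neural network of Sharma; all names and numeric values below are hypothetical.

```python
def predict(weights, features):
    """Step 1: one predicted label per task (weighted sum of the features)."""
    return [sum(w * f for w, f in zip(task_w, features)) for task_w in weights]


def squared_error_loss(predicted, targets):
    """Step 2: sum of squared differences across all tasks."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets))


def train_step(weights, features, targets, lr=0.1):
    """Step 3: one gradient-descent update; returns (new_weights, loss)."""
    predicted = predict(weights, features)
    loss = squared_error_loss(predicted, targets)
    # Gradient of (p - t)^2 with respect to each weight is 2 * (p - t) * f.
    new_weights = [
        [w - lr * 2.0 * (p - t) * f for w, f in zip(task_w, features)]
        for task_w, p, t in zip(weights, predicted, targets)
    ]
    return new_weights, loss


weights = [[0.0, 0.0], [0.0, 0.0]]  # two tasks, two features (hypothetical)
features = [1.0, 0.5]               # hypothetical feature values
targets = [0.8, 0.2]                # hypothetical scaled pseudo labels

losses = []
for _ in range(20):
    weights, loss = train_step(weights, features, targets)
    losses.append(loss)
```

In this sketch the loss recorded at each iteration decreases as the weights are updated, which is the sense in which the optimization operation is "based on the loss".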
Regarding claim 9, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1.  Sharma further discloses:
wherein training the multitask learning model comprises computing a mean squared error between the first plurality of predicted labels and the first plurality of pseudo labels (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; The mean squared error between the predicted labels and the pseudo labels reads on the RMSEPOLQA and the RMSEMVP.).
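For illustration only, the relationship between a mean squared error and the root mean square error metric quoted from Sharma can be sketched as follows: the RMSE is the square root of the MSE, so computing an RMSE involves computing an MSE.  The function names and sample values are hypothetical, and the summation of the two per-task terms mirrors the quoted expression (RMSEPOLQA+RMSEMVP) only in form.

```python
import math


def mse(predicted, targets):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


def rmse(predicted, targets):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, targets))


# Hypothetical predicted and target values for two tasks.
polqa_pred, polqa_true = [3.0, 4.0], [3.0, 2.0]
mvp_pred, mvp_true = [0.5, 0.5], [0.5, 0.5]

# Combined metric formed as the sum of the per-task RMSE terms.
combined = rmse(polqa_pred, polqa_true) + rmse(mvp_pred, mvp_true)
```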
Regarding claim 10, as best understood based on the 35 U.S.C. 112(b) issues identified above, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1.  Sharma further discloses:
wherein training the multitask learning model comprises executing a multitask learning algorithm on the multitask learning model based on the first set of feature values, the first plurality of pseudo labels, a second set of feature values associated with a second audio clip, and a second plurality of feature values associated with the second audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 1-4, "The training data for the proposed and baseline methods is based on the clean 100 hrs training partition of the Librispeech corpus, which is derived from audiobooks read by a large number of speakers."; The feature values read on the MFCC and MDCC features, the predicted labels read on the VAD and POLQA estimations, and the second audio clip reads on the 100 hours training partition.).
Regarding claim 11, Sharma discloses one or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to train a multitask learning model to assess perceived audio quality (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."; Section I, lines 59-61, "Another novel contribution of this work is the joint estimation of voice activity and POLQA using a multi-task RNN."; Training a multitask learning model reads on training a multi-task recurrent neural network (RNN) and perceived audio quality reads on the Perceptual Objective Listening Quality Analysis (POLQA) score.) by performing the steps of:
causing a plurality of models to generate a plurality of output values based on a first audio clip and computing a plurality of pseudo labels based on the plurality of output values, wherein each pseudo label included in the plurality of pseudo labels specifies a metric value that is relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, lines 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.),
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and training the multitask learning model based on the first set of feature values and the plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Training the multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores.),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for a plurality of metrics computed by the plurality of models (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Sharma does not specifically disclose: a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models.
Kang teaches:
a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models (Chapter 5.2, lines 12-15, "The Min–Max normalization method scales the values of feature X of a dataset according to its minimum and maximum values. That is, the method converts a value x of the feature X to x̂ in the range [low, high] by using: x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin)"; Xmax − Xmin reads on the theoretical range, where the theoretical maximum value and the theoretical minimum value are used in the calculation.).
Kang teaches normalizing data values by scaling the values based on range in order to prevent bias to large-scale data (Chapter 5.2, lines 1-5, "Supervised/unsupervised machine learning algorithms have been widely used for the development of data-driven anomaly detection, diagnosis, and prognosis methods. Additionally, the use of high-dimensional data is indispensable for PHM of complex electronics. However, if each of the dimensions is not normalized to a similar level, the output of the machine learning algorithms can be biased to some of the large-scale data.").
Sharma and Kang are considered to be analogous to the claimed invention because they are in the same field of machine learning models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma to incorporate the teachings of Kang to normalize data values by scaling the values based on range.  Doing so would allow for preventing bias to large-scale data.
Regarding claim 14, Sharma in view of Kang discloses the one or more non-transitory computer readable media as claimed in claim 11.  Sharma further discloses:
wherein the multitask learning model comprises at least one of a neural network, a decision tree, or a random forest (Section I, lines 51-54, "This paper proposes training a recurrent neural network (RNN) on a large training dataset to estimate the POLQA score non–intrusively, on a short-time basis, using speech features."; Section I, lines 59-61, "Another novel contribution of this work is the joint estimation of voice activity and POLQA using a multi-task RNN.").
Regarding claim 18, Sharma in view of Kang discloses the one or more non-transitory computer readable media as claimed in claim 11.  Kang further teaches:
wherein computing a first pseudo label included in the plurality of pseudo labels comprises scaling a first output value included in the plurality of output values based on a theoretical minimum and a theoretical maximum of a first output of a first model included in the plurality of models to generate the first pseudo label (Chapter 5.2, lines 12-15, "The Min–Max normalization method scales the values of feature X of a dataset according to its minimum and maximum values. That is, the method converts a value x of the feature X to x̂ in the range [low, high] by using: x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin)"; Xmin reads on the theoretical minimum and Xmax reads on the theoretical maximum, where the theoretical maximum value and the theoretical minimum value are used in the calculation.).
Kang teaches normalizing data values by scaling the values based on minimum and maximum values in order to prevent bias to large-scale data (Chapter 5.2, lines 1-5, "Supervised/unsupervised machine learning algorithms have been widely used for the development of data-driven anomaly detection, diagnosis, and prognosis methods. Additionally, the use of high-dimensional data is indispensable for PHM of complex electronics. However, if each of the dimensions is not normalized to a similar level, the output of the machine learning algorithms can be biased to some of the large-scale data.").
Sharma and Kang are considered to be analogous to the claimed invention because they are in the same field of machine learning models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to further incorporate the teachings of Kang to normalize data values by scaling the values based on minimum and maximum values.  Doing so would allow for preventing bias to large-scale data.
Regarding claim 19, Sharma in view of Kang discloses the one or more non-transitory computer readable media as claimed in claim 11.  Sharma further discloses: wherein training the multitask learning model based on the first set of feature values and the first plurality of pseudo labels comprises:
inputting the first set of feature values into the multitask learning model that, in response, outputs a first plurality of predicted labels (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; The feature values read on the MFCC and MDCC features and the predicted labels read on the VAD and POLQA estimations.);
computing a loss based on the first plurality of predicted labels and the plurality of pseudo labels (Section IIIC, lines 1-3, "In the following, Pe and Pt are the estimated and true POLQA scores, respectively and the error in estimating a sample is defined as E(n) = P(n)e - P(n)t."; The predicted labels read on the estimated POLQA scores, the pseudo labels read on the true POLQA scores, and the loss reads on the error in estimating a sample.);
and performing one or more optimization operations on the multitask learning model based on the loss (Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Performing optimization operations on the multitask learning model based on the loss reads on training the recurrent neural network using the error metric.).
Regarding claim 20, Sharma discloses a system comprising: one or more memories storing instructions and one or more processors coupled to the one or more memories that, when executing the instructions, perform the steps of:
computing a first plurality of pseudo labels based on a first audio clip and a plurality of models, wherein each pseudo label included in the first plurality of pseudo labels specifies a metric value that is relevant to audio quality (Section IID, lines 4-7, "The context length is a parameter that can be varied from 10 ms up to the maximum length of an utterance in the data and represents the context available as input to the multi-task RNN."; Section IID, lines 17-19, "One of the output nodes estimates the POLQA score and the second node estimates the Mean Voice activity Posterior (MVP)."; The audio clip reads on the utterance in the data and the pseudo labels specifying metric values read on the POLQA and MVP models providing POLQA and MVP scores.);
computing a first set of feature values for a set of audio features based on the first audio clip (Section IID, line 1, "The 24 MFCC and 24 MDCC features");
and executing at least one multitask learning algorithm based on the first set of feature values and the first plurality of pseudo labels to generate a trained multitask learning model (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IID, lines 32-35, "The multi-task RNN is trained for 10 epochs on the training data using a mini-batch size of 1000 observations with an Root Mean Square (RMSE) error metric (RMSEPOLQA+RMSEMVP)."; Generating a trained multitask learning model reads on training the recurrent neural network, the feature values read on the MFCC and MDCC features, and the pseudo labels read on the POLQA and MVP scores.),
wherein the trained multitask learning model maps different sets of feature values for the set of audio features to different pluralities of predicted labels (Section V, lines 3-10, "The proposed method uses narrowband MFCC features in combination with features extracted from a novel compressed representation of the modulation spectrum, that help reduce the RMSE of the system by 31.0% relative to the MFCC only system.  The use of an RNN topology with these features allows the proposed method to reliably estimate POLQA and VAD with 300 ms of context."; The mapping of feature values to predicted labels reads on the MFCC features allowing the RNN topology to reliably estimate POLQA and VAD values.),
and wherein each plurality of predicted labels specifies estimated metric values for a plurality of metrics computed by the plurality of models (Section IV, lines 20-22, "Table II presents the detailed results for POLQA and MVP estimation for a 30 frame context for the different test sets using the full feature set."; The predicted labels read on the POLQA and MVP estimations.).
Sharma does not specifically disclose: a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models.
Kang teaches:
a metric value that is scaled based on a theoretical range of an output of a corresponding model included in the plurality of models (Chapter 5.2, lines 12-15, "The Min–Max normalization method scales the values of feature X of a dataset according to its minimum and maximum values. That is, the method converts a value x of the feature X to x̂ in the range [low, high] by using: x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin)"; Xmax − Xmin reads on the theoretical range, where the theoretical maximum value and the theoretical minimum value are used in the calculation.).
Kang teaches normalizing data values by scaling the values based on range in order to prevent bias to large-scale data (Chapter 5.2, lines 1-5, "Supervised/unsupervised machine learning algorithms have been widely used for the development of data-driven anomaly detection, diagnosis, and prognosis methods. Additionally, the use of high-dimensional data is indispensable for PHM of complex electronics. However, if each of the dimensions is not normalized to a similar level, the output of the machine learning algorithms can be biased to some of the large-scale data.").
Sharma and Kang are considered to be analogous to the claimed invention because they are in the same field of machine learning models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma to incorporate the teachings of Kang to normalize data values by scaling the values based on range.  Doing so would allow for preventing bias to large-scale data.
Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Kang, and further in view of Güzelarslan (US Patent Application Publication No. 2020/0402530).
Regarding claim 2, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein a first model included in the plurality of models comprises a Hearing-Aid Audio Quality Index expert system, a Perceptual Evaluation of Audio Quality expert system, a Perception Model Quality expert system, or a Virtual Speech Quality Objective Listener Audio expert system.
Güzelarslan teaches: wherein a first model included in the plurality of models comprises a Hearing-Aid Audio Quality Index expert system, a Perceptual Evaluation of Audio Quality expert system, a Perception Model Quality expert system, or a Virtual Speech Quality Objective Listener Audio expert system (Paragraph 0004, lines 1-3, "A corresponding international standard for the perceptual quality measurement of wide-band audio signals is PEAQ (perceptual evaluation of audio quality)."; Paragraph 0079, lines 1-5, "The output score value according to different ITU measurement methods, such as for example the PESQ, PEAQ and POLQA value are then to compared in order to produce a signal for supervised learning of the neural network 10.").  Güzelarslan teaches using the Perceptual Evaluation of Audio Quality (PEAQ) measurement to train a neural network to provide an audio quality score to measure audio quality in an objective manner (Paragraph 0054, lines 1-8, "As can be seen by comparison to FIG. 1 according to this aspect the invention proposes to use a neural network 10, alternatively or in addition to the human/subjective test for producing for example the LQS score parameter.  In other words, according to the invention, the output signal (feedback signal) of the system 2 under test is used as the input signal for a neural network 10, thus producing the LQS score in an objective manner.").
Güzelarslan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Güzelarslan to use the Perceptual Evaluation of Audio Quality (PEAQ) measurement to train a neural network to provide an audio quality score.  Doing so would allow for measuring audio quality in an objective manner.
Regarding claim 7, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1.  Kang further teaches:
scaling the first metric value based on a theoretical minimum and theoretical maximum of the first metric computed by the first model to generate the first pseudo label (Chapter 5.2, lines 12-15, "The Min–Max normalization method scales the values of feature X of a dataset according to its minimum and maximum values. That is, the method converts a value x of the feature X to x̂ in the range [low, high] by using: x̂ = low + (high − low)(x − Xmin)/(Xmax − Xmin)"; Xmin reads on the theoretical minimum and Xmax reads on the theoretical maximum, where the theoretical maximum value and the theoretical minimum value are used in the calculation.).
Kang teaches normalizing data values by scaling the values based on minimum and maximum values in order to prevent bias to large-scale data (Chapter 5.2, lines 1-5, "Supervised/unsupervised machine learning algorithms have been widely used for the development of data-driven anomaly detection, diagnosis, and prognosis methods. Additionally, the use of high-dimensional data is indispensable for PHM of complex electronics. However, if each of the dimensions is not normalized to a similar level, the output of the machine learning algorithms can be biased to some of the large-scale data.").
Sharma and Kang are considered to be analogous to the claimed invention because they are in the same field of machine learning models.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to further incorporate the teachings of Kang to normalize data values by scaling the values based on minimum and maximum values.  Doing so would allow for preventing bias to large-scale data.
Sharma in view of Kang does not specifically disclose: wherein computing a first pseudo label included in the first plurality of pseudo labels comprises: inputting the first audio clip and a first reference audio clip into a first model included in the plurality of models that, in response, outputs a first metric value for a first metric included in the plurality of metrics.
Güzelarslan teaches: wherein computing a first pseudo label included in the first plurality of pseudo labels comprises: inputting the first audio clip and a first reference audio clip into a first model included in the plurality of models that, in response, outputs a first metric value for a first metric included in the plurality of metrics (Paragraph 0012, lines 1-3, "The input signal (reference signal), together with the output signal (degraded signal) are furthermore supplied to a double ended objective model 6"; Paragraph 0070, lines 1-4, "Furthermore, the reference signal (input signal) is also fed to the objective model 6, and an output thereof is used as training data by comparing these objectively produced training data with the output of the neural network 10. In methods known as such these training data can be used for supervised learning of the neural network 10."; Paragraph 0075, lines 1-4, "Preferably, a plurality of quality score values according to different ITU measurement methods is produced simultaneously, such for example values according to PESQ, PEAQ, and/or POLQA."; The audio clip reads on the degraded signal, the reference audio clip reads on the reference signal, the models read on the PESQ, PEAQ, and POLQA, and the metrics read on the quality score values.).  Güzelarslan teaches using audio quality measurement models to determine audio quality scores for reference audio and modified audio derived from the reference audio to train a neural network to perform audio quality measurements (Paragraph 0079, lines 1-5, "The output score value according to different ITU measurement methods, such as for example the PESQ, PEAQ and POLQA value are then to compared in order to produce a signal for supervised learning of the neural network 10.").
Güzelarslan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Güzelarslan to use audio quality measurement models to determine audio quality scores for reference audio and modified audio derived from the reference audio.  Doing so would allow for training a neural network to perform audio quality measurements.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Kang, and further in view of Francombe et al. (US Patent No. 9,635,483), hereinafter Francombe.
Regarding claim 3, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate.
Francombe teaches: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate (Column 25, lines 49-51, "the cross correlation is used to produce the Perceptual Similarity Measure (PSM) which is taken to be a measure of the audio quality"; Column 51, lines 58-61, "The resulting PSMs are then mapped to the OPS, TPS, IPS, and APS predictions by a non-linear function (one hidden layer feed forward neural network) trained on listening test results.").  Francombe teaches using the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality to determine the amount of audio interference from nearby audio sources and provide parameter changes for modifying the audio to reduce the interference (Column 19, lines 7-11, "Methods are described above of determining the interference in one area or zone from sound provided in another area or zone or a value representing a quality of the sound generated in one zone when sound is audible from another zone."; Abstract, lines 5-10, "If the interference value exceeds a predetermined threshold, one or more parameter changes are determined to parameters of the conversion which, when implemented, will ensure that the interference in the first zone by sound in the second zone is maintained below a threshold limit.").
Francombe is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Francombe to use the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality.  Doing so would allow for determining the amount of audio interference from nearby audio sources and providing parameter changes for modifying the audio to reduce the interference.
Regarding claim 13, Sharma in view of Kang discloses the one or more non-transitory computer readable media as claimed in claim 11, but does not specifically disclose: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate.
Francombe teaches: wherein a first audio feature included in the set of audio features is associated with a Cepstral Correlation, a Noise-to-Mask Ratio, a Perceptual Similarity Measure, a Neurogram Similarity Index Measure, or a bitrate (Column 25, lines 49-51, "the cross correlation is used to produce the Perceptual Similarity Measure (PSM) which is taken to be a measure of the audio quality"; Column 51, lines 58-61, "The resulting PSMs are then mapped to the OPS, TPS, IPS, and APS predictions by a non-linear function (one hidden layer feed forward neural network) trained on listening test results.").  Francombe teaches using the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality to determine the amount of audio interference from nearby audio sources and provide parameter changes for modifying the audio to reduce the interference (Column 19, lines 7-11, "Methods are described above of determining the interference in one area or zone from sound provided in another area or zone or a value representing a quality of the sound generated in one zone when sound is audible from another zone."; Abstract, lines 5-10, "If the interference value exceeds a predetermined threshold, one or more parameter changes are determined to parameters of the conversion which, when implemented, will ensure that the interference in the first zone by sound in the second zone is maintained below a threshold limit.").
Francombe is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Francombe to use the Perceptual Similarity Measure (PSM) audio feature to train a neural network to measure audio quality.  Doing so would allow for determining the amount of audio interference from nearby audio sources and providing parameter changes for modifying the audio to reduce the interference.
Claims 4, 6, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Kang, and further in view of Bryan (US Patent Application Publication No. 2021/0125629).
Regarding claim 4, Sharma in view of Kang discloses the computer-implemented method as claimed in claim 1.  Sharma further discloses:
wherein computing the first set of feature values comprises: computing a plurality of source feature values for a first source feature based on a plurality of audio clips that includes the first audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 18-19, "The resulting training set contains nearly 100 hrs of processed data."; The feature values read on the MFCC and MDCC features and the plurality of audio clips reads on the 100 hours of processed data).
Sharma in view of Kang does not specifically disclose: computing at least one scaling parameter based on the plurality of source feature values; and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source feature values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values.
Bryan teaches:
computing at least one scaling parameter based on the plurality of source feature values (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).");
and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source feature values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values (Paragraph 0026, lines 5-9, "For deep learning models, in some embodiments, the acoustic improvement system generates a neural network with architecture customized to efficiently determine corresponding acoustic quality metrics."; Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score.").
Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Bryan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Bryan to scale audio quality measurements.  Doing so would allow different audio quality measurements to be compared.
Regarding claim 6, Sharma in view of Kang and further in view of Bryan discloses the computer-implemented method as claimed in claim 4.  Bryan further teaches:
further comprising computing an overall quality score for a second audio clip based on the trained multitask learning model and the at least one scaling parameter (Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score."; Paragraph 0082, lines 1-5, "To illustrate, FIG. 4B shows the acoustic improvement system capturing a second audio recording 414b, analyzing it, determining updated acoustic quality metrics 212, as well as calculating a new overall acoustic quality score, each of which is previously described.").
Bryan teaches calculating an overall audio quality score from scaled audio quality measurements from a trained neural network to determine ways to improve the evaluated audio (Paragraph 0024, lines 3-10, "Based on the audio input, the acoustic improvement system can determine multiple acoustic quality metrics corresponding to different acoustic quality categories by utilizing multiple acoustic quality measurement models. In addition, the acoustic improvement system can utilize the acoustic quality metrics to determine key actionable acoustic improvement suggestions from a larger set of actionable acoustic improvement suggestions.").
Bryan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang and further in view of Bryan to further incorporate the teachings of Bryan to calculate an overall audio quality score from scaled audio quality measurements from a trained neural network.  Doing so would allow for determining ways to improve the evaluated audio.
Regarding claim 15, Sharma in view of Kang discloses the one or more non-transitory computer readable media as claimed in claim 11.  Sharma further discloses:
wherein computing the first set of feature values comprises: computing a plurality of source feature values for a first source feature based on a plurality of audio clips that includes the first audio clip (Section IID, lines 1-3, "The 24 MFCC and 24 MDCC features are standardized (zero mean and unit standard deviation) before being used to train an RNN that jointly estimates VAD and POLQA"; Section IIIA, lines 18-19, "The resulting training set contains nearly 100 hrs of processed data."; The feature values read on the MFCC and MDCC features and the plurality of audio clips reads on the 100 hours of processed data).
Sharma in view of Kang does not specifically disclose: computing at least one scaling parameter based on the plurality of source feature values; and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source feature values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values.
Bryan teaches:
computing at least one scaling parameter based on the plurality of source feature values (Paragraph 0060, lines 1-5, "In one or more embodiments, a score of 100 indicates an optimal result for a particular acoustic quality category. For example, the acoustic improvement system normalizes and/or scales each of the acoustic quality metrics 212 to a 100 point scale (or another value).");
and computing a first feature value for a first audio feature included in the set of audio features based on a first source feature value included in the plurality of source feature values and the at least one scaling parameter, wherein the first feature value is included in the first set of feature values (Paragraph 0026, lines 5-9, "For deep learning models, in some embodiments, the acoustic improvement system generates a neural network with architecture customized to efficiently determine corresponding acoustic quality metrics."; Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score.").
Bryan teaches scaling audio quality measurements to allow different audio quality measurements to be compared (Paragraph 0133, lines 5-12, "For example, the acoustic improvement system scales or otherwise weights the acoustic quality metric 802 to a common range (e.g., 0-100). In this manner, each of the acoustic quality metrics is normalized to the same metric range, which enables a user to easily compare acoustic quality metrics when presented within the interactive graphical user interface.").
Bryan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Bryan to scale audio quality measurements.  Doing so would allow different audio quality measurements to be compared.
Regarding claim 17, Sharma in view of Kang and further in view of Bryan discloses the one or more non-transitory computer readable media as claimed in claim 15.  Bryan further teaches:
further comprising computing an overall quality score for a second audio clip based on the trained multitask learning model and the at least one scaling parameter (Paragraph 0027, lines 8-11, "Further, in some embodiments, the acoustic improvement system can scale and/or combine the acoustic quality metrics to determine an overall audio quality score."; Paragraph 0082, lines 1-5, "To illustrate, FIG. 4B shows the acoustic improvement system capturing a second audio recording 414b, analyzing it, determining updated acoustic quality metrics 212, as well as calculating a new overall acoustic quality score, each of which is previously described.").
Bryan teaches calculating an overall audio quality score from scaled audio quality measurements from a trained neural network to determine ways to improve the evaluated audio (Paragraph 0024, lines 3-10, "Based on the audio input, the acoustic improvement system can determine multiple acoustic quality metrics corresponding to different acoustic quality categories by utilizing multiple acoustic quality measurement models. In addition, the acoustic improvement system can utilize the acoustic quality metrics to determine key actionable acoustic improvement suggestions from a larger set of actionable acoustic improvement suggestions.").
Bryan is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang and further in view of Bryan to further incorporate the teachings of Bryan to calculate an overall audio quality score from scaled audio quality measurements from a trained neural network.  Doing so would allow for determining ways to improve the evaluated audio.
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Kang and Bryan, and further in view of Teague (US Patent Application Publication No. 2020/0349467).
Regarding claim 5, Sharma in view of Kang and Bryan discloses the computer-implemented method as claimed in claim 4, but does not specifically disclose: wherein computing the at least one scaling parameter comprises: setting a first scaling parameter equal to a minimum source feature value included in the plurality of source feature values; and setting a second scaling parameter equal to a maximum source feature value included in the plurality of source feature values.
Teague teaches: wherein computing the at least one scaling parameter comprises: setting a first scaling parameter equal to a minimum source feature value included in the plurality of source feature values; and setting a second scaling parameter equal to a maximum source feature value included in the plurality of source feature values (Paragraph 0041, lines 4-6, "As another example, min-max scaling may be applied based on the minimum and maximum values of the train set.").  Teague teaches scaling machine learning training data based on minimum and maximum values in the training set to allow for a consistent data format between different training sets (Paragraph 0041, lines 25-32, "Additional data may be collected after training and this later acquired data may be processed in a manner similar to the initial training data sets to provide a constant and consistently formatted data for the ML system, such as to train a new ML system with consistently formatted data to iterate on the ML system techniques in isolation of random noise effects of data set processing, or to generate predictions from the trained ML system.").
Teague is considered to be analogous to the claimed invention because it is in the same field of training a machine learning model.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang and Bryan to incorporate the teachings of Teague to scale machine learning training data based on minimum and maximum values in the training set.  Doing so would provide for a consistent data format between different training sets.
Regarding claim 16, Sharma in view of Kang and Bryan discloses the one or more non-transitory computer readable media as claimed in claim 15, but does not specifically disclose: wherein computing the first feature value comprises performing at least one min-max scaling operation on the first source feature value.
Teague teaches: wherein computing the first feature value comprises performing at least one min-max scaling operation on the first source feature value (Paragraph 0041, lines 4-6, "As another example, min-max scaling may be applied based on the minimum and maximum values of the train set.").  Teague teaches scaling machine learning training data based on minimum and maximum values in the training set to allow for a consistent data format between different training sets (Paragraph 0041, lines 25-32, "Additional data may be collected after training and this later acquired data may be processed in a manner similar to the initial training data sets to provide a constant and consistently formatted data for the ML system, such as to train a new ML system with consistently formatted data to iterate on the ML system techniques in isolation of random noise effects of data set processing, or to generate predictions from the trained ML system.").
Teague is considered to be analogous to the claimed invention because it is in the same field of training a machine learning model.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang and Bryan to incorporate the teachings of Teague to scale machine learning training data based on minimum and maximum values in the training set.  Doing so would provide for a consistent data format between different training sets.
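For illustration only, the min-max scaling approach cited from Teague (scaling parameters taken from the minimum and maximum of the training set, then reused for later-acquired data) can be sketched in Python as follows.  The function names and the sample values are hypothetical and are not drawn from Teague or from the claims.

```python
def fit_min_max(train_values):
    """Return (minimum, maximum) of the training values as scaling parameters."""
    values = list(train_values)
    if not values:
        raise ValueError("train_values must not be empty")
    return min(values), max(values)


def apply_min_max(value, scaling_params):
    """Scale a value using parameters previously fitted on the training set."""
    lo, hi = scaling_params
    if hi == lo:
        # Assumed convention for a zero-width range: map everything to 0.0.
        return 0.0
    return (value - lo) / (hi - lo)


# Hypothetical source feature values from a training set.
train = [2.0, 4.0, 10.0]
params = fit_min_max(train)                       # (2.0, 10.0)
scaled_train = [apply_min_max(v, params) for v in train]

# Later-acquired data is scaled with the same training-set parameters.
later = apply_min_max(6.0, params)
```

Reusing the fitted parameters, rather than recomputing them on the later data, is what keeps the scaled values on a consistent footing across data sets in this sketch.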
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Sharma in view of Kang, and further in view of Suzuki (US Patent Application Publication No. 2019/0385480).
Regarding claim 12, Sharma in view of Kang teaches the one or more non-transitory computer readable media as claimed in claim 11, but does not specifically disclose: wherein a first model included in the plurality of models comprises a perceptual quality model that is trained based on subjective scores assigned by human listeners.
Suzuki teaches: wherein a first model included in the plurality of models comprises a perceptual quality model that is trained based on subjective scores assigned by human listeners (Abstract, lines 2-6, "A machine learning engine may be trained using audio file variables from a plurality of audio files and human generated scores for a comprehensibility, accentedness and intelligibility for each audio file.").  Suzuki teaches using human scoring to train a machine learning model to measure audio quality to allow the trained model to predict human ratings (Paragraph 0072, lines 8-14, "This score-generation algorithm “learns” how to score pronunciation like expert human judges by examining expert ratings and maximizing predictions of those ratings by iterative combinations of features and weights. The scoring model may be the method for selecting features from the speech recognition process and applying them to predict human ratings.").
Suzuki is considered to be analogous to the claimed invention because it is in the same field of training a learning model to provide a measurement of audio quality.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma in view of Kang to incorporate the teachings of Suzuki to use human scoring to train a machine learning model to measure audio quality.  Doing so would allow the trained model to predict human ratings.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657