Status of Claims
Claims 1-12 are pending.
This communication is in response to the communication filed 6/29/2019.
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the “comparing the score signals output by the neural network and the analytical quality testing device, respectively, and using the result of the comparison for training the neural network” of Claim 6 must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).
Figures 1-3 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated.  See MPEP § 608.02(g).  Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 1-3, 5-6, and 8-12 are objected to because of the following informalities; a suggested correction for each informality is listed below.  Appropriate correction is required.

1. An apparatus for generating a score signal representing the quality of an audio or video signal supplied to the apparatus, the apparatus comprising: 
- an input for supplying [[an]] the supplied audio or video signal[[,]]; and
- a computing unit implementing a neural network, the computing unit being supplied with the supplied audio or video signal, and producing a score signal representing the quality of [[an]] the supplied audio or video signal, the supplied audio or video signal representing at least one predefined quality parameter of the supplied audio or video signal, the neural network being set up by being trained with training data of a specific transmission standard and/or codec used for generating data of the supplied audio or video [[data]].

2. The apparatus of claim 1, wherein the supplied audio or video signal is a digital audio signal and the score signal represents the speech quality according to at least one of the following ITU-T (International Telecommunications Union-Telecommunication Standardization Sector) speech quality testing methods: PESQ (Perceptual Evaluation of Speech Quality); PEAQ (Perceptual Evaluation of Audio Quality); and POLQ (Perceptual Objective Listening Quality analysis).

3. The apparatus of claim 2, wherein the score signal represents simultaneously the speech quality according to at least two of the following ITU-T speech quality testing methods: PESQ; PEAQ; and POLQ.

5. The apparatus of claim 1, wherein the supplied audio or video signal is a speech signal and the score signal represents the ITU (International Telecommunications Union) P.800 value LQS (Listening Quality Subjective).

6. The apparatus according to claim 1, wherein the neural network is obtained by the following supervised learning steps: 
- feeding a training audio or video signal to the neural network to obtain a first training output signal[[,]]; 
- feeding said training audio or video signal to an objective analytical quality testing device, together with a reference signal network to obtain a second training output signal[[,]]; and
- comparing the first and second training score signals output by the neural network and the analytical quality testing device, respectively, and using the result of the comparison for training the neural network.

8. The apparatus of claim 1, wherein the supplied audio or video signal is a VoIP signal.

9. An apparatus for generating a score signal representing the quality of a speech signal supplied to the apparatus, the apparatus comprising:
 - an input for supplying a supplied speech signal[[,]]; and 
- a computing unit implementing a neural network, the computing unit being supplied with the supplied speech signal, and producing a score signal representing the quality of the supplied speech signal, the supplied speech signal representing at least one predefined quality parameter of the supplied speech signal, wherein the score signal represents the ITU (International Telecommunications Union) P.800 value LQS (Listening Quality Subjective). 

10. An apparatus for generating a score signal representing the quality of an audio or video signal supplied to the apparatus, wherein the apparatus implements a Siamese network and comprises: 
- a first neural network being supplied with a supplied reference audio or video signal and designed to generate a first output signal[[,]]; 
- a second neural network being supplied with a supplied audio or video signal for which a score signal is to be generated, the second neural network being designed to generate a second output signal; and
- a third neural network supplied with the first and second output signal, respectively, and generating the score signal. 

11. A computer-implemented method for generating a score signal representing the quality of an audio or video signal, comprising the steps of: 
- supplying [[an]] a supplied audio or video signal[[,]]; and 
- supplying a trained neural network with the supplied audio or video signal, the neural network producing a score signal representing the quality of [[an]] the supplied audio or video signal, the supplied audio or video signal representing at least one predefined quality parameter of the supplied audio or video signal, wherein the training data for the trained neural network are specific for a transmission standard and/or codec used for generating the supplied audio or video signal.


- supplying a speech signal[[,]]; and 
- supplying a trained neural network with the supplied speech signal, the trained neural network producing a score signal representing the quality of the supplied speech signal, the supplied speech signal representing at least one predefined quality parameter of the supplied speech signal, wherein the score signal represents the ITU (International Telecommunications Union) P.800 value LQS (Listening Quality Subjective).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 11 are rejected under 35 U.S.C. 112(b) as being indefinite.

Claim 6 recites the limitation "comparing the score signals" in line 7 (of the claim).  There is insufficient antecedent basis for this limitation in the claim.  It is suggested to amend the claim as listed respectively above under Claim Objections.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-2 and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sung (US 20190164052 A1).
As per independent claim 1, Sung discloses an apparatus for generating a score signal representing the quality of an audio or video signal supplied to the apparatus (see Sung [0002], which notes audio [when limitations are presented in the alternative, only one of the limitations need be cited to show anticipation] signal encoding method and apparatus and audio signal decoding method and apparatus reflecting a parameter learned using a weighted error/score function based on a human auditory characteristic), the apparatus comprising: 
- an input for supplying an audio or video signal (see Sung [0101], which notes with reference to FIG. 5, an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), 
- a computing unit implementing a neural network, the computing unit being supplied with the audio or video signal (see Sung [0101], which notes an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), and producing a score signal representing the quality of an audio or video signal supplied representing at least one predefined quality parameter of the audio or video signal (see Sung [0092], which notes a perceptual quality evaluation/scoring 406 may be performed by comparing the predicted audio spectrum Z/predefined quality parameter to the first audio signal y(n) or the frequency spectrum Y of the first audio), the neural network being set up by being trained with training data of a specific transmission standard and/or codec used for generating the audio or video data (see Sung [0131], which notes FIG. 8 is a flowchart illustrating a process of training a neural network applied to an audio signal decoding method using an audio signal decoding apparatus according to an example embodiment; see Sung [0136], which notes in operation 803, the audio signal decoding apparatus may generate a weighted error function obtained by correcting a preset error function using the weight matrix; see Sung [0137], which notes the model may include a neural network that may include an autoencoder and a topology; see Sung [0138], which notes in operation 804, the audio signal decoding apparatus may generate a second audio signal by applying a parameter learned using the weighted error function to the first audio signal. The parameter of the model may be learned using the weighted error function and the predicted audio signal may be the second audio signal; see Sung [0140], which notes the second audio signal may be compared to the first audio signal, so that a perceptual quality evaluation is performed. The perceptual quality evaluation may include, for example, an objective evaluation such as PESQ, POLQA, and PEAQ, where PESQ, POLQA, and PEAQ are ITU-T international standards).  

As per claim 2, Sung discloses all the limitations of claim 1 above.  Sung further discloses wherein the signal is a digital audio signal and the score signal represents the speech quality according to at least one of the following ITU-T speech quality testing methods: PESQ, PEAQ, and POLQ (see Sung [0093], which notes the perceptual quality evaluation may include an objective evaluation based on a perceptual evaluation of speech quality (PESQ), a perceptual objective listening quality assessment (POLQA), or a perceptual evaluation of audio quality (PEAQ), and a subjective evaluation based on a mean opinion score (MOS) or multiple stimuli with hidden reference and anchor (MUSHRA)).

As per independent claim 11, Sung discloses a computer-implemented method for generating a score signal representing the quality of an audio or video signal (see Sung [0002], which notes audio [when limitations are presented in the alternative, only one of the limitations need be cited to show anticipation] signal encoding method and apparatus and audio signal decoding method and apparatus reflecting a parameter learned using a weighted error/score function based on a human auditory characteristic), comprising the steps of: 
- supplying an audio or video signal (see Sung [0101], which notes with reference to FIG. 5, an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), 
- supplying a trained neural network with the audio or video signal, the neural network producing a score signal representing the quality of an audio or video signal supplied representing at least one predefined quality parameter of the audio or video signal (see Sung [0092], which notes a perceptual quality evaluation/scoring 406 may be performed by comparing the predicted audio spectrum Z/predefined quality parameter to the first audio signal y(n) or the frequency spectrum Y of the first audio), wherein the training data for the trained neural network are specific for a transmission standard and/or codec used for generating the audio or video signal (see Sung [0131], which notes FIG. 8 is a flowchart illustrating a process of training a neural network applied to an audio signal decoding method using an audio signal decoding apparatus according to an example embodiment; see Sung [0136] which notes in operation 803, the audio signal decoding apparatus may generate a weighted error function obtained by correcting a preset error function using the weight matrix; see Sung [0137], which notes the model may include a neural network that may include an autoencoder and a topology; see Sung [0138], which notes in operation 804, the audio signal decoding apparatus may generate a second audio signal by applying a parameter learned using the weighted error function to the first audio signal. The parameter of the model may be learned using the weighted error function and the predicted audio signal may be the second audio signal; see Sung [0140], which notes the second audio signal may be compared to the first audio signal, so that a perceptual quality evaluation is performed. The perceptual quality evaluation may include, for example, an objective evaluation such as PESQ, POLQA, and PEAQ, where PESQ, POLQA, and PEAQ are ITU-T international standards).  

Claim 10 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sheng (A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor; arXiv:1905.01022v1 [eess.AS] 1 May 2019).

As per claim 10, Sheng discloses an apparatus for generating a score signal representing the quality of an audio or video signal supplied to the apparatus, wherein the apparatus implements a Siamese network (see Sheng, page 2, right column, last full paragraph, which notes since our problem also requires the model to pay attention to subtle changes in an audio signal, such as the changes in note attack times and to learn features related to them, we consider to use a siamese network) and comprises:   
- a first neural network being supplied with a reference audio or video signal and designed to generate a first output signal (see Sheng, page 3, Fig. 1 and FIG. 1 caption, which note a first input for a first neural network of an unprocessed audio branch of a twin-siamese DNN model, the first neural network for outputting a first parameter signal that is received by a differencing label generator), 
- a second neural network being supplied with an audio or video signal for which a score signal is to be generated (see Sheng, page 3, Fig. 1 and FIG. 1 caption, which note a second input for a second neural network of a processed audio branch of the twin-siamese DNN model, the second neural network for outputting a second parameter signal that is received by the differencing label generator), 
- a third neural network supplied with the first and second output signal, respectively, and generating the score signal (see Sheng, page 3, Fig. 1 and FIG. 1 caption, which note the differencing label generator can be implemented as a random forest regressor trained by DNN feature embedding, the output of the random forest regressor being the signal Θ for conveying the parameters/scores: attack time, release time, ratio and threshold; and see Sheng, Abstract, which notes The best model is able to produce a universal feature embedding that is capable of predicting multiple DRC parameters simultaneously; and see Sheng, page 1, right column, first full paragraph, lines 1-12,  where DRC is dynamic range compression processing for generating the audio or video signal for which a score signal is to be generated).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
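The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
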
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Sung in view of Hameenniemi (US 20150237199 A1).
As per claim 3, Sung teaches all the elements of claim 2 above.
Sung fails to specifically teach wherein the score signal represents simultaneously the speech quality according to at least two of the following ITU-T speech quality testing methods: PESQ, PEAQ, and POLQ. 
However, Hameenniemi does teach wherein the score signal represents simultaneously the speech quality according to at least two of the following ITU-T speech quality testing methods: PESQ, PEAQ, and POLQ (see Hameenniemi [0118], which notes after the communication between the controller and the UT stops, the controller may process the received sample audio data and determine the quality on the basis of ITU PESQ and/or POLQA recommendations, for example; and see Hameenniemi [0067], which notes in step 552, the user terminal is configured to determine on the basis of the comparison one or more quality indicators describing the quality of the established wireless connection. The controller may determine the quality on the basis of ITU PESQ and/or POLQA recommendations, for example).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Sung with transmission to the controller of the quality determination of the input signal as taught by Hameenniemi in order to reduce the processing burden of the controller (see Hameenniemi [0067], which notes in step 552, the user terminal is configured to determine on the basis of the comparison one or more quality indicators describing the quality of the established wireless connection. The controller may determine the quality of the wireless connection on the basis of ITU PESQ and/or POLQA recommendations; see Hameenniemi [0068], which notes in step 554, the user terminal is configured to transmit the one or more quality indicators describing the quality of the established wireless connection to the external controller using the digital connection; and see Hameenniemi [0069], which notes in this embodiment the user terminal is configured to determine quality indicators and transmit the indicators to the external controller, so that the processing burden of the controller is reduced).

Claims 4 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Sung in view of Abdelal (US 20140358526 A1).
As per claim 4, Sung teaches all of the limitations of claim 1 above.  
	Sung fails to specifically teach wherein the neural network is not supplied with a reference signal. 
However, Abdelal does teach wherein the neural network is not supplied with a reference signal (see Abdelal, Abstract, which notes a non-intrusive objective speech quality assessment is performed on a degraded speech signal; and see Abdelal [0007], which notes a reference-free model, also known as a "non-intrusive" or "single ended" model, depends on the latter degraded signal but does not require the availability of the original uncorrupted original speech signal).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Sung with bit/packet loss-based and signal energy level-based quality determination of an input signal as taught by Abdelal in order to avoid downgrading an input signal quality assessment based on missing bits/packets during intervals of silence in the speech conveyed by the input signal (see Abdelal [0024], which notes the overall joint score is adjusted in some embodiments based on other information. In one embodiment, packet loss is estimated based on received packet header information, e.g., packet size and/or packet rate, and detected gap durations, based on detected edges and measured signal energy level of the received signal being monitored in relation to energy level thresholds. An advantage of estimating packet loss, based on measured signal energy levels in accordance with a feature of some embodiments of the present invention, is that packet losses, which occur during silence intervals, which do not impact user perception of speech quality, will not be counted. Thus, this approach gives a better estimate of quality than an approach which uses an E-model to detect packet loss. The determined joint quality score is adjusted, based on estimated packet loss information, network level statistics and/or codec parameters, to determine a final overall quality score. Thus, the final overall quality score may, and in various embodiments does, depend on analysis of a variety of signal features, packet loss and/or the codec used to communicate the speech signal).


As per claim 7, Sung teaches all of the limitations of claim 1 above.  
	Sung fails to specifically teach comprising a user interface for inputting information as to one or more of the transmission standard, codec and fading data as to the supplied audio or video signal.
	However, Abdelal does teach comprising a user interface for inputting information (see Abdelal [0101], which notes the display 802 can be used to display, e.g., an image, signal energy profile graph and/or other generated signal processing results, etc., in accordance with the invention and for displaying one or more control screens which may display control information, e.g., user selected control parameters and information. The input device 804 includes, e.g., a keyboard, microphone, camera and/or other input device and can be used to provide input to the system 800. The user can, and in some embodiments does, input control parameters using the input device 804) as to one or more of the transmission standard, codec and fading data as to the supplied audio or video signal (see Abdelal [0016], which notes various exemplary embodiments are well suited for numerous applications including, e.g., non-intrusive voice quality monitoring, perceptional-based adaptive codec type/mode control, perceptional-based adaptive sender-bit rate control, and perceptional-based playout-buffer optimization; and see Abdelal [0106], which notes the stored data/information 820 in memory 810 includes, received control parameters, encoder parameters, codec parameters, e.g., received via input device 104).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Sung with bit/packet loss-based and signal energy level-based quality determination of an input signal as taught by Abdelal in order to avoid downgrading an input signal quality assessment based on missing bits/packets during intervals of silence in the speech conveyed by the input signal (see Abdelal [0024], which notes the overall joint score is adjusted in some embodiments based on other information. In one embodiment, packet loss is estimated based on received packet header information, e.g., packet size and/or packet rate, and detected gap durations, based on detected edges and measured signal energy level of the received signal being monitored in relation to energy level thresholds. An advantage of estimating packet loss, based on measured signal energy levels in accordance with a feature of some embodiments of the present invention, is that packet losses, which occur during silence intervals, which do not impact user perception of speech quality, will not be counted. Thus, this approach gives a better estimate of quality than an approach which uses an E-model to detect packet loss. The determined joint quality score is adjusted, based on estimated packet loss information, network level statistics and/or codec parameters, to determine a final overall quality score. Thus, the final overall quality score may, and in various embodiments does, depend on analysis of a variety of signal features, packet loss and/or the codec used to communicate the speech signal).


As per claim 8, Sung teaches all of the limitations of claim 1 above.  
	Sung fails to specifically teach wherein the audio or video signal is a VoIP signal. 
However, Abdelal does teach wherein the audio or video signal is a VoIP signal (see Abdelal [0121], which notes some features of the methods of the present invention address one or more shortcomings in the previous system such as the ITU P.563 related to handling packet loss. Packet loss is the main source of quality degradation in VoIP networks. In the P.563 standard the packet loss is determined by counting sharp level drops in the signal, and then it applies the count toward the final MOS score output a the psycho-acoustic model. Clearly the approach used in the P.563 standard does not take in account consecutive packet loss as well as the codec robustness for packet losses. Such factors are considered for evaluating signal quality in various embodiments of the present invention and therefore the described features of various embodiments are both novel and better).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Sung with bit/packet loss-based and signal energy level-based quality determination of an input signal as taught by Abdelal in order to avoid downgrading an input signal quality assessment based on missing bits/packets during intervals of silence in the speech conveyed by the input signal (see Abdelal [0024], which notes the overall joint score is adjusted in some embodiments based on other information. In one embodiment, packet loss is estimated based on received packet header information, e.g., packet size and/or packet rate, and detected gap durations, based on detected edges and measured signal energy level of the received signal being monitored in relation to energy level thresholds. An advantage of estimating packet loss, based on measured signal energy levels in accordance with a feature of some embodiments of the present invention, is that packet losses, which occur during silence intervals, which do not impact user perception of speech quality, will not be counted. Thus, this approach gives a better estimate of quality than an approach which uses an E-model to detect packet loss. The determined joint quality score is adjusted, based on estimated packet loss information, network level statistics and/or codec parameters, to determine a final overall quality score. Thus, the final overall quality score may, and in various embodiments does, depend on analysis of a variety of signal features, packet loss and/or the codec used to communicate the speech signal).
The combination of Sung with Abdelal includes predictable results, such as input signal quality assessments based on determinations of bit/packet loss while discounting such bit/packet losses that occur during intervals of silence in the speech conveyed by the input signal.
  
Claims 5, 9, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Sung in view of Kaniewska (US 20170270946 A1).


As per claim 5, Sung teaches all of the limitations of claim 1 above.  
	Sung fails to specifically teach wherein the signal is a speech signal and the score signal represents the ITU P.800 value LQS. 
However, Kaniewska does teach wherein the supplied audio or video signal is a speech signal (see Kaniewska US 20170270946 [0243], which notes as tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec) and the score signal represents the ITU P.800 value LQS (see Kaniewska [0233], which notes most of the statistical models work best if they are trained on normalized input and output data. Therefore, in this implementation, not only the feature dimensions (as described above) were normalized during training, but also the desired target values MOS-LQS 216).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Sung with the statistical models as taught by Kaniewska in order to increase language independence and to provide reliable quality prediction in evaluation and comparison of artificial bandwidth extension (ABE) solutions (see Kaniewska [0243], which notes the proposed measure advantageously does not need a phonetic transcription. Furthermore, the underlying statistical model can be trained on several languages to minimize language-dependency. The proposed measure can exhibit high linear correlation and rank correlation, as well as low Root Mean Square Error (RMSE) between MOS-LQO and MOS-LQS. Therefore, it can be used for reliable quality prediction in evaluation and comparison of ABE solutions. As tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec).
The combination of Sung with Kaniewska includes predictable results, such as providing reliable quality prediction in evaluation and comparison of ABE speech solutions.

As per independent claim 9, Sung teaches an apparatus for generating a score signal representing the quality of a speech signal supplied to the apparatus, the apparatus comprising:  
- an input for supplying a speech signal (see Sung [0101], which notes with reference to FIG. 5, an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), 
- a computing unit implementing a neural network, the computing unit being supplied with the speech signal (see Sung [0101], which notes an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), and producing a score signal representing the quality of the speech signal supplied representing at least one predefined quality parameter of the speech signal (see Sung [0092], which notes a perceptual quality evaluation/scoring 406 may be performed by comparing the predicted audio spectrum Z/predefined quality parameter to the first audio signal y(n) or the frequency spectrum Y of the first audio).
Sung fails to specifically teach wherein the score signal represents the ITU P.800 value LQS. 
However, Kaniewska does teach wherein the score signal represents the ITU P.800 value LQS (see Kaniewska US 20170270946 [0243], which notes as tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec; and see Kaniewska [0233], which notes most of the statistical models work best if they are trained on normalized input and output data. Therefore, in this implementation, not only the feature dimensions (as described above) were normalized during training, but also the desired target values MOS-LQS 216).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Sung with the statistical models as taught by Kaniewska in order to increase language independence and to provide reliable quality prediction in evaluation and comparison of artificial bandwidth extension (ABE) solutions (see Kaniewska [0243]: The proposed measure advantageously does not need a phonetic transcription. Furthermore, the underlying statistical model can be trained on several languages to minimize language-dependency. The proposed measure can exhibit high linear correlation and rank correlation, as well as low Root Mean Square Error (RMSE) between MOS-LQO and MOS-LQS. Therefore, it can be used for reliable quality prediction in evaluation and comparison of ABE solutions. As tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec).
The combination of Sung with Kaniewska includes predictable results, such as providing reliable quality prediction in evaluation and comparison of ABE speech solutions.


- supplying a speech signal (see Sung [0101], which notes with reference to FIG. 5, an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), 
- supplying a trained neural network the speech signal (see Sung [0101], which notes an audio signal encoding apparatus 510 to which a neural network is applied may receive an input audio signal), the neural network producing a score signal representing the quality of the speech signal supplied representing at least one predefined quality parameter of the speech signal (see Sung [0092], which notes a perceptual quality evaluation/scoring 406 may be performed by comparing the predicted audio spectrum Z/predefined quality parameter to the first audio signal y(n) or the frequency spectrum Y of the first audio), 
Sung fails to specifically teach wherein the score signal represents the ITU P.800 value LQS.
However, Kaniewska does teach wherein the score signal represents the ITU P.800 value LQS (see Kaniewska US 20170270946 [0243], which notes as tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec; and see Kaniewska [0233], which notes most of the statistical models work best if they are trained on normalized input and output data. Therefore, in this implementation, not only the feature dimensions (as described above) were normalized during training, but also the desired target values MOS-LQS 216).
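Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Sung with the statistical models as taught by Kaniewska in order to increase language independence and to provide reliable quality prediction in evaluation and comparison of artificial bandwidth extension (ABE) solutions (see Kaniewska [0243]: The proposed measure advantageously does not need a phonetic transcription. Furthermore, the underlying statistical model can be trained on several languages to minimize language-dependency. The proposed measure can exhibit high linear correlation and rank correlation, as well as low Root Mean Square Error (RMSE) between MOS-LQO and MOS-LQS. Therefore, it can be used for reliable quality prediction in evaluation and comparison of ABE solutions. As tests showed, it can also predict with high accuracy the MOS-LQS of speech signals coded with either the Adaptive Multi-Rate NB (AMR-NB) codec or AMR-WB codec).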
The combination of Sung with Kaniewska includes predictable results, such as providing reliable quality prediction in evaluation and comparison of ABE speech solutions.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Sung in view of Hollier (US 6119083 A) and in view of Jin (US 10380185 B2).
As per claim 6, Sung teaches all of the limitations of claim 1 above.  
	Sung fails to specifically teach wherein the neural network is obtained by the following supervised learning steps: - feeding an audio or video signal to the neural network, - feeding said audio or video signal to an objective analytical quality testing device, together with a reference signal, - comparing the score signals output by the neural network and the analytical quality testing device, respectively, and using the result of the comparison for training the neural network.
	However, Hollier does teach wherein the neural network is obtained by the following supervised learning steps (see Hollier US 6119083 A col. 9, lines 24-29, which notes a good/reference signal from the store 8a, and its corresponding distorted version from the store 8b, are fed through respective first and second inputs 11, 12 to an analysis unit 9 which provides an output comprising a sequence of labels/supervised learning which are then transmitted to the neural net as shown in FIG. 1; and see Hollier col. 7, lines 31-37, which notes supervised learning is a form of training which involves presenting known examples of classes to the network and then modifying the interconnecting weights in order to minimise the difference between the desired and actual response of the system, where the training is repeated for many examples from each of the classes of inputs until the network reaches a steady state):
- feeding an audio or video signal to the neural network (see Hollier, col. 7, lines 42-49, which notes the system shown in FIGS. 1 and 5 comprise a source of training data 1 (FIG. 1) and a source of live speech traffic (real data) 2 (FIG. 5), both of which provide inputs to a (vocal tract or spectral) analyser 3, where parameters/audio or video signal associated with the training data are also supplied from the training data source 1 to a classification unit 5, which is shown as a trainable process, specifically in this embodiment a neural network 5; see Hollier col. 9, lines 29-33, which notes a distorted version/audio or video signal is sent to a segmenter 10, which divides the signal into individual segments (typically 20 milliseconds) corresponding to the labels, where such segments are then transmitted to the vocal tract analyser 3 (FIG. 1);  see Hollier col. 7, lines 50-51, which notes parameters/audio or video signal output by the (vocal tract or spectral) analyser 3 are fed to the neural network 5; see Hollier col. 6, lines 63-67, which notes spectral analysis may be used as an alternative to a vocal tract mode; and see Hollier col. 1, lines 17-19, which notes embodiments are described of application to audio signals carrying speech, and to video signals),  
- feeding said audio or video signal to an objective analytical quality testing device, together with a reference signal (see Hollier col. 9, lines 24-29, which notes the good/reference signal from the store 8a, and its corresponding distorted version/audio or video signal from the store 8b, are fed through respective first and second inputs 11, 12 to an analysis unit/objective analytical quality testing device 9, which provides an output comprising a sequence of labels which are then transmitted to the neural net as shown in FIG. 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Sung with the test-signal transmission over the system to be monitored as taught by Hollier in order to fine tune the system for actual operating conditions and in order to provide further training of the system to allow the system to evolve (see Hollier, col. 15, lines 36-46, which notes the arrangement of FIG. 8 shows a modification of the system of FIG. 7, in which the system is trainable by means of a test signal transmitted over the system to be monitored. This allows the trainable process to be fine-tuned for actual operating conditions, and also allows further training of the system to allow it to adapt as the monitored system evolves. FIG. 8 also illustrates a multiple classifier architecture for the monitoring system. Although illustrated for handling video signals, it will be apparent that both the on-line training and the multiple classifier architecture are also suitable for use with the audio embodiment).
The combination of Sung and Hollier includes predictable results, such as in situ training of neural nets.
The combination of Sung and Hollier fails to specifically teach comparing the score signals output by the neural network and the analytical quality testing device, respectively, and using the result of the comparison for training the neural network. 
However, Jin does teach comparing the score signals output by the neural network and the analytical quality testing device, respectively (see Jin, col. 34, lines 25-39, which notes: initially, non-neuromorphic and neuromorphic implementations of the analytical function may be performed at least partially in parallel with the same input data values being provided to both, and with the corresponding output data values of each being compared to test the degree of accuracy of the neural network in performing the analytical function), and using the result of the comparison for training the neural network (see Jin, col. 34, lines 39-50, which notes as the neural network demonstrates a degree of accuracy that at least meets a predetermined threshold, the testing may change such that the neuromorphic implementation is used, and priority is given to providing processing resources to it, while the non-neuromorphic implementation is used at least partially in parallel solely to provide output data values for further comparisons to corresponding ones provided by the neuromorphic implementation. Presuming that the neural network continues to demonstrate a degree of accuracy that meets or exceeds the predetermined threshold, further use of the non-neuromorphic implementation of the analytical function may cease, entirely).
(see Jin, col. 33, line 65 to col. 34, line 2, which notes a predetermined condition that defines the completion of testing, such as a threshold of a characteristic of performance of the neural network having been found to have been met during testing; and see Jin, col. 34, lines 7-24, which notes such a neural network may be part of an effort to transition from performing a particular analytical function using non-neuromorphic processing (i.e., processing in which a neural network is not used) to performing the same analytical function using neuromorphic processing (i.e., processing in which a neural network is used). Such a transition may represent a tradeoff in accuracy for speed, as the performance of the analytical function using neuromorphic processing may not achieve the perfect accuracy (or at least the degree of accuracy) that is possible via the performance of the analytical function using non-neuromorphic processing, but the performance of the analytical function using neuromorphic processing may be faster by one or more orders of magnitude, depending on whether the neural network is implemented with software-based simulations of its artificial neurons executed by one or more CPUs or GPUs, or hardware-based implementations of its artificial neurons provided by one or more neuromorphic devices).
The combination of Sung and Hollier with Jin includes predictable results, such as enhancing speed while achieving a predetermined level of accuracy.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R HENNINGS whose telephone number is (571) 272-9676. The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARK HENNINGS/

/PIERRE LOUIS DESIR/ Supervisory Patent Examiner, Art Unit 2659