DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction
This office action is in response to communications filed on 10/12/2020. Claims 1-20 are pending, and likewise Claims 1-20 have been examined.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/13/2020, 03/01/2021 and 06/16/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 6, 9, 14 and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Reynolds et al. (WO 0022803 A1).
Regarding Claim 1:
Reynolds teaches a voice alignment method, comprising: obtaining an original voice and a test voice, wherein the test voice is a voice generated after the original voice is transmitted over a communications network(Abstract, Ln 1-2, assessing the performance of telecommunications systems by comparison of a reference signal with the same signal as degraded by the system under test. Pg 1, Ln 4, specifically a speech signal); 
performing loss detection and/or discontinuity detection on the test voice, wherein the loss detection is used to determine whether the test voice has a voice loss compared with the original voice, and the discontinuity detection is used to determine whether the test voice has voice discontinuity compared with the original voice(Pg 11, Ln 22-23, discontinuities in delay may be identified. Pg 11, Ln 26, This process estimates for a given pair (original/degraded) of speech files); 
and aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection, to obtain an aligned test voice and an aligned original voice(Pg 10, Ln 23-24, As a result the degraded utterance 61 can be aligned with the reference utterance. Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. Pg 12, Ln 8, aid the alignment process. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached), 
wherein the result of the loss detection and/or the discontinuity detection is used to indicate a manner of aligning the test voice with the original voice(Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. See Fig 9, Elements 806, 810-811, 818, 827, which show that the alignment process changes according to the delays. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached).
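
For illustration only, the divide-and-align procedure quoted from Reynolds above (Pg 15, Ln 14-20) can be sketched as follows. This is a simplified sketch under stated assumptions, not the implementation disclosed in Reynolds: the function names, the brute-force correlation search, the `max_lag` and `min_len` parameters, and the representation of signals as lists of samples are all hypothetical, and the step of cancelling a constant delay before subdivision is omitted.

```python
def estimate_delay(ref, deg, start, end, max_lag):
    """Return the lag (in samples) that maximises the correlation between
    ref[start:end] and deg. A positive lag means deg is delayed.
    Ties are resolved in favour of the smallest absolute lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for i in range(start, end):
            j = i + lag
            if 0 <= j < len(deg):
                score += ref[i] * deg[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def segment_delays(ref, deg, start, end, max_lag, min_len):
    """Split ref[start:end] in two recursively until a section can no longer
    be halved without falling below min_len, then estimate one delay per
    section. Returns a list of (start, end, delay) tuples."""
    if end - start < 2 * min_len:
        return [(start, end, estimate_delay(ref, deg, start, end, max_lag))]
    mid = (start + end) // 2
    return (segment_delays(ref, deg, start, mid, max_lag, min_len)
            + segment_delays(ref, deg, mid, end, max_lag, min_len))
```

A change in the estimated delay between adjacent sections is one way a discontinuity in delay could be located in such a sketch.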

Regarding Claim 6:
Reynolds teaches the method according to claim 1, wherein the aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection further comprises: adding a third silent statement after an end time domain location of the test voice when the end time domain location of the test voice is before an end time domain location of the original voice(Pg 11, Ln 29-31, If the degraded signal is shorter than the original signal, zero padding is added to the end of the degraded signal), 
wherein duration of the third silent statement is equal to a time difference between the end time domain location of the test voice and the end time domain location of the original voice(Pg 10, Ln 20-25, The mode of the correlation function identifies the precise start point of the degraded utterance 61. As a result the degraded utterance 61 can be aligned with the reference utterance 51, allowing this portion to be processed through the rest of the analysis unit. Pg 11, Ln 29-31, If the degraded signal is shorter than the original signal, zero padding is added to the end of the degraded signal); 
or deleting a fourth silent statement after an end time domain location of the test voice when the end time domain location of the test voice is after an end time domain location of the original voice(Pg 11, Ln 31-32, If the degraded signal is longer, its length is adjusted by truncating it), 
wherein duration of the fourth silent statement is equal to a time difference between the end time domain location of the test voice and the end time domain location of the original voice(Pg 10, Ln 20-25, The mode of the correlation function identifies the precise start point of the degraded utterance 61. As a result the degraded utterance 61 can be aligned with the reference utterance 51, allowing this portion to be processed through the rest of the analysis unit. Pg 11, Ln 31-32, If the degraded signal is longer, its length is adjusted by truncating it).
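
For illustration only, the end-of-signal length adjustment quoted from Reynolds above (Pg 11, Ln 29-32) can be sketched as follows. This is a minimal sketch, not the implementation disclosed in Reynolds: the function name `match_length` and the representation of each signal as a list of samples are assumptions made for the example.

```python
def match_length(test, original, silence=0.0):
    """Return a copy of `test` whose length equals len(original).

    If the test signal is shorter, silence (zero padding) equal to the
    length difference is appended; if it is longer, the excess samples
    at the end are truncated."""
    diff = len(original) - len(test)
    if diff > 0:
        return list(test) + [silence] * diff
    return list(test[:len(original)])
```

In both branches the number of samples added or removed equals the difference between the two end locations, which is the relationship the claim language recites.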

Regarding Claim 9:
Reynolds teaches a voice alignment apparatus, comprising: at least one processor; and a non-transitory computer-readable storage medium coupled to the at least one processor and storing programming instructions, which when executed by the at least one processor, cause the at least one processor to perform operations(Pg 11, Ln 1-5, stored in the memory 74 until called up by the central processor 72, which operates in accordance with the instructions carried in the program. Pg 10, Ln 31-34,  The operating instructions for controlling the computer may be supplied in machine-readable form on a carrier such as a magnetic disc or tape 70), 
the operations comprising: obtaining an original voice and a test voice, wherein the test voice is a voice generated after the original voice is transmitted over a communications network(Abstract, Ln 1-2, assessing the performance of telecommunications systems by comparison of a reference signal with the same signal as degraded by the system under test. Pg 1, Ln 4, specifically a speech signal); 
performing loss detection and/or discontinuity detection on the test voice, wherein the loss detection is used to determine whether the test voice has a voice loss compared with the original voice, and the discontinuity detection is used to determine whether the test voice has voice discontinuity compared with the original voice(Pg 11, Ln 22-23, discontinuities in delay may be identified. Pg 11, Ln 26, This process estimates for a given pair (original/degraded) of speech files); 
and aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection, to obtain an aligned test voice and an aligned original voice(Pg 10, Ln 23-24, As a result the degraded utterance 61 can be aligned with the reference utterance. Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. Pg 12, Ln 8, aid the alignment process. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached), 
wherein the result of the loss detection and/or the discontinuity detection is used to indicate a manner of aligning the test voice with the original voice(Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. See Fig 9, Elements 806, 810-811, 818, 827, which show that the alignment process changes according to the delays. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached).

Regarding Claim 14:
Claim 14 contains similar limitations as Claim 6 and is therefore rejected for the same reasons.

Regarding Claim 17:
Reynolds teaches a non-transitory computer-readable storage medium storing computer program code, which when run on a computer, causes the computer to perform operations(Pg 11, Ln 1-5, stored in the memory 74 until called up by the central processor 72, which operates in accordance with the instructions carried in the program. Pg 10, Ln 31-34,  The operating instructions for controlling the computer may be supplied in machine-readable form on a carrier such as a magnetic disc or tape 70) 
comprising: obtaining an original voice and a test voice, wherein the test voice is a voice generated after the original voice is transmitted over a communications network(Abstract, Ln 1-2, assessing the performance of telecommunications systems by comparison of a reference signal with the same signal as degraded by the system under test. Pg 1, Ln 4, specifically a speech signal); 
performing loss detection or discontinuity detection on the test voice, wherein the loss detection is used to determine whether the test voice has a voice loss compared with the original voice, and the discontinuity detection is used to determine whether the test voice has voice discontinuity compared with the original voice(Pg 11, Ln 22-23, discontinuities in delay may be identified. Pg 11, Ln 26, This process estimates for a given pair (original/degraded) of speech files); 
and aligning the test voice with the original voice based on a result of the loss detection or the discontinuity detection, to obtain an aligned test voice and an aligned original voice(Pg 10, Ln 23-24, As a result the degraded utterance 61 can be aligned with the reference utterance. Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. Pg 12, Ln 8, aid the alignment process. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached), 
wherein the result of the loss detection or the discontinuity detection is used to indicate a manner of aligning the test voice with the original voice(Pg 11, Ln 26-27, This process estimates for a given pair (original/degraded) of speech files any temporal delays present in the degraded file and the locations. See Fig 9, Elements 806, 810-811, 818, 827, which show that the alignment process changes according to the delays. Pg 15, Ln 14-20, processed as a whole to identify and cancel any constant delay. The signal is then divided in two and alignment performed on each section separately. Each section so aligned is recursively sub-divided until some pre-determined minimum duration is reached).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds as applied to claims 1 and 9 above, and further in view of Hines et al. “ViSQOL: an objective speech quality model”, hereinafter Hines.

Regarding Claim 7:
Reynolds teaches the method according to claim 1, but does not teach wherein before the aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection, the method further comprises: detecting the test voice based on a preset abnormal voice detection model, to determine whether the test voice is an abnormal voice, wherein the preset abnormal voice detection model is a non-machine learning model, content detected by the non-machine learning model is different from content detected by the loss detection, and/or content detected by the non-machine learning model is different from content detected by the discontinuity detection.
In the same field of Intrusive Signal Quality Estimation, Hines teaches wherein before the aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection, the method further comprises: detecting the test voice based on a preset abnormal voice detection model, to determine whether the test voice is an abnormal voice(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t), Ln 12-14, The test spectrograms are floored to the minimum value in the reference spectrogram to level the signals with a 0-dB reference. Test voice is determined abnormal as it is scaled to match reference in volume. Pg 4, See Fig 1, signal leveling is before time alignment), 
wherein the preset abnormal voice detection model is a non-machine learning model(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t), Ln 3-4, Short-term Fourier transform (STFT) spectrogram, Ln 12-14, The test spectrograms are floored to the minimum value in the reference spectrogram to level the signals with a 0-dB reference), 
content detected by the non-machine learning model is different from content detected by the loss detection, and/or content detected by the non-machine learning model is different from content detected by the discontinuity detection(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t)).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Reynolds with the pre-processing of Hines, as it assists in improving performance(Pg 17, Conclusion, Ln 1-5).
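
For illustration only, the level-matching pre-processing quoted from Hines above (Pg 6, 4.1 Pre-Processing, Ln 1-2) can be sketched as follows. Hines is cited only for scaling the degraded signal to match the power level of the reference signal; the use of root-mean-square (RMS) level as the power measure, the function names, and the list-of-samples representation are assumptions made for this sketch.

```python
import math


def rms(signal):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def match_power(degraded, reference):
    """Scale `degraded` so that its RMS level equals that of `reference`.

    An empty or all-zero degraded signal is returned unchanged, since no
    finite gain can raise it to the reference level."""
    if not degraded or not reference:
        return list(degraded)
    level = rms(degraded)
    if level == 0.0:
        return list(degraded)
    gain = rms(reference) / level
    return [s * gain for s in degraded]
```

Because the gain depends only on signal level, a step of this kind examines different content (volume) than a loss or discontinuity check (timing), which is the distinction the rejection draws.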

Regarding Claim 15:
Reynolds teaches the apparatus according to claim 9, but does not teach wherein before aligning the test voice with the original voice based on the result of the loss detection and/or the discontinuity detection, the operations further comprise: detecting the original voice and the test voice based on a preset abnormal voice detection model, to determine whether the test voice is an abnormal voice, wherein the preset abnormal voice detection model is a non-machine learning model, content detected by the non-machine learning model is different from content detected by the loss detection, and/or content detected by the non-machine learning model is different from content detected by the discontinuity detection.
In the same field of Intrusive Signal Quality Estimation, Hines teaches wherein before aligning the test voice with the original voice based on the result of the loss detection and/or the discontinuity detection, the operations further comprise: detecting the original voice and the test voice based on a preset abnormal voice detection model, to determine whether the test voice is an abnormal voice(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t), Ln 12-14, The test spectrograms are floored to the minimum value in the reference spectrogram to level the signals with a 0-dB reference. Test voice is determined abnormal as it is scaled to match reference in volume. Pg 4, See Fig 1, signal leveling is before time alignment), 
wherein the preset abnormal voice detection model is a non-machine learning model(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t), Ln 3-4, Short-term Fourier transform (STFT) spectrogram, Ln 12-14, The test spectrograms are floored to the minimum value in the reference spectrogram to level the signals with a 0-dB reference), 
content detected by the non-machine learning model is different from content detected by the loss detection, and/or content detected by the non-machine learning model is different from content detected by the discontinuity detection(Pg 6, 4.1 Pre-Processing, Ln 1-2, The pre-processing stage scales the degraded signal y(t), to match the power level of the reference signal x(t)).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Reynolds with the pre-processing of Hines, as it assists in improving performance(Pg 17, Conclusion, Ln 1-5).

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds as applied to claims 1 and 9 above, and further in view of Nunes et al. “Degradation Type Classifier for Full Band Speech Contaminated With Echo, Broadband Noise, and Reverberation”, hereinafter Nunes.

Regarding Claim 8:
Reynolds teaches the method according to claim 1, but does not teach further comprising: detecting the aligned test voice based on a machine learning model and the aligned original voice, to determine whether the aligned test voice is an abnormal voice, or determine an abnormal type of the aligned test voice.
In the same field of Intrusive Signal Quality Estimation, Nunes teaches further comprising: detecting the aligned test voice based on a machine learning model and the aligned original voice(Pg 2517, Col 1, Para 4, Ln 1-6, The introduction of a set of features specially designed to extract relevant information for degradation identification from known reference and degraded signals. 2) The description of two systems based on machine-learning algorithms which are able to identify the degradation types using that feature set), 
to determine whether the aligned test voice is an abnormal voice, or determine an abnormal type of the aligned test voice(Abstract, Ln 1-2, identifying impairment types that might be present in a speech signal. Pg 2525, Ln 1-3, it should be noted that the degradation type classifier presented in this paper requires that reference and degraded signal are aligned in time and power).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Reynolds with the degradation type classifier of Nunes, as it improves QA(Pg 2524, Conclusion, Ln 23-27).

Regarding Claim 16:
Claim 16 contains similar limitations as Claim 8, and is therefore rejected for the same reasons.

Allowable Subject Matter
Claims 2-5, 10-13 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:


Regarding Claim 2:
Reynolds teaches the method according to claim 1, wherein the original voice comprises a first original statement, the test voice comprises a first test statement, and the first original statement corresponds to the first test statement(Pg 4, Ln 12-16, means for identifying individual utterances ... The apparatus preferably includes means for synchronising each section in the distorted signal with the corresponding section in the test signal); 
but does not teach and the aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection comprises: inserting a first silent statement before a start time domain location of the first test statement when the test voice has no voice loss and/or voice discontinuity, and the start time domain location of the first test statement is before a start time domain location of the first original statement, wherein duration of the first silent statement is equal to a time difference between the start time domain location of the first test statement and the start time domain location of the first original statement; or deleting a second silent statement before a start time domain location of the first test statement when the test voice has no voice loss and/or voice discontinuity, and the start time domain location of the first test statement is after a start time domain location of the first original statement, wherein duration of the second silent statement is equal to a time difference between the start time domain location of the first test statement and the start time domain location of the first original statement.
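
For illustration only, the start-of-statement adjustment recited in the limitation above can be sketched as follows. This is a hypothetical sketch of the claim language, not code from the application or from any cited reference: the function name `align_start`, the sample-index parameters, and the assumption that the samples preceding the test statement are silence are all introduced for the example.

```python
def align_start(test, test_start, orig_start, silence=0.0):
    """Shift `test` so that its first statement begins at `orig_start`.

    test_start and orig_start are sample indices of the start of the first
    statement in the test and original voices. If the test statement begins
    earlier, silence equal to the difference is inserted in front; if it
    begins later, that many leading (assumed silent) samples are deleted."""
    shift = orig_start - test_start
    if shift > 0:
        return [silence] * shift + list(test)
    return list(test[-shift:])
```

In both cases the duration inserted or deleted equals the difference between the two start locations.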

Regarding Claim 10:
Claim 10 contains similar limitations as Claim 2, and therefore contains allowable subject matter for the same reasons.

Regarding Claim 18:
Reynolds teaches the non-transitory computer readable storage medium according to claim 17, wherein the original voice comprises a first original statement, the test voice comprises a first test statement, and the first original statement corresponds to the first test statement(Pg 4, Ln 12-16, means for identifying individual utterances ... The apparatus preferably includes means for synchronising each section in the distorted signal with the corresponding section in the test signal); 
but does not teach and the aligning the test voice with the original voice based on a result of the loss detection or the discontinuity detection comprises: inserting a first silent statement before a start time domain location of the first test statement when the test voice has no voice loss or voice discontinuity, and the start time domain location of the first test statement is before a start time domain location of the first original statement, wherein duration of the first silent statement is equal to a time difference between the start time domain location of the first test statement and the start time domain location of the first original statement; or deleting a second silent statement before a start time domain location of the first test statement when the test voice has no voice loss or voice discontinuity, and the start time domain location of the first test statement is after a start time domain location of the first original statement, wherein duration of the second silent statement is equal to a time difference between the start time domain location of the first test statement and the start time domain location of the first original statement.

Claims 3-5, 11-13 and 19-20 depend from claims containing allowable subject matter, and therefore also contain allowable subject matter.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Lundberg et al. (US 6499009 B1)
Intrusive speech quality estimation with alignment of individual parts of utterance.
Beerends et al. (US 20170117006 A1)
Intrusive speech quality estimation.
Skoglund (US 20150199959 A1)
Intrusive speech quality estimation.
Beerends et al. (US 20120143601 A1)
Intrusive speech quality estimation.
Beerends et al. (US 20100106489 A1)
Intrusive speech quality estimation.
Keyhl et al. (KR 20090045941 A)
Audio Signal Alignment.
Berstein et al. (US 7197010 B1)
Intrusive speech quality estimation.
Ps et al. (US 20050216260 A1)
Intrusive speech quality estimation with volume leveling before time alignment.
Hollier et al. (US 6389111 B1)
Intrusive speech quality estimation.
	Harlander et al. “Sound Quality Assessment Using Auditory Models”
Intrusive speech quality estimation.
Avila et al. “Performance comparison of intrusive and non-intrusive instrumental quality measures for enhanced speech”
Intrusive speech quality estimation.
Pocta et al. “Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications”
Intrusive speech quality estimation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 10:00 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALEXANDER G MARLOW/ Assistant Examiner, Art Unit 2658

/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658