Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-29 are pending.  Claims 1, 26, and 28-29 are independent.  Claims 1 and 28-29 are directed to system, method, and CRM claims for a time scaler.  Claims 26-27 are directed to a jitter buffer including the time scaler of Claim 1.
This Application is published as 2021/0233553.
Priority is claimed to the provisionals of 6/21/2013 and 5/5/2014.

This Application is a continuation of 16/243,006 issued as U.S. 10,984,817 which is a continuation of 14/977,507 issued as U.S. 10,204,640.  
Terminal Disclaimers are required over the terms of:
U.S. 10,204,640
 U.S. 10,984,817
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Independent Claims 1, 26, and 28-29 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 27, and 29-31 of U.S. Patent No. 10,984,817 and Claims 1 and 28-29 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 5-7 of U.S. Patent No. 10,204,640 as shown below. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:
Instant Application | Reference Patent 10,204,640
1. A time scaler for providing a time scaled version of an input audio signal, | 5. A time scaler for providing a time scaled version of an input audio signal,
wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and | wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and
wherein the time scaler is configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling. | wherein the time scaler comprises a quality determinator block configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling;
(none) | wherein the time scaler comprises a time scaling performer block configured to time-shift a second block of samples with respect to a first block of samples, and to overlap-and-add the first block of samples and the time-shifted second block of samples, to thereby acquire the time-scaled version of the input audio signal, if the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates a quality which is larger than or equal to a quality threshold value; and
(none) | wherein the time scaler is configured to determine a time shift of the second block of samples with respect to the first block of samples in dependence on a determination of a level of similarity, evaluated using a first similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples; and
(none) | wherein the time scaler is configured to compute or estimate a quality of the time scaled version of the input audio signal acquirable by a time scaling of the input audio signal on the basis of an information about the level of similarity, evaluated using a second similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift;
(none) | wherein the first similarity measure is a cross correlation or a normalized cross correlation, or an average magnitude difference function or a sum of squared errors, and wherein the second similarity measure is a combination of a cross correlations or of normalized cross correlations for a plurality of different time shifts; or wherein the second similarity measure is a combination of cross correlations for at least four different time shifts,
(none) | wherein the time scaler is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

Instant Application | Reference Patent 10,984,817
1. A time scaler for providing a time scaled version of an input audio signal, | 1. A time scaler for providing a time scaled version of an input audio signal,
wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and | wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and
wherein the time scaler is configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling. | wherein the time scaler comprises a quality determinator block configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling;
(none) | wherein the time scaler comprises a time scaling performer block configured to time-shift a second block of samples with respect to a first block of samples, and to overlap-and-add the first block of samples and the time-shifted second block of samples, to thereby acquire the time-scaled version of the input audio signal, if the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates a quality which is larger than or equal to a quality threshold value; and
(none) | wherein the time scaler is configured to determine a time shift of the second block of samples with respect to the first block of samples in dependence on a determination of a level of similarity, evaluated using a computation of a first similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples,
(none) | wherein the determined time shift is an information describing a position of highest similarity; and
(none) | wherein the time scaler is configured to compute or estimate a quality of the time scaled version of the input audio signal acquirable by a time scaling of the input audio signal on the basis of an information about the level of similarity, evaluated using a second similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift;
(none) | wherein the second similarity measure is different from the first similarity measure.

Claims 1 and 28-29 are directed to system, method, and CRM claims for a time scaler.  Claims 26-27 are directed to a jitter buffer including the time scaler of Claim 1.
The cited claims of the reference issued patents are the counterparts of the system claims of these patents as provided above.

(Claims 1 and 9 of the instant Application together recite most of claim 1 of U.S. 10,984,817, but not its key last limitation: “wherein the second similarity measure is different from the first similarity measure.”  Most of the remaining Claims of the instant Application are the same as the claims of U.S. 10,984,817, with different Claim numbers.)
35 U.S.C. 112(f) Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 
The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Such claim limitation(s) is/are: “Time Scaler” in Claims 1-29 and “Jitter Buffer Control” in Claims 26-27.  These limitations are generic in the context of the art and don’t refer to any specific structure and only serve as placeholders for the structure that performs the associated function(s) without providing any information about what that structure is. MPEP 2181 I A says:
For a term to be considered a substitute for "means," and lack sufficient structure for performing the function, it must serve as a generic placeholder and thus not limit the scope of the claim to any specific manner or structure for performing the claimed function. It is important to remember that there are no absolutes in the determination of terms used as a substitute for "means" that serve as generic placeholders. The examiner must carefully consider the term in light of the specification and the commonly accepted meaning in the technological art. Every application will turn on its own facts.

“Time Scaler” is entirely claimed in terms of its functions and no structure is provided in the CLAIM LANGUAGE for this component.

Based on the ordinary skill in the art and the description of the functions of these components in the Specification, they refer to processors or a combination of processor and memory. The claim of the parent application included:  “wherein the time scaler is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.”
See:
[0203] …  Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be executed by such an apparatus.
[0205] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Published Application.

PLEASE NOTE: This is NOT a rejection. Please don’t address it as a rejection. If the Applicant does not agree with the INTERPRETATION, he may argue or amend to replace the terms interpreted under 112(f) with structural terms such as “memory” and “microprocessor” as appropriately supported by the Specification. In the alternative, he may let the interpretation stand if the intent was to include a means plus function limitation in the Claim.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



Claims 9-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 9 has antecedent basis issues; Claims 10-14 depend from Claim 9 and inherit the indefiniteness.  Claim 1 is reproduced below for reference, because some phrases in Claim 9 have their antecedent basis in Claim 1, from which Claim 9 depends.

1. A time scaler for providing a time scaled version of an input audio signal, 
wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and 
wherein the time scaler is configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling. 

9. The time scaler according to claim 1, 
wherein the time scaler is configured to time-shift a second block of samples with respect to a first block of samples, and to overlap-and-add the first block of samples and [[the]] a time-shifted second block of samples, to thereby acquire the time-scaled version of the input audio signal, if [[the]] computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates a quality which is larger than or equal to a quality threshold value; and 
wherein the time scaler is configured to determine a time shift of the second block of samples with respect to the first block of samples in dependence on a determination of a level of similarity, evaluated using a first similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples; and 
wherein the time scaler is configured to compute or estimate [[a]] the quality of the time scaled version of the input audio signal acquirable by a time scaling of the input audio signal on the basis of an information about the level of similarity, evaluated using a second similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift. 

Are there two different “quality” values? If so, call them “first quality value” and “second quality value.”
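For orientation only, the two-stage structure recited in Claim 9 (a first similarity measure to determine the time shift, a second similarity measure to estimate the quality at that shift, and a threshold test) might be sketched as follows. The pairing of a sum of squared errors for the search with a normalized cross correlation for the quality check, the function names, and the threshold value 0.8 are all assumptions chosen for illustration; they are not taken from the Claims or from the Specification.

```python
import math


def sum_squared_error(a, b):
    """First similarity measure (assumed): smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalized_cross_correlation(a, b):
    """Second similarity measure (assumed): a value in [-1, 1], larger is better."""
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / energy


def plan_time_scaling(signal, block_len, candidate_shifts, quality_threshold):
    """Return (time_shift, quality, perform) for one block.

    Callers should pick candidate shifts so that shift + block_len stays
    within the signal; otherwise the comparison is silently truncated.
    """
    first_block = signal[:block_len]
    # Stage 1: determine the time shift with the first similarity measure.
    time_shift = min(
        candidate_shifts,
        key=lambda s: sum_squared_error(first_block, signal[s:s + block_len]),
    )
    # Stage 2: estimate the quality at that shift with the second measure.
    second_block = signal[time_shift:time_shift + block_len]
    quality = normalized_cross_correlation(first_block, second_block)
    # Time scaling is performed only if the quality reaches the threshold.
    return time_shift, quality, quality >= quality_threshold


tone = [math.sin(2 * math.pi * n / 8) for n in range(64)]
shift, quality, perform = plan_time_scaling(tone, 16, range(4, 13), 0.8)
print(shift, perform)  # 8 True
```

In this sketch a single quality value is produced per block; whether the Claim intends one or two such values is the question raised above and is not resolved by the example.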

(Note the Conclusion section.)
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-8, 15-17, and 28-29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Florencio (U.S. 20050055204).
Regarding Claim 1, Florencio teaches:
1. A time scaler for providing a time scaled version of an input audio signal, [Florencio, Figure 2, “stretched/compressed frames 245.”  “[0010] Therefore, what is needed is a system and method that provides high quality time scale modification of audio signals containing speech and other audio….”] 
wherein the time scaler is configured to compute or estimate a quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal, and [Florencio, Figure 2, “signal input module 200” provides the signals to “frame extraction module 205” and frames are provided to “pitch estimation module 210.”  “Quality” is taught by “pitch” of the signal.] 
wherein the time scaler is configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling. [Florencio, Figure 2, the “time scaling 245” including “stretching” or “compression” is performed according to “segment type 215” which is based on the “pitch estimation 210.”]  

Regarding Claim 2, Florencio teaches:
2. The time scaler according to claim 1, 
wherein the time scaler is configured to perform an overlap-and-add operation using a first block of samples of the input audio signal and a second block of samples of the input audio signal, [Florencio, Figure 3, “slide by p samples, overlap and add 350.”]
wherein the time scaler is configured to time-shift the second block of samples with respect to the first block of samples, and to overlap-and-add the first block of samples and the time-shifted second block of samples, to thereby acquire the time-scaled version of the input audio signal. [Florencio, Figure 3, “slide by p samples, overlap and add 350.”  The sliding by p samples teaches the “time-shift” of the block of samples that is claimed.]
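As a minimal sketch of the time-shift and overlap-and-add operation discussed for Claim 2, the following Python function places the second block at a given offset from the start of the first block and cross-fades the overlapping samples. The function name, the linear cross-fade, and the example block sizes are assumptions for illustration; neither the Claims nor the cited passages of Florencio prescribe this particular window.

```python
def overlap_add(first_block, second_block, shift):
    """Overlap-and-add two blocks after time-shifting the second one.

    `shift` is the offset, in samples, at which the second block starts,
    measured from the start of the first block. The overlapping region is
    cross-faded linearly (an assumption; other windows are possible).
    """
    if not 0 <= shift <= len(first_block):
        raise ValueError("shift must lie within the first block")
    overlap = len(first_block) - shift
    if overlap > len(second_block):
        raise ValueError("second block is shorter than the overlap region")

    output = list(first_block[:shift])       # part of the first block kept as-is
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # fade-in weight of the second block
        output.append((1.0 - w) * first_block[shift + i] + w * second_block[i])
    output.extend(second_block[overlap:])    # remainder of the second block
    return output


first = [1.0] * 8
second = [1.0] * 8
scaled = overlap_add(first, second, shift=5)
print(len(scaled))  # 13 samples instead of 16
```

A smaller shift increases the overlap and shortens the output; a shift equal to the length of the first block simply concatenates the two blocks.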

Regarding Claim 3, Florencio teaches:
3. The time-scaler according to claim 2, 
wherein the time scaler is configured to compute or estimate a quality of the overlap-and-add operation between the first block of samples and the time-shifted second block of samples, in order to compute or estimate the quality of the time scaled version of the input audio signal acquirable by the time scaling. [Florencio, Figure 2, “stretching / compression at target ratio? 255” and Figure 3, “Desired Size Reached? 360” evaluate the “quality” of the overlap add operation at 350.]

Regarding Claim 4, Florencio teaches:
4. The time scaler according to claim 2, 
wherein the time scaler is configured to determine the time shift of the second block of samples with respect to the first block of samples in dependence on a determination of a level of similarity between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples. [Florencio, Figure 3, the time shift of the Claim is taught by sliding by p samples at 350 in Figure 3.  The number of samples p is the pitch estimate obtained at “estimate pitch p at template locations s[i] 340.”  Pitch is a measure of “level of similarity” that is Claimed.  Pitch is the fundamental frequency or period of the signal which shows when the samples of the signal begin to repeat (measure of similarity between samples).]

Regarding Claim 5, Florencio teaches:
5. The time scaler according to claim 4, 
wherein the time scaler is configured to determine an information about a level of similarity between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples, for a plurality of different time shifts between the first block of samples and the second block of samples, and to determine a time shift to be used for the overlap-and-add operation on the basis of the information about the level of similarity for the plurality of different time shifts. [Florencio, Figure 3, estimate pitch p at 340 yields the level of similarity between blocks of samples and slide/shift by p samples before overlap and add at 350 teaches the “time shift to be used for the overlap-and-add operation” which is based on the pitch p / “level of similarity”.]
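To make the language of Claim 5 concrete, the sketch below evaluates a level of similarity for a plurality of candidate time shifts and selects the shift with the highest similarity. It uses a normalized cross correlation as the similarity measure; that choice, the function names, and the test tone are assumptions for illustration and are not asserted to be how Florencio estimates pitch.

```python
import math


def normalized_cross_correlation(a, b):
    """Similarity of two equal-length blocks, in [-1, 1]."""
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / energy


def find_time_shift(signal, block_len, min_shift, max_shift):
    """Return (shift, similarity) of the most similar second block."""
    first_block = signal[:block_len]
    best_shift, best_score = None, -math.inf
    for shift in range(min_shift, max_shift + 1):
        second_block = signal[shift:shift + block_len]
        if len(second_block) < block_len:
            break                            # ran past the end of the signal
        score = normalized_cross_correlation(first_block, second_block)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


# A sine with a period of 8 samples: the best shift equals the period.
signal = [math.sin(2 * math.pi * n / 8) for n in range(64)]
shift, score = find_time_shift(signal, block_len=16, min_shift=4, max_shift=12)
print(shift)  # 8
```

For this strictly periodic test tone, the shift of highest similarity coincides with the period of the signal, which is the sense in which a pitch period and a similarity-based time shift can line up.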

Regarding Claim 6, Florencio teaches:
6. The time scaler according to claim 4, 
wherein the time scaler is configured to determine the time shift of the second block of samples with respect to the first block of samples, which time shift is to be used for the overlap-and-add operation, in dependence on a target time shift information. [Florencio, Figure 3, the “target time shift information” is taught by the p=pitch estimate for each sub-segment.]

Regarding Claim 7, Florencio teaches:
7. The time scaler according to claim 4, 
wherein the time scaler is configured to compute or estimate a quality of the time scaled version of the input audio signal acquirable by a time scaling of the input audio signal on the basis of an information about the level of similarity between the first block of samples, or a portion of the first block of samples, and the second block of samples, time shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift. [Florencio, Figure 2, “stretching / compression at target ratio? 255” and Figure 3, “Desired Size Reached? 360” evaluate the “quality” of the overlap add operation.  The “target ratio” is the measure of quality.  These steps occur after the “slide by p samples, overlap and add 350” which is based on “estimate pitch p at template locations s[i] 340.”  Because pitch/ level of similarity of blocks of samples goes into the calculation of the time-scaled signal, the Quality (desired size 360 or target ratio 255) depends on the estimated pitch/ level of similarity.]  [Note: Florencio, Figure 2, teaches that segment type detection 215 divides the frame, according to its segment types, as voiced, unvoiced, or mixed, which is a function of periodicity of the segment, i.e. “level of similarity between the first block of samples … and the second block of samples, time-shifted.”  Depending on this type/similarity a particular stretching or compressing module 225, 220, 230, 240 is selected. Thus, Florencio teaches that the type of time scaling must be adjusted according to the “level of similarity” / pitch information.]

Regarding Claim 8, Florencio teaches:
8. The time scaler according to claim 7, 
wherein the time scaler is configured to decide, on the basis of the information about the level of similarity between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift, whether a time scaling is actually performed. [Florencio, Figures 2 and 3. “Level of similarity” is taught by Pitch p which is used to slide the samples for overlap-add operation of time scaling.  Figure 2, “stretching/compression at target ratio? 255” teaches whether a time scaling is actually performed.  Target ratio has to be different from 1.]

Regarding Claim 15, Florencio teaches:
15. The time scaler according to claim 1, wherein the time scaler is configured to compare a quality value, which is based on a computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling, with a variable threshold value, to decide whether a time scaling should be performed or not. [Florencio is directed to an “adaptive temporal time scaler” which uses a variable threshold value to determine if the time scaling was done to a satisfactory degree:  “[0012] To address this need for high quality audio stretching and compression, an adaptive "temporal audio scaler" is provided for automatically stretching and compressing frames (or segments) of audio signals….”  Here the “quality value” is mapped to the target number of samples which is taught to be variable at [0014].  (Florencio keeps the target ratio constant.)  “…Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.”  Abstract.]

Regarding Claim 16, Florencio teaches:
16. The time scaler according to claim 15, wherein the time scaler is configured to reduce the variable threshold value, to thereby reduce a quality requirement, in response to a finding that a quality of a time scaling would have been insufficient for one or more previous blocks of samples. [Florencio changes the target number of samples according to the previous compression/expansion: “[0014] For example, if a target compression ratio is 2:1 for a particular signal, and each input speech frame has 300 samples, each target output frame will nominally have 150 samples. However, if a particular frame is compressed to 180 samples instead of 150 samples, for example, then the extra 30 samples are compensated for in the next frame by setting its target compression to 120 samples….”  The target/threshold value is reduced from 150 samples to 120 samples.]
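The carry-over arithmetic quoted from Florencio [0014] can be checked with a short calculation. The function below is a minimal sketch of that arithmetic only; its name and signature are hypothetical, and it is not presented as Florencio's implementation.

```python
def next_frame_target(nominal_length, produced_lengths):
    """Target length for the next output frame.

    The target is chosen so that the average length over all frames produced
    so far plus the next one equals `nominal_length`. No clamping is applied,
    so a large accumulated surplus can yield an unrealistically small target.
    """
    frames_after_next = len(produced_lengths) + 1
    return nominal_length * frames_after_next - sum(produced_lengths)


nominal = 300 // 2                           # 2:1 compression of 300-sample frames
print(next_frame_target(nominal, [180]))     # 120
```

The 30 extra samples left over from the 180-sample frame are subtracted from the nominal 150, which reproduces the 120-sample target stated in the quotation.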

Regarding Claim 17, Florencio teaches:
17. The time scaler according to claim 15, wherein the time scaler is configured to increase the variable threshold value, to thereby increase a quality requirement, in response to the fact that a time scaling has been applied to one or more previous blocks of samples. [Florencio, the “variable threshold value” is taught by the variable number of samples per frame: 180 samples/frame to 130 to 140:  “[0015] …  For example, using the above example, if the frame following the frame that was compressed to 180 samples is compressed to 130 samples, then the target compression for the next frame have a target compression of 140 samples to provide an average of 150 samples over the three frames. Through use of this carry over technique any desired compression (or stretching) ratio is maintained, while keeping only a loose requirement on the length of any particular output frame.”]
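The threshold behavior recited in Claims 15-17 (a quality value compared against a variable threshold, with the threshold reduced after insufficient quality and increased after a time scaling has been applied) can be illustrated with a small update rule. The step size, the lower and upper bounds, and the function name are hypothetical values chosen for the sketch; the Claims do not specify them.

```python
def update_threshold(threshold, quality, step=0.1, lowest=0.5, highest=0.95):
    """Decide on time scaling for one block and adapt the threshold.

    Returns (perform, new_threshold):
    - quality below the threshold: scaling is skipped and the threshold is
      lowered, relaxing the quality requirement for later blocks;
    - quality at or above the threshold: scaling is performed and the
      threshold is raised, tightening the requirement again.
    """
    perform = quality >= threshold
    if perform:
        new_threshold = min(highest, threshold + step)
    else:
        new_threshold = max(lowest, threshold - step)
    return perform, new_threshold


threshold = 0.8
for quality in (0.6, 0.75, 0.75):
    perform, threshold = update_threshold(threshold, quality)
    print(perform, round(threshold, 2))
```

With these assumed values, the first block is skipped and the threshold drops to about 0.7, the second block is then scaled and the threshold returns to about 0.8, and the third block is skipped again.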

Regarding Claim 20, Florencio teaches:
20. The time scaler according to claim 1, 
wherein the time scaler is configured to perform the time scaling of the input audio signal in dependence on the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling, [Florencio, Figure 2, the different types of time scaling (stretching or compression) are performed by modules 230, 225, 220, and 240 depending on the result of the “segment type detection module 215.”  Voiced and Unvoiced and Mixed are the types of frame segments.  Different types of time scaling are applied to each type to generate the desired quality.] (Note: what is the “quality” here?)
wherein the computation or estimation of the quality of the time scaled version of the input audio signal comprises an computation or estimation of artifacts in the time scaled version of the input audio signal which would be caused by a time scaling. [Florencio, Figure 2, the goal of time scaling is to minimize the artifacts.  “…  Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.”  Abstract.]

Regarding Claim 21, Florencio teaches:
21. The time scaler according to claim 20, wherein the computation or estimation of the quality of the time scaled version of the input audio signal comprises an computation or estimation of artifacts in the time scaled version of the input audio signal which would be caused by an overlap-and-add operation of subsequent blocks of samples of the input audio signal. [Florencio, Figure 3, the scaled size is obtained from the overlap-add operation of 350 and therefore any artifacts in the output are “caused” by the overlap-add operation.]

Regarding Claim 22, Florencio teaches:
22. The time scaler according to claim 1, wherein the time scaler is configured to compute or estimate the quality of a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal in dependence on a level of similarity of subsequent blocks of samples of the input audio signal. [Florencio, Figure 3, the time-scaled signal is obtained from the overlap-add operation of 350 which is done after sliding the window by p samples, and p is pitch estimate of a sub-segment which is a measure of similarity, and therefore any artifacts in the output depend on the p / level of similarity of subsequent blocks.]

Regarding Claim 23, Florencio teaches:
23. The time scaler according to claim 1, wherein the time scaler is configured to compute or estimate whether there are audible artifacts in a time scaled version of the input audio signal acquirable by a time scaling of the input audio signal. [Florencio teaches:  “…  Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.”  Abstract.  “[0141] 3.2.3 Selection of Segments to Stretch:”  If stretching doesn’t help with the audio artifacts the segment is not stretched.  “[0040] FIG. 7 illustrates an exemplary system flow diagram for selection of segment origin points for minimizing audible changes resulting from stretching of an audio signal.”  The Claim does not provide a mechanism (discrete steps) for the “estimate …” and the support in the Specification indicates that this “estimation” is indeed a prediction the same way that Florencio “estimates” ahead of time whether or not to stretch or compress a segment.]  (Supporting Specification:  “[0076] … In other words, the time scaler may be configured to compute or estimate the (expected) quality of the time scaled version of the input audio signal obtainable by time scaling of the input audio signal before the time scaling of the input audio signal is actually executed. For this purpose, the time scaler may, for example, compare portions of the input audio signal which are involved in the time scaling operation (for example, in that said portions of the input audio signal are to be overlapped and added to thereby perform the time scaling). ….”)

Regarding Claim 24, Florencio teaches:
24. The time scaler according to claim 1, wherein the time scaler is configured to postpone a time scaling to a subsequent frame or to a subsequent block of samples if the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates an insufficient quality.  [Florencio teaches that it performs time scaling when time scaling is required to improve the quality of the decoded audio.  If and when time scaling does not lead to an improvement in the audio, it is not performed.]

Regarding Claim 25, Florencio teaches:
25. The time scaler according to claim 1, wherein the time scaler is configured to postpone a time scaling to a time when the time scaling is less audible if the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates an insufficient quality. [Florencio, this Claim paraphrases Claim 24 and is rejected under the same rationale.  Subsequent frames or blocks of samples occur at later times because of the sequential nature of receiving audio.]

Claim 28 is a method claim with limitations of system Claim 1 and is rejected under similar rationale.

Claim 29 is a CRM machine or manufacture claim with limitations of system Claim 1 and is rejected under similar rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Florencio in view of Lee (Sungjoo Lee, et al.: “Variable time-scale modification of speech using transient information,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997. ICASSP-97, Munich, Germany 21-24 April 1997, Los Alamitos, CA, USA, IEEE Comput. Soc; US, vol. 2, 21 April 1997 (1997-04-21), pages 1319-1322).
Lee separates the transient parts of speech from the steady portions and applies time scaling only to the steady portions and then overlap-adds the time-scaled portions (compressed or expanded) with the transient portions.
Cited Florencio divides a frame into segments and determines whether each segment is voiced (steady), unvoiced (transient), or mixed and treats the time-scaling of each type of segment differently.  Florencio also considers the level of energy of the frame.
Instant Application similarly treats frames of different levels of energy and periodicity differently.
Regarding Claim 9, Florencio teaches:
9. The time scaler according to claim 1, 
wherein the time scaler is configured to time-shift a second block of samples with respect to a first block of samples, and to overlap-and-add the first block of samples and the time-shifted second block of samples, to thereby acquire the time-scaled version of the input audio signal, if the computation or estimation of the quality of the time scaled version of the input audio signal acquirable by the time scaling indicates a quality which is larger than or equal to a quality threshold value; and [Florencio, Figure 3, blocks of samples are shifted (slide at 350) and then added by overlap add (at 350).  The amount of shift is determined by p which is the pitch period of the segment (blocks of samples) of the frame that is subject to time scaling.  “[0096] When stretching voiced segments in a frame, a windowed overlap-add (SOLA) approach is used for aligning and merging matching portions of the segment. …”   The quality is checked at 360: “desired size reached?” “M=Desired Final Segment Size.”]
wherein the time scaler is configured to determine a time shift of the second block of samples with respect to the first block of samples in dependence on a determination of a level of similarity, evaluated using a first similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples; and [Florencio, Figure 3, the amount of shift/slide is determined according to Pitch p and pitch is a similarity measure between blocks of samples.  See 350:  Slide/shift by p samples.  The method predicts what amount of shift/slide would give a desired size (360).]
wherein the time scaler is configured to compute or estimate a quality of the time scaled version of the input audio signal acquirable by a time scaling of the input audio signal on the basis of an information about the level of similarity, evaluated using a second similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift. [Florencio, Figure 4, Unvoiced Segments of the frame are treated in Figure 4 according to a different selection/quality criterion because they are not periodic and don’t yield a stable pitch value p.  Figure 4 uses the FFT (420) of the frame in order to arrive at the stretched frame (460).  The IFFT computed at step 440 teaches the “second similarity measure” of the Claim because the stretched frame depends on the value of the IFFT.]
The quality check of the Claim is not defined with any particularity.  (Instant Specification:  there is a calculated “objective quality measure q” in [0163] and a “quality threshold value X” which is the same as the “dynamic minimum quality qMin” in [0167].)
Lee teaches: … a quality which is larger than or equal to a quality threshold value; [Lee, “To evaluate the performance of the proposed scheme, a subjective preference test by human listeners is conducted.”  Introduction, p. 1319.]
Florencio and Lee pertain to time scaling and follow similar methods, and it would have been obvious to combine the subjective quality check of Lee with the method of Florencio, which, as a quality check, determines whether the desired length has been achieved, so that either method can be used for quality checking of the time-scaled audio.

Regarding Claim 10, Florencio teaches:
10. The time scaler according to claim 9, wherein the second similarity measure is computationally more complex than the first similarity measure. [Florencio, Figure 4, step 440. The second similarity measure was mapped to the Inverse FFT (IFFT) value of the modified frame that is calculated at 440 and IFFT is more complex than a simple pitch p.]

Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Florencio and Lee and further in view of Chen (U.S. 2008/0046235).
Regarding Claim 11, Florencio teaches:
11. The time scaler according to claim 9, 
wherein the first similarity measure is a cross correlation or a normalized cross correlation, or an average magnitude difference function or a sum of squared errors, and [Florencio, Figure 2, determining what type of frame is to be scaled is according to:  “10. The method of claim 8 wherein determining the content type of each segment of the current frame comprises computing a normalized cross correlation for each frame and comparing a maximum peak of each normalized cross correlation to predetermined thresholds for determining the content type of each segment.”]
wherein the second similarity measure is a combination of a cross correlations or of normalized cross correlations for a plurality of different time shifts. 
Florencio does not teach using a combination of cross correlations as a similarity measure.
Lee uses the maximum cross-correlation value to determine similarity between adjacent frames and separate the transient from steady portions.
Chen suggests:
wherein the second similarity measure is a combination of a cross correlations or of normalized cross correlations for a plurality of different time shifts. [Chen teaches packet loss concealment which relies on an overlap-add operation.  In Chen the pitch is variable and therefore several pitch periods have to be calculated and several cross-correlations result when the pitch periods are different and the time-shift for arriving at the cross-correlation value is based on the pitch period.  “[0041] …. As shown in FIG. 3, the method begins at step 302, in which an extrapolated waveform is generated based on a frame that precedes the lost frame and on one or more good frames that follow the lost frame. At step 304, a replacement waveform is generated for the lost frame based on a first portion of the extrapolated waveform. At step 306, a second portion of the extrapolated waveform is overlap-added with a normally-decoded waveform associated with the one or more good frames that follow the lost frame. ….”  “[0045] If, on the other hand, the time lag identified in step 404 is not zero (that is, there is relative time shift between the extrapolated waveform and the normally-decoded waveform associated with the first good frame(s)), then this indicates that the pitch period has changed during the lost frame. In this case, rather than using a constant pitch period for extrapolation during the lost frame, the method of flowchart 400 calculates a pitch contour based on the identified time lag as shown at step 410. A second-pass periodic waveform extrapolation is then performed using the pitch contour to generate the extrapolated waveform, as shown at step 412. By performing the second-pass waveform extrapolation based on the pitch contour calculated in step 410, the method of flowchart 400 causes the extrapolated waveform produced by the method to be in phase with the normally-decoded waveform associated with the first good frame(s).”  “[0023] FIG. 5 depicts a flowchart of a method for calculating a number of pitch cycles in a gap between the end of a frame immediately preceding a lost frame and a middle of an overlap-add region in a first good frame following the lost frame in accordance with an embodiment of the present invention.”]
Florencio/Lee and Chen pertain to time-scaling or extrapolation of waveforms by an overlap and add method and it would have been obvious to modify the system of Florencio/Lee that teaches correlation maximization for obtaining the window of the overlap and add method with the variable pitch system of Chen which takes into account a varying pitch and therefore has to calculate and maximize the cross-correlation for different pitch periods in order to take care of calculating cross-correlation for transient audio instead of throwing out or just using the unscaled portions that correspond to transient audio.  This is simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 12, Florencio does not teach the second similarity measure being based on a combination of cross-correlation values.
Chen suggests:
12. The time scaler according to claim 9, 
wherein the second similarity measure is a combination of cross correlations for at least four different time shifts. [Chen does not say for how many different values of pitch period, which is used as the time-shift value, the cross-correlation is maximized.  Chen assumes a linear pitch contour such that “[0046] … If the new pitch period contour is assumed to be linear, then it can be characterized by a single parameter: the amount of pitch period change per sample, which is basically the slope of the new linearly changing pitch period contour.”  The number of pitch cycles is approximated by m in [0052] or by m as obtained from Figure 5.  The equation in [0059] finds x2(n) which is the extrapolated signal at time n and is obtained by an overlap and add method.]
Chen does not teach or suggest combining cross-correlation measures for different time shifts.  Chen teaches the existence of different time shifts that pertain to a linearly increasing or decreasing pitch period.  Chen and Florencio both teach the use of maximized cross-correlation.  The combination of a varying pitch value, and the fact that cross-correlation uses the pitch value as the time shift, teaches the use of a number (four or any number m) of different pitch values as the time shift to calculate different cross-correlation values.  The maximized value would be a combination.

Regarding Claim 13, Florencio does not teach the second similarity measure being based on a combination of cross-correlation values.
Chen suggests:
13. The time scaler according to claim 12, 
wherein the second similarity measure is a combination of a first cross correlation value and of a second cross correlation value, which are acquired for time shifts which are spaced by an integer multiple of a period duration of a fundamental frequency of an audio content of the first block of samples or of the second block of samples, and of a third cross correlation value and a fourth cross correlation value, which are acquired for time shifts which are spaced by an integer multiple of the period duration of the fundamental frequency of the audio content, [Chen teaches the use of overlap and add method for arriving at extrapolated frames for packet loss concealment and teaches that pitch period is used as the time shift amount for calculating the cross correlation that is maximized to obtain the overlap amount and also teaches that at times the pitch is not constant and may be approximated with a linear function.  This suggests that for each value on the linear function a different cross-correlation may be maximized.  “[0060] If the overlap-add length is chosen to be the number of samples between two adjacent changes of the rounded pitch period, then the approach of pitch period rounding plus overlap-add using triangular windows effectively approximates a gradually changing pitch period contour with a linear slope.”]
wherein a time shift for which the first cross correlation value is acquired is spaced from a time shift for which the third cross correlation value is acquired, by an odd multiple of half the period duration of the fundamental frequency of the audio content. [Chen.  This is suggested by the linear approximation of the pitch contour.  The slope can be set to half period of the initial pitch.  “[0046] For simplicity, the new pitch period contour calculated in step 410 may be made to be linearly increasing or linearly decreasing, depending on whether the first-pass extrapolated waveform is leading or lagging the normally-decoded waveform associated with the first good frame(s), respectively. If the new pitch period contour is assumed to be linear, then it can be characterized by a single parameter: the amount of pitch period change per sample, which is basically the slope of the new linearly changing pitch period contour.”]
This Claim further limits Claim 12 and the rationale for combination is not changed.

Regarding Claim 14, this Claim is suggested by the combination of Florencio and Chen as provided for Claim 13.  This Claim expresses the language of Claim 13 in an equation.
14. The time scaler according to claim 9, wherein the second similarity measure q is acquired according to 
q=c(p)*c(2*p)+c( 3/2*p)*c(1/2*p) 
or according to 
q=c(p)*c(-p)+c(-1/2*p)*c(1/2*p), 
wherein c(p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by a period duration p of a fundamental frequency of an audio content of the first block of samples or of the second block of samples; 
wherein c(2*p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by 2*p; 
wherein c( 3/2*p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by 3/2*p; 
wherein c(1/2*p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by 1/2*p; 
wherein c(-p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by -p; and 
wherein c(-1/2*p) is a cross correlation value between a first block of samples and a second block of samples, which are shifted in time by -1/2*p. 

Claims 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Florencio in view of Ojala (U.S. 2007/0263672).
Regarding Claim 26, Florencio teaches and therefore suggests:
26. An audio decoder for providing a decoded audio content on the basis of an input audio content, the audio decoder comprising: [Florencio, Figure 2, the frames are decoded and hence a decoder is taught.  “[0071] As illustrated by FIG. 2, …  This signal input module 200 receives an audio signal, which may have just been produced, or may have been stored in the computer, or may have been decoded from a packetized audio signal transmitted across a packet-based network, s…. As the signal input module 200 receives or decodes the packets, they are provided to a frame extraction module 205….”]
a jitter buffer configured to buffer a plurality of audio frames representing blocks of audio samples; [Florencio, Figure 2, [0011] and [0016] the purpose of time scaling is dejittering by keeping the right number of samples in a buffer such as “Frame buffer 250” of Figure 2.  “[0016] The result of this carry over technique is that compensation for lost or delayed packets through stretching or compression is extremely flexible as each individual frame is optimally stretched or compressed, as needed, for minimizing any perceivable artifacts in the reconstructed signal. This capability of the temporal audio scaler complements a number of applications such as de-jittering, for example, which generally requires a reduced delay for minimizing artifacts.”]
a decoder core configured to provide blocks of audio samples on the basis of audio frames received from the jitter buffer; [Florencio, Figure 2,  “26. A computer-implemented process for providing dynamic temporal modification of segments of a digital audio signal, comprising using a computing device to: receive one or more sequential frames of a digital audio signal; decode each frame of the digital audio signal as it is received; determine a content type of segments of the decoded audio signal from a group of predefined segment content types, each segment content type having an associated type-specific temporal modification process; and modify a temporal scale of one or more segments of the decoded audio signal using the associated type-specific temporal modification process specific to each segment content type.”]
a sample-based time scaler according to claim 1, [Florencio, Figures 2 and 3 see rejection of Claim 1.]
wherein the sample-based time scaler is configured to provide time-scaled blocks of audio samples on the basis of blocks of audio samples provided by the decoder core. [Florencio, Figure 2, the “frame buffer 250” provides samples back for decoding.  “[0082] Note that the buffer of stretched and compressed frames 245 is available for playback or further processing, as desired. Consequently, in one embodiment, a signal output module 270 is provided for interfacing with an application for outputting the stretched and compressed frames. For example, such frames may be played for a listener as a part of a voice-based communications system.”]
A more express reference is added.
Ojala teaches:
26. An audio decoder for providing a decoded audio content on the basis of an input audio content, the audio decoder comprising: [Ojala, Figure 1, Receiver 160 including a speech decoder 165.]
a jitter buffer configured to buffer a plurality of audio frames representing blocks of audio samples; [Ojala, Figure 1, “variable jitter buffer 162.”]
a decoder core configured to provide blocks of audio samples on the basis of audio frames received from the jitter buffer; [Ojala, Figure 1, “speech decoder 165.”]
a sample-based time scaler according to claim 1, [Ojala, “time scaling unit 163.”]
wherein the sample-based time scaler is configured to provide time-scaled blocks of audio samples on the basis of blocks of audio samples provided by the decoder core. [Ojala, the “depacketization unit 161” is mapped to the "decoder core."]
The “sample-based time scaler according to claim 1” is taught by claim 5 of the reference patent as applied to Claim 1.
It would have been obvious to use the time scaler that is taught by Florencio in the decoder/receiver of Ojala which uses a time scaler.  This is providing an application for the time scaler or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 27, Florencio teaches:
27. The audio decoder according to claim 26, wherein the audio decoder further comprises 
a jitter buffer control, [Florencio teaches that time scaling is used for control of jitter buffers.  See [0011] and [0016].]
wherein the jitter buffer control is configured to provide a control information to the sample-based time scaler, 
wherein the control information indicates whether a sample-based time scaling should be performed or not, and/or 
wherein the control information indicates a desired amount of time scaling. 
Florencio does not show the decoder or jitter buffer control expressly.
Ojala teaches:
27. The audio decoder according to claim 26, wherein the audio decoder further comprises a jitter buffer control, wherein the jitter buffer control is configured to provide a control information to the sample-based time scaler, wherein the control information indicates whether a sample-based time scaling should be performed or not, and/or wherein the control information indicates a desired amount of time scaling. [Ojala, “[0079] The jitter management control unit 164 is used to control the variable jitter buffer 162 and to control the time scaling unit 163, respectively. In particular, the jitter management control unit 164 receives the discrete information on audio/voice activity, and the jitter management control unit 164 may receiver further information on the received frames from the depacketization unit 161. Furthermore, the jitter management control unit 164 may receive further information on the network status of network 120 from a network analyser (not shown).”  “[0080] The jitter management control unit 164 controls the variable jitter buffer 162 and the time scaling unit 163 on the basis of the received discrete audio/voice activity information. Furthermore, the jitter management control unit 164 may use further information on the received frames and/or further information on the network status for controlling the variable jitter buffer 162 and the time scaling unit 163.”  “[0034] During the active audio burst, the jitter buffer may be controlled dependent on network properties in order to achieve a good trade-off between latency and audio quality.” Jitter buffer is being controlled and therefore has to control the time-scaling that is applied to the buffered frame so they can fit the buffer requirements.]
It would have been obvious to use the time scaler that is taught by Florencio in the decoder/receiver of Ojala which uses a time scaler and further would have been obvious to control the time scaler by control information from a jitter buffer that has to fit the data.  This is providing an application for the time scaler or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.  (See Ojala, “[0016] Based on the determined end of an active audio burst, the jitter compensation performed to the received frames may be controlled in such a way, that the delay introduced by the jitter compensation to the received frames is reduced at the end of an active audio burst in order to decrease end-to-end latency of the transmission at the end of the active audio burst. E.g., the jitter delay may be decreased to zero delay or near to zero delay at the end of an active audio burst. In case that a variable jitter buffer is used for jitter compensation, the buffer delay of the variable jitter buffer may be decreased near the end of the active audio frame and a time-scaling may be applied to the buffered frames in order to compress the active audio frames near the end of the active audio burst.”)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Claim 11 first expresses the idea of the instant Application.
11. The time scaler according to claim 9, 
wherein the first similarity measure is a cross correlation or a normalized cross correlation, or an average magnitude difference function or a sum of squared errors, and 
wherein the second similarity measure is a combination of a cross correlations or of normalized cross correlations for a plurality of different time shifts. 

Possible modification of Claim 9:
9. The time scaler according to claim 1, wherein the time scaler is configured to:
determine a time shift of a second block of samples with respect to a first block of samples in dependence on a determination of a level of similarity, evaluated using a first similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, or a portion of the second block of samples; and 
acquire the time scaled version by:
time-shifting a second block of samples with respect to a first block of samples, to obtain a time-shifted second block of samples, and
overlap-and-add the first block of samples and the time-shifted second block of samples;
determine that the quality of the time scaled version is larger than or equal to a quality threshold value;  (q=c(p)*c(2*p)+c(3/2*p)*c(1/2*p), Fig. 9, 942 and [0163])
wherein the quality threshold value is based on a second similarity measure, between the first block of samples, or a portion of the first block of samples, and the second block of samples, time-shifted by the determined time shift, or a portion of the second block of samples, time-shifted by the determined time shift.  (Quality threshold X=qmin, qMin=qMinInitial-(nNotScaled*0.1)+(nScaled*0.2), Fig. 11, 1118.)

Figure 9 has the main idea, and the written description follows:
Time Scale Modification (TSM) [0121] In the following, the time-scale modification (TSM), which is also briefly designated as time scaler or sample-based time scaler herein, will be described. A modified packet-based WSOLA (waveform-similarity-based-overlap-add) (confer, for example, [Lia01]) algorithm with built-in quality control is used to perform time scale modification (briefly designated as time scaling) of the signal. Some details can be seen, for example, in FIG. 9, which will be explained below. A level of time scaling is signal-dependent; signals that would create severe artifacts when scaled are detected by a quality control and low-level signals, which are close to silence, are scaled by a most possible extent. Signals that are well time-scalable, like periodic signals, are scaled by an internally derived shift. The shift is derived from a similarity measure, such as a normalized cross correlation. With an overlap-add (OLA), the end of a current frame (also designated as "second block of samples" herein) is shifted (for example, with respect to a beginning of a current frame, which is also designated as "first block of samples" herein) to either shorten or lengthen the frame. [0122] As already mentioned, additional details regarding the time scale modification (TSM) will be described below, taking reference to FIG. 9, which shows a modified WSOLA with quality control, and also taking reference to FIGS. 10a and 10b and 11.
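
For illustration only, the frame-shortening overlap-add described in [0121] may be sketched as follows. This is a minimal sketch and not the Applicant's implementation: the linear cross-fade window and the names `shorten_by_overlap_add`, `shift`, `overlap_start`, and `overlap_len` are assumptions that do not appear in the Specification.

```python
import numpy as np

def shorten_by_overlap_add(frame, shift, overlap_start, overlap_len):
    """Shorten `frame` by `shift` samples with an overlap-add.

    Assumption: a linear cross-fade is used as the overlap window;
    the quoted passage does not specify the window shape.
    """
    frame = np.asarray(frame, dtype=float)
    end = overlap_start + shift + overlap_len
    if shift <= 0 or overlap_start < 0 or overlap_len <= 0 or end > len(frame):
        raise ValueError("shift/overlap region does not fit inside the frame")

    fade_in = np.linspace(0.0, 1.0, overlap_len)
    fade_out = 1.0 - fade_in

    head = frame[:overlap_start]                              # kept unchanged
    first = frame[overlap_start:overlap_start + overlap_len]  # faded out
    second = frame[overlap_start + shift:end]                 # shifted block, faded in
    tail = frame[end:]                                        # remainder after the shift

    return np.concatenate([head, fade_out * first + fade_in * second, tail])
```

If the signal is periodic and `shift` equals one period, the two overlapped blocks coincide, so the shortened frame continues the waveform without a discontinuity. This is consistent with the passage's statement that periodic signals are well time-scalable.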

[0157] … In the first processing path 940, in which an amount of time shift is determined in a signal adaptive manner, a similarity estimation 942 is performed on the basis of the audio samples. The similarity estimation 942 may consider a minimum frame size information 944 and may provide an information 946 about a highest similarity (or about a position of highest similarity). In other words, the similarity estimation 942 may determine which position (for example, which position of samples within a block of samples) is best suited for a time shrinking overlap-and-add operation. The information 946 about the highest similarity is forwarded to a quality control 950, which computes or estimates whether an overlap-and-add operation using the information 946 about the highest similarity would result in an audio quality which is larger than (or equal to) a quality threshold value X (which may be constant or which may be variable). If it is found, by the quality control 950, that a quality of an overlap-and-add operation (or equivalently, of a time scaled version of the input audio signal obtainable by the overlap-and-add operation) would be smaller than (or equal to) the quality threshold value X, a time scaling is omitted and unscaled audio samples are output by the time scaler 900. In contrast, if it is found, by the quality control 950, that the quality of an overlap-and-add operation using the information 946 about the highest similarity (or about the position of highest similarity) would be larger than or equal to the quality threshold value X, an overlap-and-add operation 954 is performed, wherein a shift, which is applied in the overlap-and-add operation, is described by the information 946 about the highest similarity (or about the position of the highest similarity). Accordingly, a scaled block (or frame) of audio samples is provided by the overlap-and-add operation.
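
The similarity estimation 942 of [0157] may be pictured, under stated assumptions, as a search for the shift with the highest normalized cross correlation. The function names, the search bounds `min_shift` and `max_shift`, and the window length `window_len` are hypothetical and chosen only for illustration; the Specification does not define the search in these terms.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Normalized cross correlation of two equal-length blocks, in [-1, 1]."""
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

def find_best_shift(samples, min_shift, max_shift, window_len):
    """Return (shift, score) of the highest similarity in the search range.

    Assumption: the reference block is the first `window_len` samples and
    candidate blocks start `shift` samples later.
    """
    samples = np.asarray(samples, dtype=float)
    if min_shift < 1 or max_shift < min_shift or max_shift + window_len > len(samples):
        raise ValueError("search range does not fit inside the samples")

    reference = samples[:window_len]
    best_shift, best_score = min_shift, -np.inf
    for shift in range(min_shift, max_shift + 1):
        score = normalized_cross_correlation(reference, samples[shift:shift + window_len])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score
```

The returned shift plays the role of the information 946 about the position of highest similarity, which the quoted passage forwards to the quality control 950.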

[0161] To conclude, it should be noted that three different cases are distinguished in the signal adaptive sample-based time scaling when a time shrinking or a time stretching is selected. If an energy of a block (or frame) of input audio samples comprises a comparatively small energy (for example, smaller than (or equal to) the energy threshold value Y), a time shrinking or a time stretching overlap-and-add operation is performed with a fixed time shift (i.e. with a fixed amount of time shrinking or time stretching). In contrast, if the energy of the block (or frame) of input audio samples is larger than (or equal to) the energy threshold value Y, an "optimal" (also sometimes designated as "candidate" herein) amount of time shrinking or of time stretching is determined by the similarity estimation (similarity estimation 942). In a subsequent quality control step, it is determined whether a sufficient quality would be obtained by such an overlap-and-add operation using the previously determined "optimal" amount of time shrinking or time stretching. If it is found that a sufficient quality could be reached, the overlap-and-add operation is performed using the determined "optimal" amount of time shrinking or time stretching. If, in contrast, it is found that a sufficient quality may not be reached using an overlap-and-add operation using the previously determined "optimal" amount of time shrinking or time stretching, the time shrinking or time stretching is omitted (or postponed to a later point in time, for example, to a later frame).
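
The three cases distinguished in [0161] can be summarized by the following decision sketch. The names are hypothetical, and the handling of the boundary case is an assumption: the passage states "(or equal to)" on both sides of the energy threshold Y, so this sketch arbitrarily treats an energy equal to Y as not low.

```python
from enum import Enum

class ScalingDecision(Enum):
    FIXED_SHIFT = "fixed_shift"        # low-energy frame: overlap-add with a fixed shift
    ADAPTIVE_SHIFT = "adaptive_shift"  # overlap-add with the shift found by similarity estimation
    POSTPONE = "postpone"              # insufficient expected quality: omit or postpone scaling

def decide_scaling(frame_energy, energy_threshold, expected_quality, quality_threshold):
    """Pick one of the three cases of the signal-adaptive time scaling."""
    if frame_energy < energy_threshold:        # assumption: equality counts as "not low"
        return ScalingDecision.FIXED_SHIFT
    if expected_quality >= quality_threshold:  # quality control passes
        return ScalingDecision.ADAPTIVE_SHIFT
    return ScalingDecision.POSTPONE
```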

[0162] In the following, some further details regarding the quality adaptive time scaling, which may be performed by the time scaler 900 (or by the time scaler 200, or by the time scaler 340, or by the time scaler 450), will be described. Time scaling methods using overlap-and-add (OLA) are widely available, but in general are not performing signal adaptive time scaling results. In the described solution, which can be used in the time scalers described herein, the amount of time scaling not only depends on the position extracted by the similarity estimation (for example, by the similarity estimation 942), which seems optimal for a high quality time scaling, but also on an expected quality of the overlap-add (for example of the overlap-add 954). Therefore, two quality control steps are introduced in the time scaling module (for example, in the time scaler 900, or in the other time scalers described herein), to decide whether the time scaling would result in audible artifacts. In case of potential artifacts, the time scaling is postponed up to a point in time where it would be less audible.

[0163] A first quality control step calculates an objective quality measure using the position p extracted by the similarity measure (for example, by the similarity estimation 942) as input. In the case of a periodic signal, p will be the fundamental frequency of the current frame. The normalized cross correlation c( ) is calculated for the positions p, 2*p, 3/2*p, and 1/2*p. c(p) is expected to be a positive value and c(1/2*p) might be positive or negative. For harmonic signals, the sign of c(2p) should also be positive and the sign of c(3/2*p) should equal the sign of c(1/2*p). This relationship can be used to create an objective quality measure q: q=c(p)*c(2*p)+c(3/2*p)*c(1/2*p).
[0164] The range of values for q is [-2; +2]. An ideal harmonic signal would result in q=2, while very dynamic and broadband signals which might create audible artifacts during time scaling will produce a lower value. Due to the fact that time scaling is done on a frame-by-frame basis, the whole signal to calculate c(2*p) and c(3/2*p) might not be available yet. However, the evaluation can also be done by looking at past samples. Therefore, c(-p) can be used instead of c(2*p), and similarly c(-1/2*p) can be used instead of c(3/2*p).
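The quality measure q of paragraphs [0163] and [0164] can be illustrated with a short sketch. Several points are assumptions made here rather than statements from the quoted passage: p is treated as a lag in samples, c( ) is computed as the normalized inner product of a window with a copy of itself shifted by the lag, fractional lags are rounded down to whole samples, and the names `normalized_xcorr` and `quality_measure` are hypothetical.

```python
import math


def normalized_xcorr(x, start, length, lag):
    """Assumed form of c(lag): correlation of x[start:start+length]
    with the same-length window shifted by `lag` samples."""
    lo = start + lag
    if start < 0 or start + length > len(x) or lo < 0 or lo + length > len(x):
        raise ValueError("window or shifted window lies outside the signal")
    a = x[start:start + length]
    b = x[lo:lo + length]
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den if den > 0.0 else 0.0


def quality_measure(x, start, length, p, use_past=False):
    """q = c(p)*c(2*p) + c(3/2*p)*c(1/2*p), or the past-sample variant."""
    def c(lag):
        return normalized_xcorr(x, start, length, lag)

    half = p // 2  # assumption: integer lags, rounded down
    if use_past:
        # Per [0164]: c(-p) replaces c(2*p), c(-1/2*p) replaces c(3/2*p).
        return c(p) * c(-p) + c(-half) * c(half)
    return c(p) * c(2 * p) + c((3 * p) // 2) * c(half)
```

For a pure sine wave whose period equals p samples, both variants evaluate to approximately 2, consistent with the statement that an ideal harmonic signal yields q=2.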
[0165] A second quality control step compares the current value of the objective quality measure q with a dynamic minimum quality value qMin (which may correspond to the quality threshold value X) to determine if time-scaling should be applied to the current frame.
[0166] There are different intentions for having a dynamic minimum quality value: if q has a low value because the signal is evaluated as bad to scale over a long period, qMin should be reduced slowly to make sure that the expected scaling is still executed at some point in time with a lower expected quality. On the other hand, signals with a high value for q should not result in scaling many frames in a row which would reduce the quality regarding long-term signal characteristics (e.g. rhythm).
[0167] Therefore, the following formula is used to calculate the dynamic minimum quality qMin (which may, for example, be equivalent to the quality threshold value X):
qMin = qMinInitial - (nNotScaled * 0.1) + (nScaled * 0.2)
[0168] qMinInitial is a configuration value to optimize between a certain quality and the delay until a frame can be scaled with the requested quality, of which a value of 1 is a good compromise. nNotScaled is a counter of frames which have not been scaled because of insufficient quality (q<qMin). nScaled counts the number of frames which have been scaled because the quality requirement was reached (q>=qMin). The range of both counters is limited: they will not be decreased to negative values and will not be increased above a designated value which is set to be 4 by default (for example).
[0169] The current frame will be time-scaled by the position p if q>=qMin, otherwise time-scaling will be postponed to a following frame where this condition is met. The pseudo code of FIG. 11 illustrates the quality control for time scaling.
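The dynamic threshold of paragraphs [0167] to [0169] can be sketched as a small state object. The formula, the default qMinInitial of 1, and the counter limit of 4 come from the quoted text. The counter update rule does not: FIG. 11 is not reproduced in this action, so the sketch assumes that each decision increments one counter and decrements the other, which is only one reading of the statement that the counters "will not be decreased to negative values". The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityControl:
    q_min_initial: float = 1.0  # qMinInitial; 1 is the stated compromise
    counter_max: int = 4        # default upper limit for both counters
    n_not_scaled: int = 0       # nNotScaled
    n_scaled: int = 0           # nScaled

    def q_min(self):
        # qMin = qMinInitial - (nNotScaled * 0.1) + (nScaled * 0.2)
        return (self.q_min_initial
                - self.n_not_scaled * 0.1
                + self.n_scaled * 0.2)

    def decide(self, q):
        """Return True if the current frame should be time-scaled."""
        scale = q >= self.q_min()
        # Assumed update rule (not confirmed by the quoted text):
        # raise one counter and lower the other, clamped to [0, counter_max].
        if scale:
            self.n_scaled = min(self.n_scaled + 1, self.counter_max)
            self.n_not_scaled = max(self.n_not_scaled - 1, 0)
        else:
            self.n_not_scaled = min(self.n_not_scaled + 1, self.counter_max)
            self.n_scaled = max(self.n_scaled - 1, 0)
        return scale
```

Under these assumptions, a frame with q=0.95 is first postponed (qMin is 1.0), which lowers qMin to 0.9, so an equally good following frame is then scaled and qMin rises to 1.2. This reflects the two stated intentions: postponed scaling eventually happens, and consecutive scaling becomes progressively harder.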
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on Monday through Thursday 9am to 4pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/FARIBA SIRJANI/
Primary Examiner, Art Unit 2659