Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 9, and 17 are independent. The Claims are amended to change some instances of “the” to “a” for proper antecedent basis, to change “sub-frame” to “subframe,” and to add that the “digital signal … comprises non-speech data.”  The Claims still do not define “VOICED,” so interpretation must fall back on the definition provided in the Specification.
This Application was published as U.S. 2019/0237088.
Apparent priority is 18 September 2012.
This Application is a continuation of 15/398,321, issued as U.S. 10283133, which is a continuation of 14/027052, issued as U.S. 9589570.  A Terminal Disclaimer over the term of both parents is required as provided below.

Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
This action is Final.

Regarding the use of “AUDIO” and “VOICED” in the Claims: these capitalized terms are used in the Claims according to the SPECIFIC MEANING attributed to them in the Specification, as per an inventor’s right to be his own lexicographer:
[0018] Audio signals are typically encoded in either the time-domain or the frequency-domain. More specifically, audio signals carrying speech data are typically classified as VOICE signals and are encoded using time-domain encoding techniques, while audio signals carrying non-speech data are typically classified as AUDIO signals and are encoded using frequency-domain encoding techniques. Notably, the term "audio (lowercase) signal" is used herein to refer to any signal carrying sound data (speech data, non-speech data, etc.), while the term "AUDIO (uppercase) signal" is used herein to refer to a specific signal classification. This traditional manner of classifying audio signals typically generates higher quality encoded signals because speech data is generally periodic in nature, and therefore more amenable to time-domain encoding, while non-speech data is typically aperiodic in nature, and therefore more amenable to frequency-domain encoding. However, some non-speech signals exhibit enough periodicity to warrant time-domain encoding.

Accordingly, a “VOICED signal” within the meaning of the Claims is interpreted to be “a signal carrying speech data” and an “AUDIO signal” within the meaning of the Claims is interpreted to be “a signal carrying non-speech data.”

Response to Arguments
Applicant has amended the independent Claims to include “receiving, by an audio coder, a digital signal comprising audio data that comprises non-speech data” and argues that the limitations of classifying this digital signal as a VOICED (speech) signal and encoding the digital signal that is classified as VOICED are not taught by Gao because Gao only operates on “speech” signals and does not teach that its audio signal includes “non-speech data”:

[Images media_image1.png and media_image2.png: excerpts of Applicant’s remarks, not reproduced as text.]

Response 8-9.

As a preliminary matter, note that the “VOICED signal” of the Claim is interpreted according to the definition provided by the Specification to mean “speech,” which in turn is defined according to its “periodicity”:
The quality of encoded signals can be improved by reclassifying AUDIO signals carrying non-speech data as VOICE signals when periodicity parameters of the signal satisfy one or more criteria.  … 

Instant Application, Abstract.

… speech data is generally periodic in nature, and therefore more amenable to time-domain encoding, while non-speech data is typically aperiodic in nature, and therefore more amenable to frequency-domain encoding. However, some non-speech signals exhibit enough periodicity to warrant time-domain encoding.
[0019] Aspects of this disclosure re-classify audio signals carrying non-speech data as VOICE signals when a periodicity parameter of the audio signal exceeds a threshold…. The periodicity parameter can include any characteristic or set of characteristics indicative of periodicity….
	
Instant Application.

Applicant’s core arguments are based on Applicant’s assertion that the Claim refers to “non-speech data” whereas Gao refers to 6 types of “speech data” which cannot teach the “non-speech data” that is claimed.

First, note that at least paragraph [0078] of Gao also includes the sentence:  “The classifier 270 may use any approach to classify the input signal into periodic signals and non-periodic signals.”  Thus, Gao expressly includes classifying an “input signal” and not just “speech.”  Accordingly, Applicant’s argument is not persuasive.

The following paragraphs provide further discussion of, and further reply to, Applicant’s arguments. This discussion is not necessary for rebutting Applicant’s argument in view of the above-cited express teaching of Gao.

Further, the definition and meaning of the Claim terminology based on the Claim and Specification matches what is taught by Gao based on how the reference defines its terminology.
The Claim and the Specification define “speech” and “non-speech” according to the “degree of periodicity” of the signal.  A signal that is highly periodic (more than a threshold) is labeled and considered “speech” and a signal that fails to satisfy the required degree of periodicity is considered “non-speech.”
Gao, too, classifies the frames of an input signal, which it calls “speech input,” according to their level of periodicity, and some of the classification categories are clearly and expressly not “speech” even though they may have been received with “speech.”  Applicant ignores the explicit and elaborate teachings of Gao in favor of the summarized notations used in the drawings of Gao.  For example, by referring to “6 types of speech data,” Applicant appears to be referring to the following paragraph:
[0078] The classifier 270, with help from the pitch preprocessor 254, classifies each frame into one of six classes according to the dominating feature of the frame. The classes are (1) Silence/background Noise; (2) Noise/Like Unvoiced Speech; (3) Unvoiced; (4) Transition (includes onset); (5) Non-Stationary Voiced; and (6) Stationary Voiced. The classifier 270 may use any approach to classify the input signal into periodic signals and non-periodic signals. For example, the classifier 270 may take the pre-processed speech signal, the pitch lag and correlation of the second half of the frame, and other information as input parameters.
Gao.
At least “Silence/background noise,” which is one of the classes of Gao, cannot be considered a type of “speech,” as Applicant argues.  Additionally, “noise/like unvoiced speech” and “unvoiced” are aperiodic and would be a different type of audio than the “VOICED signal” of the Claim.
Additionally, the classification of Gao, like the instant Application and as reflected by the determination of periodicity in the Claim, is based on the periodicity of the signal:  “A speech encoder that analyzes and classifies each frame of speech as being periodic-like speech or non-periodic like speech where the speech encoder performs a different gain quantization process depending if the speech is periodic or not….”  Gao, Abstract.  See also Gao, ¶¶ 44-45.  
Applicant latches on to the use of the word “speech” for both periodic and non-periodic signals.  Gao, however, expressly defines its “speech 200” to include “silence” and/or “background noise” and, in general, other signals of low periodicity.  Just as the inventors of the instant Application act as their own lexicographers by using “VOICED” and “AUDIO” to have certain specific meanings, Gao may do so too.  Therefore, we look at the meanings and definitions of the terms, and Gao provides an ample and express definition of what it means by “speech.”  Some of the input frames 200 of Figure 4 of Gao teach the “non-speech data” of the Claim because some of the input frames at “pre-processed speech 200” can be classified as “noise,” as expressly taught by Gao, and further because both the Claim and Gao classify their signals according to the level of periodicity.
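For illustration only, the six frame classes quoted from Gao [0078] can be laid out as an enumeration. This is a minimal sketch: the identifier names and the two groupings are assumptions that follow the characterization given in this Action (classes 5 and 6 read as voiced; classes 1 through 3 read as aperiodic or not speech), and they are not a table taken from Gao. The Transition class is deliberately left ungrouped.

```python
from enum import Enum


class GaoFrameClass(Enum):
    """The six frame classes listed in Gao, paragraph [0078]."""
    SILENCE_BACKGROUND_NOISE = 1
    NOISE_LIKE_UNVOICED_SPEECH = 2
    UNVOICED = 3
    TRANSITION = 4
    NON_STATIONARY_VOICED = 5
    STATIONARY_VOICED = 6


# Groupings as characterized in this Office Action (an assumption of this
# sketch, not a table found in Gao).
VOICED_CLASSES = frozenset(
    {GaoFrameClass.NON_STATIONARY_VOICED, GaoFrameClass.STATIONARY_VOICED}
)
APERIODIC_CLASSES = frozenset(
    {
        GaoFrameClass.SILENCE_BACKGROUND_NOISE,
        GaoFrameClass.NOISE_LIKE_UNVOICED_SPEECH,
        GaoFrameClass.UNVOICED,
    }
)


def is_voiced(frame_class: GaoFrameClass) -> bool:
    """True if the class is one this Action reads as voiced."""
    return frame_class in VOICED_CLASSES


def is_aperiodic(frame_class: GaoFrameClass) -> bool:
    """True if the class is one this Action reads as aperiodic or not speech."""
    return frame_class in APERIODIC_CLASSES
```

Under this reading, a frame labeled Silence/background Noise arrives at the classifier as part of the “speech” input yet falls outside the voiced group.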

Note that the inventor of Gao and the inventor of the instant Application are one and the same.  Any distinction intended from the Gao reference does not currently come across in the language of the Claims.

As additional support, note that Gao makes a frame by frame determination of periodic vs. aperiodic and the Claim uses the subframes within a frame to determine the periodicity of the signal which also hints that the Claim, like Gao, is making a determination of periodicity (speech or non-speech) on a frame by frame basis. 
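
For illustration only, the per-frame periodicity determination recited in the Claims can be sketched as follows. This is a minimal sketch of the three classifying conditions as worded in Claim 1, using the formulas recited in Claims 3, 5, and 6; it is not taken from the Specification or from Gao. The threshold values, the function names, and the sample inputs are hypothetical, since the Claims recite thresholds without stating their magnitudes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values: the Claims recite first, second, and third
    # thresholds but do not state their magnitudes.
    pitch_diff: float = 3.0
    avg_correlation: float = 0.75
    smoothed_correlation: float = 0.70


def pitch_differences(pitches: Sequence[float]) -> List[float]:
    """Absolute pitch differences between adjacent subframes (Claim 3)."""
    return [abs(a - b) for a, b in zip(pitches, pitches[1:])]


def average_correlation(correlations: Sequence[float]) -> float:
    """Sum of per-subframe normalized pitch correlations over their count (Claim 6)."""
    return sum(correlations) / len(correlations)


def smooth(previous_smoothed: float, voicing: float) -> float:
    """Voicing_sm = (3 * Voicing_sm + Voicing) / 4 (Claim 5)."""
    return (3.0 * previous_smoothed + voicing) / 4.0


def classify_frame(
    pitches: Sequence[float],
    correlations: Sequence[float],
    previous_smoothed: float,
    th: Thresholds = Thresholds(),
) -> Tuple[str, float]:
    """Return the frame label and the updated smoothed pitch correlation."""
    voicing = average_correlation(correlations)
    voicing_sm = smooth(previous_smoothed, voicing)
    voiced = (
        all(d < th.pitch_diff for d in pitch_differences(pitches))
        and voicing > th.avg_correlation
        and voicing_sm > th.smoothed_correlation
    )
    return ("VOICED" if voiced else "AUDIO"), voicing_sm


# Hypothetical frame with four subframes, stable pitch, and high correlation.
label, voicing_sm = classify_frame([50, 51, 50, 52], [0.9, 0.85, 0.8, 0.9], 0.8)
```

In this sketch the label depends only on the periodicity measures of the frame, so a frame is labeled the same way whether its content is speech or music.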

Patentability of the other independent Claims is argued based on their similarity to Claim 1. Accordingly, the above provides a reply to those arguments as well.
Patentability of the dependent Claims is argued based on their dependence from their base independent Claims. Accordingly, the above provides a reply to those arguments as well.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159.  See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 10283133 as shown below. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:
Instant Application
Reference Patent 10283133
1. A method comprising: 

1. A method for encoding signals, the method, which is performed by an audio coder, comprising: 

receiving, by an audio coder, a digital signal comprising audio data that comprises non-speech data;
receiving a digital signal comprising audio data;

classifying the digital signal as an AUDIO signal; 
classifying, by the audio coder, the digital signal as a VOICED signal upon determining that classifying conditions are satisfied, 
re-classifying the digital signal as a VOICED signal when classifying conditions are satisfied, 
wherein the classifying conditions include:
pitch differences between subframes in the digital signal are less than a first threshold, 
an average normalized pitch correlation value for the subframes in the digital signal is greater than a second threshold, and 
a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a third threshold, 
wherein, the classifying conditions include: 
pitch differences between sub-frames in the digital signal are less than a first threshold, 
an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a second threshold, and 
a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a third threshold; 
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two subframes respectively; and
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively; and 
encoding, by the audio coder, the digital signal that is classified as the VOICE signal using an encoding technique configured for encoding VOICED signals. 

encoding the re-classified VOICED signal in the time-domain when one or more encoding conditions are satisfied, 

wherein the one or more encoding conditions include: 
a coding rate of the digital signal is below a fourth threshold; or 
encoding the AUDIO signal which is not re-classified as the VOICED signal in the frequency-domain. 


2. The method of claim 1, 
wherein encoding the digital signal comprises 
encoding the digital signal in the time-domain upon determining that one or more encoding conditions are satisfied, 
wherein the one or more encoding conditions include:
a coding rate of the digital signal is below a fourth threshold. 
1.  …
(this limitation can be deduced from the last limitation of claim 1 of the reference; a signal not in the time-domain is in the frequency-domain.)


wherein the one or more encoding conditions include: 
a coding rate of the digital signal is below a fourth threshold; or 
encoding the AUDIO signal which is not re-classified as the VOICED signal in the frequency-domain.
3. The method of claim 1, wherein the number of the subframes is 4, the pitch differences comprises the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| 
dpit2=|P2-P3| 
dpit3=|P3-P4| 
where P1, P2, P3, and P4 are four pitch values corresponding to the subframes, and wherein the classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: all the dpit1, the dpit2 and the dpit3 are less than the first threshold. 
 2. The method of claim 1, wherein, the number of the subframes is 4, the pitch differences comprises the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| 
dpit2=|P2-P3| 
dpit3=|P3-P4| 
wherein, P1, P2, P3, and P4 are four pitch values corresponding to the subframes respectively; accordingly, and wherein the classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: all the dpit1, the dpit2 and the dpit3 are less than the first threshold.
4. The method of claim 3, wherein P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each subframe. 
3. The method of claim 2, wherein, P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each subframe.
5. The method of claim 1, wherein the smoothed pitch correlation from a previous to a current frame is obtained by following formula: Voicing_sm=(3Voicing_sm+Voicing)/4 
where Voicing_sm at the left side of the formula denotes the smoothed pitch correlation of the current frame, Voicing_sm at the right side of the formula denotes the smoothed pitch correlation of the previous frame, and Voicing denotes the average normalized pitch correlation value for the subframes in the digital signal.
 4. The method of claim 1, wherein, the smoothed pitch correlation from a previous to a current frame is obtained by following formula: Voicing_sm=(3Voicing_sm+Voicing)/4
wherein, the Voicing_sm at the left side of the formula denotes the smoothed pitch correlation of the current frame, the Voicing_sm at the right side of the formula denotes the smoothed pitch correlation of the previous frame and Voicing denotes the average normalized pitch correlation value for the subframes in the digital signal.
6. The method of claim 1, wherein the average normalized pitch correlation value for the subframes in the digital signal is obtained by: 
determining a normalized pitch correlation value for each subframe in the digital signal; and 
dividing the sum of all normalized pitch correlation values by the number of the subframes in the digital signal to obtain the average normalized pitch correlation value. 
5. The method of claim 1, wherein the average normalized pitch correlation value for the subframes in the digital signal is obtained by: 
determining a normalized pitch correlation value for each subframe in the digital signal; and 
dividing the sum of all normalized pitch correlation values by the number of the subframes in the digital signal to obtain the average normalized pitch correlation value.
7. The method of claim 1, wherein the digital signal is encoded using code-excited linear prediction (CELP). 
Obvious over a combination of parent application with Gao as shown in Obviousness rejection of Claim 7 provided below.
8. The method of claim 1, wherein the digital signal carries music data.
7. The method of claim 1, wherein the digital signal carries music data.


Claim 2 is taught by a limitation of claim 1 of the reference patent.
Claim 3 is taught by claim 2 of the reference patent.
Claim 4 is taught by claim 3 of the reference patent.
Claim 5 is taught by claim 4 of the reference patent.
Claim 6 is taught by claim 5 of the reference patent.
Claim 7 is taught by claim 6 of the reference patent.
Claim 8 is taught by claim 7 of the reference patent.

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 14 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 15 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 16 is a system claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.

Claim 17 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 18 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 19 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
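
As a note on the smoothing formula quoted above for Claim 5 and for claim 4 of the reference patent, the recurrence can be written with an explicit frame index. The index n and the symbols V_sm and V (standing for Voicing_sm and Voicing) are notation assumed here for readability, and the numeric values are illustrative only.

```latex
\[
V_{sm}^{(n)} \;=\; \frac{3\,V_{sm}^{(n-1)} + V^{(n)}}{4}
           \;=\; V_{sm}^{(n-1)} + \tfrac{1}{4}\left(V^{(n)} - V_{sm}^{(n-1)}\right)
\]
% Illustrative values: V_sm^{(n-1)} = 0.80 and V^{(n)} = 0.40 give
\[
V_{sm}^{(n)} \;=\; \frac{3(0.80) + 0.40}{4} \;=\; \frac{2.80}{4} \;=\; 0.70
\]
```

Written this way, the smoothed value moves one quarter of the way from the previous frame's smoothed value toward the current frame's average correlation.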

Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 9589570 as shown below. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:
Instant Application
Reference Patent 9589570
1. A method comprising: 

1. A method for encoding signals, the method comprising:
receiving, by an audio coder, a digital signal comprising audio data that comprises non-speech data;
receiving, by an audio encoder, a digital signal comprising audio data,
wherein the audio data includes data of speech and non-speech sounds; 

classifying, by the audio encoder, the digital signal as an AUDIO signal based on the audio data in the digital signal; 

classifying, by the audio coder, the digital signal as a VOICED signal upon determining that classifying conditions are satisfied, 
(Moved up from below: 
re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied;)
 
determining, by the audio encoder, whether classifying conditions are satisfied, 
wherein the classifying conditions include:
pitch differences between subframes in the digital signal are less than a first threshold, 


an average normalized pitch correlation value for the subframes in the digital signal is greater than a second threshold, and 
a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a third threshold,
wherein the classifying conditions include:
pitch differences between sub-frames in the digital signal are less than a first threshold, 
a coding rate of the digital signal is below a second threshold, 
an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and 
a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold, 
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two subframes respectively; and
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively; 

re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied; 
encoding, by the audio coder, the digital signal that is classified as the VOICE signal using an encoding technique configured for encoding VOICED signals. 
encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and 

encoding, by the audio encoder, the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.


2. The method of claim 1, 
wherein encoding the digital signal comprises 
encoding the digital signal in the time-domain upon determining that one or more encoding conditions are satisfied, 
wherein the one or more encoding conditions include: a coding rate of the digital signal is below a fourth threshold. 
1.  …


encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
(a coding rate of the digital signal is below a second threshold, )

3. The method of claim 1, wherein the number of the subframes is 4, the pitch differences comprises the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| 
dpit2=|P2-P3| 
dpit3=|P3-P4| 
where P1, P2, P3, and P4 are four pitch values corresponding to the subframes, and wherein the classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: all the dpit1, the dpit2 and the dpit3 are less than the first threshold. 
 5. The method of claim 1, wherein, the number of the subframes is 4, the pitch differences comprises the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| 
dpit2=|P2-P3| 
dpit3=|P3-P4| 
wherein, P1, P2, P3, and P4 are four pitch values corresponding to the subframes respectively; accordingly, and wherein the classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: all the dpit1, the dpit2 and the dpit3 are less than the first threshold.
4. The method of claim 3, wherein P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each subframe. 
6. The method of claim 2, wherein, P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each subframe.
5. The method of claim 1, wherein the smoothed pitch correlation from a previous to a current frame is obtained by following formula: Voicing_sm=(3Voicing_sm+Voicing)/4 
where Voicing_sm at the left side of the formula denotes the smoothed pitch correlation of the current frame, Voicing_sm at the right side of the formula denotes the smoothed pitch correlation of the previous frame, and Voicing denotes the average normalized pitch correlation value for the subframes in the digital signal.
 7. The method of claim 1, wherein, the smoothed pitch correlation from a previous to a current frame is obtained by following formula: Voicing_sm=(3Voicing_sm+Voicing)/4
wherein, the Voicing_sm at the left side of the formula denotes the smoothed pitch correlation of the current frame, the Voicing_sm at the right side of the formula denotes the smoothed pitch correlation of the previous frame and Voicing denotes the average normalized pitch correlation value for the subframes in the digital signal.
6. The method of claim 1, wherein the average normalized pitch correlation value for the subframes in the digital signal is obtained by: 
determining a normalized pitch correlation value for each subframe in the digital signal; and 
dividing the sum of all normalized pitch correlation values by the number of the subframes in the digital signal to obtain the average normalized pitch correlation value. 
2. The method of claim 1, wherein the average normalized pitch correlation value for the subframes in the digital signal is obtained by: 
determining a normalized pitch correlation value for each subframe in the digital signal; and 
dividing the sum of all normalized pitch correlation values by the number of the subframes in the digital signal to obtain the average normalized pitch correlation value.
7. The method of claim 1, wherein the digital signal is encoded using code-excited linear prediction (CELP). 
Obvious over a combination of parent application with Gao as shown in Obviousness rejection of Claim 7 provided below.
8. The method of claim 1, wherein the digital signal carries music data.
4. The method of claim 1, wherein the digital signal carries music data.


Claim 2 is taught by a limitation of claim 1 of the reference patent.
Claim 3 is taught by claim 5 of the reference patent.
Claim 4 is taught by claim 6 of the reference patent.
Claim 5 is taught by claim 7 of the reference patent.
Claim 6 is taught by claim 2 of the reference patent.
Claim 7 is taught by claim 3 of the reference patent.
Claim 8 is taught by claim 4 of the reference patent.

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 14 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 15 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 16 is a system claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.

Claim 17 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 18 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 19 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.


Claims 1-7, 9-15, and 17-20 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Gao (U.S. 2004/0260545) in view of Lee (U.S. 2013/0246068).
Regarding Claim 1, Gao teaches or suggests:
1. A method comprising: 
receiving, by an audio coder, a digital signal comprising audio data that comprises non-speech data; [Gao teaches in Figure 1, “speech encoder 117” receiving the “speech signal” from the “microphone 111.” (Note that 117 in Figure 1 is an encoder that is denoted as a decoder in error, as evidenced by the Written Description.)  Gao also teaches that the input signal may include silence, noise, and unvoiced speech, all of which teach the “non-speech data” of the Claim:  “[0078] The classifier 270, with help from the pitch preprocessor 254, classifies each frame into one of six classes according to the dominating feature of the frame. The classes are (1) Silence/background Noise; (2) Noise/Like Unvoiced Speech; (3) Unvoiced; (4) Transition (includes onset); (5) Non-Stationary Voiced; and (6) Stationary Voiced.  The classifier 270 may use any approach to classify the input signal into periodic signals and non-periodic signals….”  The “classifier 270” is in Figure 4 and part of the encoder: “[0042] FIG. 4 is a functional block diagram illustrating an exemplary second stage of the source encoder ….”   Gao teaches two modes of encoding depending on whether the signal is periodic/speech or aperiodic/non-speech.  See [0119] and [0120]. “[0054] In particular, a microphone 111 produces a speech signal in real time. The microphone 111 delivers the speech signal to an A/D (analog to digital) converter 115. The A/D converter 115 converts the analog speech signal into a digital form and then delivers the digitized speech signal to a speech encoder 117.”]
classifying, by the audio coder, the digital signal as a VOICED signal upon determining that classifying conditions are satisfied, [Gao teaches in Figure 4, the “pre-processed speech s200” is classified by the "Classification 270."  After "Classification 270" the classes in [0078] include (5) Non-Stationary Voiced and (6) Stationary Voiced, both of which are VOICED.  Figures 12A and 12B show "Speech Signals" being classified into "Periodic Signals" and "Non-Periodic Signals.”  Most speech is periodic, and aperiodic signals are generally either not speech or are “unvoiced sounds,” which are harsh, aperiodic types of speech.  “[0078] The classifier 270, with help from the pitch preprocessor 254, classifies each frame into one of six classes according to the dominating feature of the frame. The classes are (1) Silence/background Noise; (2) Noise/Like Unvoiced Speech; (3) Unvoiced; (4) Transition (includes onset); (5) Non-Stationary Voiced; and (6) Stationary Voiced….”]
wherein the classifying conditions include: 
pitch differences between subframes in the digital signal are less than a first threshold,  [Gao suggests the “pitch differences between subframes” of the Claim by teaching the use of the “correlation of the second half of the frame” as a measure of whether the speech is Voiced.  Pitch lag indicates the number of samples in a pitch period and is the same as pitch period, only expressed in terms of number of samples instead of time.  Figure 8 shows the subframes of a frame.  Correlation shows how much something has changed with time, i.e. “the difference with the previous subframe.”  Figure 8 and [0067] show that a frame is divided into subframes and the change of pitch lag from subframe to subframe is determined as an indicator of periodicity.  This teaches or suggests “pitch differences between subframes” and if these differences are small, i.e. “less than a first threshold,” the signal is periodic and thus speech (VOICED).  “[0078] … The classifier 270 may use any approach to classify the input signal into periodic signals and non-periodic signals. For example, the classifier 270 may take the pre-processed speech signal, the pitch lag and correlation of the second half of the frame, and other information as input parameters.”  “[0067] Mode 0 operates a traditional speech encoding algorithm such as a CELP algorithm. However, Mode 0 is not used for all frames of speech. Instead, Mode 0 is selected to handle frames of all speech other than "periodic-like" speech, as discussed in greater detail below. For convenience, "periodic-like" speech is referred to here as periodic speech, and all other speech is "non-periodic" speech. Such "non-periodic" speech include transition frames where the typical parameters such as pitch correlation and pitch lag change rapidly and frames whose signal is dominantly noise-like. Mode 0 breaks each frame into two subframes. Mode 0 codes the pitch lag once per subframe and has a two-dimensional vector quantizer to jointly code the pitch gain (i.e., adaptive codebook gain) and the fixed codebook gain once per subframe….”]
an average normalized pitch correlation value for the subframes in the digital signal is greater than a second threshold, and [Gao, in [0099]-[0104], teaches an "Average weighted pitch correlation.”  Gao teaches that a periodic/stationary/voiced signal has high correlation which is generally more than a threshold.  Figure 4, “Classification 270” with “LPC analysis 260” and “Open Loop Pitch Estimation 272” providing input to the “classification 270.”  Paragraphs [0078], [0079], [0086].  Classification, e.g. [0078], according to each parameter requires one or more thresholds pertaining to that parameter; one example is given in:  “[0079] Various criteria can be used to determine whether speech is deemed to be periodic. … Furthermore, periodic speech may be smooth and stationary speech. …A speech signal is "smooth" if the adaptive codebook gain GP of that speech is greater than a threshold value. For example, if the threshold value is 0.7, a speech signal in a subframe is considered to be smooth if its adaptive codebook gain GP is greater than 0.7.”]
a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a third threshold, [Gao, in Figure 4 and [0103]-[0104], additionally teaches a “Running mean of average weighted pitch correlation.”  The weight can perform smoothing.  The running mean also performs smoothing on the average normalized pitch correlation.  (See the definition of “smoothing” in the instant Specification, which is a running and weighted average.)   “[0115] The parameters given by Equations 23, 25, and 26 are used to mark whether a frame is likely to contain an onset, and the parameters given by Equations 16-18, 20-22 are used to mark whether a frame is likely to be dominated by voiced speech. Based on the initial marks, past marks and other information, the frame is classified into one of the six classes.”  Equation 17 is the equation for “running mean of average weighted pitch correlation”/“smoothed pitch correlation” of the Claim and it is used for classification of speech as voiced, unvoiced, etc.  Classification, e.g. [0078], according to each parameter requires one or more thresholds pertaining to that parameter; one example for gain Gp is given in [0079] as 0.7.]
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two subframes respectively; and [Gao does not teach this particular definition of pitch differences.  The “correlation of the second half of the frame” was used to teach the “pitch differences” between subframes.  Correlation is a measure of difference; it shows the difference of a time series with itself at a later time.  However, the manner of calculating a correlation is not by taking “an absolute value of the difference between two pitch values.”]
encoding, by the audio coder, the digital signal that is classified as the VOICED signal using an encoding technique configured for encoding VOICED signals. [Gao teaches encoding of a signal according to its classification: “A speech encoder that analyzes and classifies each frame of speech as being periodic-like speech or non-periodic like speech where the speech encoder performs a different gain quantization process depending if the speech is periodic or not….”  Abstract.  Gao teaches two modes of encoding Mode 0, for non-periodic audio, and Mode 1, for periodic speech which teaches the VOICED signals of the Claim.  For non-periodic signals which refers to noise for example, Gao uses Mode 0 encoding which is CELP and for periodic speech (VOICED) it uses other types of encoding.  “[0044] FIG. 6 … fourth stage of the source encoder … for processing non-periodic speech (mode 0).”  “[0045] FIG. 7 … fourth stage of the source encoder … for processing periodic speech (mode 1).”  “[0067] Mode 0 operates a traditional speech encoding algorithm such as a CELP algorithm. However, Mode 0 is not used for all frames of speech. Instead, Mode 0 is selected to handle frames of all speech other than "periodic-like" speech, as discussed in greater detail below….”  “[0068] Mode 1 deviates from the traditional CELP algorithm. Mode 1 handles frames containing periodic speech which typically have high periodicity and are often well represented by a smooth pitch tract….”]

Gao does not teach the definition of pitch difference as “the pitch differences is an absolute value of the difference between two pitch values corresponding to two subframes respectively.”
Lee teaches:
pitch differences between subframes in the digital signal are less than a first threshold, [Lee uses the absolute value of the pitch difference between two subframes of the signal being less than a reference value/threshold which is a measure of periodicity and stationarity of the signal to decide that the frame that is received is a good frame. Figure 5, [0035].  “[0036] Referring to FIG. 5, a pitch T0 of the first subframe and a pitch T0-2 of the second subframe of the N+1-th frame data are first decoded (504).”]
…
wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two subframes respectively; and [Lee uses the absolute value of the pitch difference between two subframes of the signal being less than a reference value/threshold as a measure of periodicity of the signal.  Figure 5, “|T0-T0_2| < x?” 508.  “[0038] If all the conditions of step 506 are satisfied, it is checked that an absolute value of the pitch difference (T0-T0-2) between the first subframe and the second subframe of the N+1-th frame is smaller than the predetermined reference value (x) (508). When the condition is not satisfied, the general decoding procedure is performed after step 516.”  See also [0047].]
Gao and Lee both pertain to encoding and decoding in the time domain.  It would have been obvious to combine Lee's particular method of evaluation, in which a small difference between the pitch values of consecutive subframes indicates stability and stationarity, with the conditions taught by Gao, which include an indirect determination of pitch difference by evaluating autocorrelation to determine whether a speech signal is voiced, as combining prior art elements according to known methods to yield predictable results or as simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
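
For illustration only, the three classifying conditions recited in Claim 1 can be sketched as a single test.  This is a minimal sketch under stated assumptions: the function name, the parameter names, and the numeric values are hypothetical and are taken neither from the Claims nor from Gao or Lee; the Claims recite only a first, a second, and a third threshold without giving values.

```python
def is_voiced(pitch_diffs, voicing, voicing_sm,
              first_threshold, second_threshold, third_threshold):
    """Return True only when all three classifying conditions hold.

    pitch_diffs : absolute pitch differences between subframes
    voicing     : average normalized pitch correlation of the frame
    voicing_sm  : smoothed pitch correlation of the frame
    All names and thresholds are assumptions made for this sketch.
    """
    small_pitch_change = all(d < first_threshold for d in pitch_diffs)
    high_correlation = voicing > second_threshold
    high_smoothed_correlation = voicing_sm > third_threshold
    return small_pitch_change and high_correlation and high_smoothed_correlation


# Hypothetical values chosen only to exercise the function.
print(is_voiced([1, 0, 2], 0.80, 0.78, 3, 0.7, 0.7))  # all conditions met
print(is_voiced([1, 9, 2], 0.80, 0.78, 3, 0.7, 0.7))  # one pitch difference too large
```

The sketch treats the conditions as conjunctive, which follows the "and" joining the limitations of the Claim.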

Regarding Claim 2, Gao teaches:
2. The method of claim 1, 
wherein encoding the digital signal comprises encoding the digital signal in a time-domain upon determining that one or more encoding conditions are satisfied, [Gao, “A speech encoder that analyzes and classifies each frame of speech as being periodic-like speech or non-periodic like speech where the speech encoder performs a different gain quantization process depending if the speech is periodic or not….”  Abstract.  The encoding appears to be in the time domain because it uses Linear Predictive Coding (LPC) which was originally used in the time domain and the reference does not otherwise mention transform into the frequency domain.  “[0076] … The quantization of the LPC coefficients may be either scalar or vector quantization and may be performed in any appropriate domain in any manner known in the art."  “[0117] … Though LSF is preferred, the quantizer 267 can quantize the LPC coefficients into a domain other than the LSF domain.” ]
wherein the one or more encoding conditions include: a coding rate of the digital signal is below a fourth threshold. [Gao tries to keep the bit rate below a certain level: “[0065] In order to achieve toll quality at a low bit rate (such as 4 kilobits per second), the improved speech encoding algorithm departs somewhat from the strict waveform-matching criterion of traditional CELP algorithms and strives to capture the perceptually important features of the input signal. ….”]

Regarding Claim 3, Gao teaches:
3. The method of claim 1, 
wherein, a number of the subframes is 4, [Gao, Figure 8, showing a frame including 4 subframes.]
the pitch differences comprises a first pitch difference dpit1, a second pitch difference dpit2, and a third pitch difference dpit3, 
wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| 
dpit2=|P2-P3| 
dpit3=|P3-P4| 
wherein, P1, P2, P3, and P4 are four pitch values corresponding to the subframes, and
wherein a classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: 
all the dpit1, the dpit2 and the dpit3 are less than the first threshold. [Gao teaches that if the pitch period changes rapidly the signal is noise-like, because speech has steady periods that are all about the same, and thereby suggests that the difference between periods for speech would remain below a limit.  Gao also teaches dividing the frame into two or more subframes:  “[0067] … Such "non-periodic" speech include transition frames where the typical parameters such as pitch correlation and pitch lag change rapidly and frames whose signal is dominantly noise-like. Mode 0 breaks each frame into two subframes….”]

Gao does not teach calculating the pitch differences in the manner shown by taking an absolute value of difference between pitch periods.  This is suggested by the teaching of Gao which uses pitch correlation and change in pitch lag.

Lee teaches:
…
wherein the number of the subframes is 4, the pitch differences comprises a first pitch difference dpit1, a second pitch difference dpit2, and a third pitch difference dpit3, wherein, the dpit1, the dpit2 and the dpit3 are calculated as follows: 
dpit1=|P1-P2| [Lee, Figure 5, 508, shows evaluating |T0-T0_2|<x which teaches this equation]
dpit2=|P2-P3| [Lee suggests this equation by the teaching in Figure 5, 508.]
dpit3=|P3-P4| [Lee suggests this equation by the teaching in Figure 5, 508.]
where P1, P2, P3, and P4 are four pitch values corresponding to the subframes, and wherein a classifying condition that the pitch differences between the subframes in the digital signal are less than a threshold comprises: all the dpit1, the dpit2 and the dpit3 are less than the first threshold. [Lee, Figure 5, 508, the single threshold is x.]
The rationale for combination remains as provided for Claim 1.  The pitch difference limitation is taught by Lee, and the details of that feature are likewise drawn from Lee under the same rationale.
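
The arithmetic recited in Claim 3 can be sketched as follows.  This is an illustration only; the function names and the sample pitch values are assumptions, and the threshold value is hypothetical because neither the Claim nor Lee's reference value x is given numerically here.

```python
def pitch_differences(p1, p2, p3, p4):
    """Return (dpit1, dpit2, dpit3) for four subframe pitch values."""
    dpit1 = abs(p1 - p2)
    dpit2 = abs(p2 - p3)
    dpit3 = abs(p3 - p4)
    return dpit1, dpit2, dpit3


def pitch_condition_met(p1, p2, p3, p4, first_threshold):
    """True when every pitch difference is below the single first threshold."""
    return all(d < first_threshold for d in pitch_differences(p1, p2, p3, p4))


# Hypothetical pitch lags, in samples, for the four subframes of one frame.
print(pitch_differences(50, 52, 51, 51))
print(pitch_condition_met(50, 52, 51, 51, first_threshold=3))
```

Lee's step 508 corresponds to the single comparison |T0-T0_2| < x; the sketch repeats the same comparison for each adjacent pair of the four subframes.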

Regarding Claim 4, Gao teaches:
4. The method of claim 3, wherein, P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each subframe. [Gao “[0128] … After selecting the best pitch contribution from the adaptive codebook 290, the residual signal 416 is the modified weighted speech signal 258 less the pitch contribution….”  This pertains to Mode 0 which is not the periodic speech/stationary voiced speech mode of Gao.]

Regarding Claim 5, Gao teaches:
5. The method of claim 1, wherein, the smoothed pitch correlation from a previous to a current frame is obtained by following formula: 
Voicing_sm = (3Voicing_sm+Voicing)/4  [This can be shown as = (0.75) Voicing_sm +0.25 Voicing.]
wherein, the Voicing_sm at the left side of the formula denotes a smoothed pitch correlation of a current frame, the Voicing_sm at the right side of the formula denotes a smoothed pitch correlation of a previous frame and Voicing denotes the average normalized pitch correlation value for the subframes in the digital signal. [Gao, Equation 17 in [0104] is the equation for “running mean of average weighted pitch correlation”/“smoothed pitch correlation” of the Claim.  Paragraphs [0102] to [0105] teach the steps of finding the "running mean of average weighted pitch correlation," which was mapped to the "smoothed pitch correlation" of the Claim.  <Rw,pavg> is the "running mean of the average weighted pitch correlation" and is mapped to Voicing_sm of the Claim, which is the “smoothed pitch correlation” in the Claim.  <Rw,pavg (frame m)> = alpha2 x <Rw,pavg (previous frame m-1)> + (1 - alpha2) x Rw,pavg, where alpha2 is a coefficient, for example 0.75, so that 1 - alpha2 = 1 - 0.75 = 0.25.  Rw,pavg is defined in [0103] as the "Average weighted pitch correlation,” which is mapped to the “average normalized pitch correlation” or “Voicing” of the Claim.  So, the formula in this Claim is exactly Equation 17 in [0104] of Gao.]
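
The equivalence relied upon for Claim 5 can be checked numerically.  The sketch below is illustrative only: the function names and the sample values are assumptions, and the coefficient 0.75 is the example value noted above rather than a value confirmed as required by Gao.

```python
def smooth_voicing_claim(prev_voicing_sm, voicing):
    """Smoothing as written in Claim 5: (3 * Voicing_sm + Voicing) / 4."""
    return (3 * prev_voicing_sm + voicing) / 4


def running_mean(prev_mean, current, alpha=0.75):
    """Running mean in weighted form: alpha * previous + (1 - alpha) * current.

    alpha=0.75 is the example coefficient, treated here as an assumption.
    """
    return alpha * prev_mean + (1 - alpha) * current


# Hypothetical correlation values for a previous and a current frame.
prev_sm, voicing = 0.60, 0.90
a = smooth_voicing_claim(prev_sm, voicing)
b = running_mean(prev_sm, voicing)
print(a, b, abs(a - b) < 1e-12)
```

With the coefficient set to 0.75, the two forms agree up to floating-point rounding, which is the algebraic identity (3x + y)/4 = 0.75x + 0.25y.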

Regarding Claim 6, Gao teaches:
6. The method of claim 1, wherein the average normalized pitch correlation value for the subframes in the digital signal is obtained by: 
determining a normalized pitch correlation value for each subframe in the digital signal; and [Gao, [0103]-[0104] "Average weighted pitch correlation” includes the averaging.]  
dividing a sum of all normalized pitch correlation values by a number of the subframes in the digital signal to obtain the average normalized pitch correlation value. [Gao, [0103]-[0104], teaches an "Average weighted pitch correlation.”  Normalization is a matter of dividing by some overall value.  The weight in the “average weighted pitch correlation” may be a division by some value.]
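
The two steps recited in Claim 6 can be sketched as follows.  This is an illustration under stated assumptions: the Claim does not specify how the per-subframe normalized pitch correlation is computed, so the energy-normalized cross-correlation used here is a commonly used form chosen for the sketch and is not asserted to be the formula of the Specification or of Gao.  All function names and the sample signal are hypothetical.

```python
import math


def normalized_pitch_correlation(signal, start, length, pitch):
    """Correlate a subframe with the segment one pitch lag earlier.

    Assumed form: sum(x*y) / sqrt(sum(x*x) * sum(y*y)).
    Requires start >= pitch so the earlier segment exists.
    """
    cur = signal[start:start + length]
    past = signal[start - pitch:start - pitch + length]
    num = sum(x * y for x, y in zip(cur, past))
    den = math.sqrt(sum(x * x for x in cur) * sum(y * y for y in past))
    return num / den if den > 0 else 0.0


def average_normalized_pitch_correlation(values):
    """Sum the per-subframe values and divide by the number of subframes."""
    return sum(values) / len(values)


# Hypothetical perfectly periodic signal with a period of 4 samples.
signal = [0.0, 1.0, 0.0, -1.0] * 5
per_subframe = [normalized_pitch_correlation(signal, s, 4, 4) for s in (4, 8, 12, 16)]
print(average_normalized_pitch_correlation(per_subframe))
```

For a perfectly periodic input each subframe correlates fully with the segment one period earlier, so the average sits at the top of the range; less periodic input would give a lower average.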

Regarding Claim 7, Gao teaches:
7. The method of claim 1, wherein the digital signal is encoded using code-excited linear prediction (CELP). [Gao is directed to “Gain quantization for a CELP speech coder.”  Title.  Gao has two modes, Mode 0 uses CELP encoding and is applied to non-periodic portions/frames of the signal which corresponds to the “AUDIO signal” of the instant Application.  Mode 1 of Gao does not use CELP and is used for encoding the periodic frames/speech to which the instant Application and Claim refer as VOICED.  “[0067] Mode 0 operates a traditional speech encoding algorithm such as a CELP algorithm. However, Mode 0 is not used for all frames of speech. Instead, Mode 0 is selected to handle frames of all speech other than "periodic-like" speech, as discussed in greater detail below. For convenience, "periodic-like" speech is referred to here as periodic speech, and all other speech is "non-periodic" speech. Such "non-periodic" speech include transition frames where the typical parameters such as pitch correlation and pitch lag change rapidly and frames whose signal is dominantly noise-like….”  “[0068] Mode 1 deviates from the traditional CELP algorithm. Mode 1 handles frames containing periodic speech which typically have high periodicity and are often well represented by a smooth pitch tract….”  (Gao uses “like” such as in “periodic-like” to be more accurate because probably few real-life periodic signals are 100% periodic.)]

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally the “processor” and “computer readable storage medium” are taught by Gao, Figure 2, “speech processing and channel processing circuitry 159 and 165” and “speech and channel memory 161.”
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 14 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 15 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.

Claim 17 is a CRM system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 18 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 19 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claims 8 and 16 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Gao and Lee and further in view of Atti (U.S. 2013/0185063).
Regarding Claim 8, Gao does not mention classifying music.  Neither does Lee.
Atti teaches:
8. The method of claim 1, wherein the digital signal carries music data. [Atti, Figure 3, “[0032] FIG. 3 is an operational flow of an implementation of a method 300 for classifying audio. At 310, the initial classifier 210 receives an input audio frame (or other portion of an audio signal for classifying the portion of the audio signal as a speech-like audio signal or a music-like audio signal) and classifies it as speech or music at 320. The initial classifier 210 may be any classifier that classifies an audio frame or portion as speech or music.”] 
Gao, Lee, and Atti all apply to classifying signals.  It would have been obvious to combine the discussion of finding music in the input audio from Atti with the classes of audio from Gao/Lee (the classes are (1) Silence/background Noise; (2) Noise/Like Unvoiced Speech; (3) Unvoiced; (4) Transition (includes onset); (5) Non-Stationary Voiced; and (6) Stationary Voiced) in order to have a more complete list of classes to look for, as simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 16 is a system claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659