DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendment
This communication is responsive to the applicant’s amendment dated 04/12/2022.  

Response to Arguments
Applicant's arguments filed 04/14/2022 have been fully considered but they are not persuasive. 

Regarding claim 1, the Applicant argues, “For example, Seroussi further lacks disclosure: “wherein the number of bits allocated for a frequency sub-band is higher [under a condition] than the number of bits that would be allocated [under another condition]”” (Remarks: pg. 9). The Examiner respectfully disagrees.
Considering the prior art as a whole, Seroussi teaches allocating a number of bits for bands with either a small amount or a large amount of frequency content (par. 0042; ‘The encoder can allocate a specified number of bits 514 for each band in each frame. In some examples, for bands in which there is a relatively large amount of frequency content, the encoder can allocate a relatively high number of bits to represent the frequency content. For bands in which there is relatively little frequency content, the encoder can allocate a relatively small number of bits to represent the frequency content. In general, the higher the number of bits allocated for a particular band, the more accurate the representation of the frequencies in that particular band. The encoder can strike a balance between accuracy, which drives the bit allocation upward, and data rate, which can provide an upper limit to the number of bits allocated per frame.’; par. 0101; ‘An example of a parameter can be a slope, in units of bits per frequency unit.’). Clearly, Seroussi suggests that the number of bits allocated for a frequency sub-band is higher [under a condition] than the number of bits that would be allocated [under another condition], the conditions being the amount of frequency content in each band.

Regarding claim 1, the Applicant argues, “Neither para [0077] nor para [0056] teach to allocate to a frequency sub-band of the frame on the basis of perceptibility, or on any other basis.” (Remarks: pg. 9). The Examiner notes this argument is moot because the claims do not require allocating to a frequency sub-band of the frame on the basis of perceptibility.

Regarding claim 1, the Applicant argues that “None of the cited prior art references teach classifying the audio frame in each frequency sub-band as either background or foreground using a background model specific to the frequency sub-band, as required by the claims” (Remarks: pg. 18), mainly because Li teaches encoding video frames (Remarks: pg. 17).
“Applicant respectfully submits the Office has failed to give the word ‘audio’ (audio signal) its plain meaning when it interprets the word audio so broadly as to encompass ‘video’ (video signal) within its meaning.” (Remarks: pg. 11)
“The office provides no basis in fact and/or technical reasoning to reasonable support a conclusion Applicant’s claimed feature ‘classifying an audio frame’ necessarily flows from some teaching of Li, regarding classifying a video frame.” (Remarks: pg. 13)
“Further, modification of Seroussi’s audio encoder to make it operate to process video signals in accordance Li’s video algorithm would entail reconstructing Seroussi’s audio encoder as a video encoder.” (Remarks: pg. 14)
“Applicant respectfully submits Li is non-analogous art.” (Remarks: pg. 15)
The Examiner respectfully disagrees. 
Considering the prior art as a whole, Seroussi teaches bit allocation for encoding and decoding audio. Li was introduced to teach background-foreground information based bit allocation. Both Seroussi and Li teach bit allocation methods. It is well known that audio and video codec methods overlap, for example in bit allocation and frequency domain transformation. Therefore, applying a technique used in video codecs to an audio codec would have been obvious to one of ordinary skill in the art.
Secondly, in response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, improving rate control would be beneficial for Seroussi’s audio encoder.
Third, in response to applicant's argument that Li is nonanalogous art, it has been held that a prior art reference must either be in the field of applicant’s endeavor or, if not, then be reasonably pertinent to the particular problem with which the applicant was concerned, in order to be relied upon as a basis for rejection of the claimed invention.  See In re Oetiker, 977 F.2d 1443, 24 USPQ2d 1443 (Fed. Cir. 1992).  In this case, as noted above, Li is in the field of bit allocation, therefore being analogous to Seroussi.
Finally, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Claim Rejections - 35 USC § 103
Claims 1-7 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Seroussi et al. (US 20180308494 A1) in view of Li et al. (“Background-foreground information based bit allocation algorithm for surveillance video on high efficiency video coding (HEVC)”, 2016).

Regarding claims 1 and 13, Seroussi teaches:
“receiving an audio signal to be encoded, the audio signal comprising a plurality of successive audio frames” (par. 0054; ‘Divide the input signal into frames, each frame containing a fixed number of audio samples.’);
“for each successive audio frame of the audio signal: representing the audio frame in a frequency domain with respect to a plurality of frequency sub-bands” (par. 0058; ‘For each frame, partition the vector X into M bands B.sub.i, according to:’).
Seroussi teaches encoding each successive audio frame of the audio signal, wherein a number of bits is allocated for each frequency sub-band of the audio frame, wherein the number of bits allocated for a frequency sub-band is higher if the audio frame is perceptually important (par. 0042; ‘For bands in which there is relatively little frequency content, the encoder can allocate a relatively small number of bits to represent the frequency content. In general, the higher the number of bits allocated for a particular band, the more accurate the representation of the frequencies in that particular band. The encoder can strike a balance between accuracy, which drives the bit allocation upward, and data rate, which can provide an upper limit to the number of bits allocated per frame.’; par. 0077; ‘As explained above, the encoder allocates bits within the frames such that parts of the audio signal that are perceptually important are allocated more bits, while parts that may be less perceptible (e.g., due to masking phenomena) can be encoded using fewer bits.’).
However, Seroussi does not expressly teach background or foreground classification, as in:
“classifying the audio frame in each frequency sub-band as either background or foreground using a background model specific to the frequency sub-band”; and
“encoding each successive audio frame of the audio signal, wherein a number of bits is allocated for each frequency sub-band of the audio frame, wherein the number of bits allocated for a frequency sub-band is higher if the audio frame is classified as foreground in the frequency sub-band than if the audio frame is classified as background in the frequency sub-band.”
Li teaches:
“classifying the audio frame in each frequency sub-band as either background or foreground using a background model specific to the frequency sub-band” (pg. 1, right col., “These algorithms…”; ‘Basically, BFIBA classifies a LCU into a background LCU (BLCU) or a foreground LCU (FLCU) by utilizing the background and foreground information (BFI) and then allocates bits for frames and LCUs based on the classification information.’);
“encoding each successive audio frame of the audio signal, wherein a number of bits is allocated for each frequency sub-band of the audio frame, wherein the number of bits allocated for a frequency sub-band is higher if the audio frame is classified as foreground in the frequency sub-band than if the audio frame is classified as background in the frequency sub-band” (pg. 2, Fig. 2; ‘Background modeling’; pg. 1, right col., “These algorithms…”; ‘The foreground parts always need large number of bits to encode while the background parts need little. Thus a Background-Foreground Information based Bit Allocation algorithm (BFIBA) for surveillance video is proposed in this paper.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Seroussi’s bit allocation method by incorporating Li’s Background-Foreground Information based Bit Allocation algorithm in order to encode each successive audio frame of the audio signal in a similar manner. The combination would improve rate control performance. (Li: pg. 2, left col., “Based on the….”)
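For illustration only, the allocation behavior at issue in the combination (more bits for a sub-band classified as foreground than for one classified as background) can be sketched as below. This is a minimal sketch under stated assumptions, not the method of Seroussi, Li, or the application: the function name allocate_bits, the weights fg_weight and bg_weight, and the proportional-split rule are all hypothetical and appear in none of the cited references.

```python
def allocate_bits(is_foreground, total_bits, fg_weight=3.0, bg_weight=1.0):
    """Split a frame's bit budget across sub-bands.

    is_foreground: one boolean per frequency sub-band (True = foreground).
    fg_weight / bg_weight: hypothetical relative weights, assumed positive,
    with fg_weight > bg_weight so foreground bands receive more bits.
    """
    if not is_foreground:
        return []
    weights = [fg_weight if fg else bg_weight for fg in is_foreground]
    weight_sum = sum(weights)
    # Proportional share per band, rounded down to whole bits.
    bits = [int(total_bits * w / weight_sum) for w in weights]
    # Hand out the bits lost to rounding, highest-weight bands first.
    leftover = total_bits - sum(bits)
    order = sorted(range(len(bits)), key=lambda i: -weights[i])
    for i in order[:leftover]:
        bits[i] += 1
    return bits


# Four sub-bands, the first and last classified as foreground.
print(allocate_bits([True, False, False, True], 80))  # [30, 10, 10, 30]
```

The per-frame budget passed as total_bits stands in for the data-rate limit that Seroussi describes as an upper bound on the number of bits allocated per frame.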

Regarding claim 2 (dep. on claim 1), the combination of Seroussi in view of Li further teaches:
“wherein the number of bits allocated for encoding a background classified frequency sub-band of the audio frame is dependent on a frequency range of the background classified frequency sub-band of the audio frame; and/or the number of bits allocated for encoding a foreground classified frequency sub-band of the audio frame is dependent on a frequency range of the foreground classified frequency sub-band of the audio frame” (Seroussi: par. 0090; ‘The encoder can form a bit-allocation curve 614 for a particular frame, which represents how many bits are allocated for each band in the particular frame.’).

Regarding claim 3 (dep. on claim 1), the combination of Seroussi in view of Li further teaches:
“wherein the audio signal is encoded such that the number of bits allocated to a background classified first frequency sub-band of a first audio frame is higher if the same first frequency sub-band in an audio frame preceding the first audio frame was classified as foreground compared to if the same first frequency sub-band in the audio frame preceding the first audio frame was classified as background” (Seroussi: par. 0042; ‘For bands in which there is relatively little frequency content, the encoder can allocate a relatively small number of bits to represent the frequency content. In general, the higher the number of bits allocated for a particular band, the more accurate the representation of the frequencies in that particular band. The encoder can strike a balance between accuracy, which drives the bit allocation upward, and data rate, which can provide an upper limit to the number of bits allocated per frame.’; par. 0090; ‘The encoder can form a bit-allocation curve 614 for a particular frame, which represents how many bits are allocated for each band in the particular frame.’).

Regarding claim 4 (dep. on claim 1), the combination of Seroussi in view of Li further teaches:
“wherein the number of bits allocated for encoding a frequency sub-band of the audio frame further depends on a psychoacoustic model” (Seroussi: par. 0095; ‘The encoder 700 can employ data from any available sources 710, including psychoacoustic models and others, and perform bit-allocation 712 to produce a bit-allocation curve 714.’).

Regarding claim 5 (dep. on claim 2), the combination of Seroussi in view of Li further teaches:
“wherein the number of bits allocated for encoding a frequency sub-band of the audio frame is dependent on the frequency range of the frequency sub-band of the audio frame according to a psychoacoustic model” (Seroussi: par. 0095; ‘The encoder 700 can employ data from any available sources 710, including psychoacoustic models and others, and perform bit-allocation 712 to produce a bit-allocation curve 714.’).

Regarding claim 6 (dep. on claim 1), the combination of Seroussi in view of Li further teaches:
“wherein the number of bits allocated for encoding a background classified frequency sub-band of the audio frame is independent of a frequency range that the background classified frequency sub-band of the audio frame represents and wherein the number of bits allocated for encoding a foreground classified frequency sub-band of the audio frame is independent of a frequency range that the foreground classified frequency sub-band of the audio frame belongs to” (Seroussi: par. 0122; ‘In some examples, at least one target parameter can include a reference number of bits allocatable for each band. In some examples, the method 1100 can optionally further include: setting the estimated number of bits allocatable for each band to equal the reference number of bits allocatable for each band, for multiple frames in the digital audio signal; and encoding data representing the reference number of bits allocatable for each band into the bit stream.’).

Regarding claim 7 (dep. on claim 1), the combination of Seroussi in view of Li further teaches:
“for an audio frame of the audio signal: for a frequency sub-band of the audio frame; updating the background model specific to the frequency sub-band which corresponds to the frequency sub-band of the audio frame based on a frequency content of the frequency sub-band of the audio frame” (Li: pg. 2, Fig. 2; ‘Background modeling’). Updating a background model is well known in the art. It would have been obvious to update the background model specific to the frequency sub-band which corresponds to the frequency sub-band of the audio frame based on a frequency content of the frequency sub-band of the audio frame.

Regarding claim 12, the combination of Seroussi in view of Li further teaches:
“A computer program product comprising a non-transitory computer-readable medium storing computer-readable instructions which, when executed on a processor, will cause the processor to perform the method according to claim 1” (Seroussi: par. 0162; ‘Further, one or any combination of software, programs, computer program products that embody some or all of the various embodiments of the encoding and decoding system and method described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.’).

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Seroussi in view of Li as applied to claim 1 above, and further in view of Chu et al. (“A semi-supervised learning approach to online audio background detection”, 2009).

Regarding claim 8 (dep. on claim 1), Seroussi in view of Li does not expressly teach a Gaussian Mixture Model, as in “wherein the background model specific to the frequency sub-band includes a Gaussian Mixture Model, GMM, the GMM comprising a plurality of Gaussian distributions, each of which representing a probability distribution for energy levels in the frequency sub-band.”
Chu teaches:
“wherein the background model specific to the frequency sub-band includes a Gaussian Mixture Model, GMM, the GMM comprising a plurality of Gaussian distributions, each of which representing a probability distribution for energy levels in the frequency sub-band” (pg. 1631, left col., “For BG/FG…”; ‘For classification, we ordered the Gaussians by their values of αonline + αs, where αs = αbg if Pbg(μt) ≥ Pfg(μt) and 0 otherwise.’ See also abstract).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the background model taught by Seroussi in view of Li by incorporating the Gaussian mixture models used for background/foreground classification as taught by Chu in order to be able to understand and predict ambient context surrounding an agent, both human and machine. (Chu: abstract)

Regarding claim 9 (dep. on claim 8), the combination of Seroussi in view of Li and Chu further teaches:
“wherein a frequency sub-band of the audio frame is classified as background if an energy level of the frequency sub-band of the audio frame lies within a predetermined number of standard deviations around a mean of one of the Gaussian distributions of the GMM of the background model specific to the frequency sub-band, and if a weight of said Gaussian distribution is above a threshold, wherein the weight represents a probability that an energy level of the frequency sub-band of the audio frame will be within the predetermined number of standard deviations around the mean of said Gaussian distribution” (Chu: pg. 1630, right col., “The history…”; ‘The Kth component is viewed as a match if xt is within 2.5 standard deviations from the mean of a distribution, as done in [7,9]. If none of the distributions qualify, the least probable distribution is replaced by the current observation xt as the mean value with an initial high variance and a low prior.’).
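For illustration only, the two-part test recited in claim 9 (an energy level within a predetermined number of standard deviations of a component mean, and a component weight above a threshold) can be sketched as below. This is a sketch under stated assumptions rather than Chu's implementation: the 2.5 standard deviation default is taken from the quoted passage of Chu, while the names Gaussian and is_background, the weight_threshold value of 0.3, and the example energy levels are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """One GMM component modelling energy levels in a single sub-band."""
    mean: float
    variance: float
    weight: float  # prior probability of the component


def is_background(energy, components, num_std=2.5, weight_threshold=0.3):
    """Return True if `energy` matches a sufficiently weighted component.

    num_std follows the 2.5 standard deviation match rule quoted from Chu;
    weight_threshold is a hypothetical value chosen for this example.
    """
    for g in components:
        within = abs(energy - g.mean) <= num_std * math.sqrt(g.variance)
        if within and g.weight > weight_threshold:
            return True
    return False


# Hypothetical background model for one sub-band (energies in dB).
model = [Gaussian(mean=-60.0, variance=4.0, weight=0.7),
         Gaussian(mean=-20.0, variance=9.0, weight=0.1)]
print(is_background(-58.0, model))  # True: near the heavily weighted component
print(is_background(-20.0, model))  # False: matching component has low weight
```

A sub-band for which is_background returns False would be treated as foreground in this sketch.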

Regarding claim 10 (dep. on claim 8), the combination of Seroussi in view of Li and Chu further teaches:
“wherein the energy level is a power spectral density, PSD, measurement” (power spectral density is well known in the art, as evidenced by Shug et al. (US 20140188488 A1) (par. 0079; ‘Furthermore, the bit allocation process determines a power spectral density (PSD) distribution and a frequency-domain masking curve (based on a psychoacoustic model) for each channel. The PSD distribution and the frequency-domain masking curve are used to to determine a substantially optimal distribution of the available bits to the different normalized mantissas 314 of the audio frame.’)).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Seroussi in view of Li as applied to claim 1 above, and further in view of Gurijala et al. (US 10043527 B1).

Regarding claim 11 (dep. on claim 1), Seroussi in view of Li does not expressly teach metadata, as in “transmitting the encoded audio frames of the audio signal together with metadata, wherein the metadata indicates the classification of the frequency sub-bands of the audio frames.”
Gurijala teaches:
“transmitting the encoded audio frames of the audio signal together with metadata, wherein the metadata indicates the classification of the frequency sub-bands of the audio frames” (col. 18, lines 31-35; ‘The input to the embedding system of FIG. 5 includes the message payload 800 to be embedded in an audio segment, the audio segment, and metadata about the audio segment (802) obtained from classifier modules, to the extent available.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the encoding taught by Seroussi in view of Li by incorporating Gurijala’s embedding system in order to provide audio classification parameters to facilitate embedding.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Seroussi in view of Li, further in view of Visser et al. (US 20170278519 A1).

Regarding claim 14, the combination of Seroussi in view of Li further teaches:
“wherein the receiver is configured to receive an audio signal to be encoded, the audio signal comprising a plurality of successive audio frames, and; wherein the one or more processors are configured to: for each successive audio frame of the audio signal: represent the audio frame in a frequency domain with respect to a plurality of frequency sub-bands; classify the audio frame in each frequency sub-band as either background or foreground using a background model specific to the frequency sub-band; encode each successive audio frame of the audio signal, wherein a number of bits is allocated for each frequency sub-band of the audio frame, wherein the number of bits allocated for a frequency sub-band is higher if the audio frame is classified as foreground in the frequency sub-band than if the audio frame is classified as background in the frequency sub-band” (see claim 1).
Seroussi in view of Li does not explicitly teach a microphone, as in:
“a microphone configured to record an audio signal”;
“an encoder configured to receive the audio signal from the microphone and encode the audio signal with variable bitrate, the encoder for encoding an audio signal with variable bitrate, the encoder comprising a receiver and one or more processors.”
Visser teaches:
“a microphone configured to record an audio signal” (par. 0031; ‘The audio capture device 102 includes a processor 104, a memory 106, a microphone array 108, and a transceiver 110. The memory 106 may include a non-transitory computer-readable medium that includes instructions executable by the processor 104.’).
“an encoder configured to receive the audio signal from the microphone and encode the audio signal with variable bitrate, the encoder for encoding an audio signal with variable bitrate, the encoder comprising a receiver and one or more processors” (par. 0047; ‘Determining the proximity of the sound sources 122, 124, 126 may enable the processor 104 to encode audio signals from closer sound sources (e.g., foreground audio signals) at higher bit-rates and audio signal from sound sources farther away (e.g., background audio signals) at lower bit-rates for encoding efficiency.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the audio input method taught by Seroussi in view of Li by incorporating the microphone taught by Visser in order to capture audio.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191. The examiner can normally be reached 10 am - 6pm EST Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/           Examiner, Art Unit 2658