DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions
Applicants’ election of Invention I, Claims 1 to 6, 14, 16 to 18, and 20, in the reply filed on 10 December 2019 is acknowledged.  Because Applicants did not distinctly and specifically point out the supposed errors in the restriction requirement, the election has been treated as an election without traverse (MPEP § 818.01(a)).
Claims 7 to 13, 15, and 19 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention, there being no allowable generic or linking claim.  Election was made without traverse in the reply filed on 10 December 2019.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 16 to 17 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hellmuth et al. (U.S. Patent Publication 2009/0125314).
Regarding independent claims 16 to 17, Hellmuth et al. discloses a method and device for audio decoding, comprising:
[…] in order to enable SAOC decoder 12 to recover the individual objects 141 to 14N, downmixer 16 provides the SAOC decoder 12 with side-information including SAOC-parameters including object level differences (OLD) and inter-object cross correlation parameters (IOC) (¶[0034]: Figure 1); downmixer 16 computes object level differences for each object i as OLDi (¶[0040]: Figure 1); downmixer 16 computes inter-object cross correlation parameters IOCi,j as a similarity measure of corresponding time/frequency tiles of audio objects i and j (¶[0041]: Figure 1); the individual objects 141 to 14N, then, represent at least a “first object” and a “second object”, where each of individual objects 141 to 14N has side information OLDi and IOCi,j for each audio object i = 1 to N;
[…] downmixer 16 computes SAOC-parameters from input audio signals 141 to 14N; downmixer 16 performs this computation in a time/frequency resolution which may be decreased relative to an original time/frequency resolution by a certain amount, where this certain amount is signaled to the decoder side within side information 20 by respective syntax elements bsFrameLength and bsFreqRes (¶[0039]: Figure 2);
“an object separator configured to separate the at least one audio object from the downmix signal using first object-specific side information in accordance with the object-specific time/frequency resolution, wherein first object-specific side information for at least one other audio object within the downmix signal comprises a different object-specific frequency resolution” – SAOC encoder 10 receives as an input N objects, i.e., audio signals 141 to 14N (¶[0035]: Figure 1); downmixer 16 computes SAOC-parameters from input audio signals 141 to 14N; downmixer 16 performs this computation in a time/frequency resolution which may be decreased relative to an original time/frequency resolution by a certain amount, where this certain amount is signaled to the decoder side within side information 20 by respective syntax elements bsFrameLength and bsFreqRes (¶[0039]: Figure 2); SAOC decoder 12 comprises an upmixer 22 which receives the downmix signal 18 and side information 20 in order to recover and render the audio signals 141 to 14N (“using the first object-specific side information”) (¶[0036]: Figure 1); the side information 58 comprises level information 60 describing spectral energies of the audio signal of the first type and the audio signal of the second type in a first predetermined time/frequency resolution 42 (¶[0054] - ¶[0055]: Figures 2 and 3); there are, then, a plurality of N audio objects (“for at least one other audio object”) having a time/frequency resolution described by received side information.

Regarding claim 20, Hellmuth et al. discloses a program having program code for executing when running on a processor (¶[0014] - ¶[0015]); an implementation may include software relating to a computer program that can be stored on a computer-readable medium (¶[0204]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
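A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.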

Claims 1, 4, 6, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hellmuth et al. (U.S. Patent Publication 2009/0125314) in view of Disch (U.S. Patent Publication 2011/0106529).
Concerning independent claims 1 and 14, Hellmuth et al. discloses a method and device for audio decoding, comprising:
“an audio decoder for decoding a multi-object audio signal comprising a downmix signal and side information, the side information comprising first object-specific side information for at least one audio object indicative of an object-specific time-frequency region, and object-specific time/frequency resolution information indicative of an object-specific time/frequency resolution of the first object-specific side information for the at least one audio object in the at least one time/frequency region, as well as second object-specific side information for at least one other audio object in at least one time/frequency region” – an audio decoder decodes a multi-audio-object signal having an audio signal of a first type and an audio signal of a second type encoded therein, the multi-audio-object signal having a downmix signal and side information, the side information having level information of the audio signals of the first and second types in a predetermined time/frequency resolution, and a residual signal specifying residual level values in a second predetermined time/frequency resolution (Abstract; ¶[0010]); in order to enable SAOC decoder 12 to recover the individual objects 141 to 14N, downmixer 16 provides the SAOC decoder 12 with side-information including SAOC-parameters including object level differences (OLD) and inter-object cross correlation parameters (IOC) (¶[0034]: Figure 1); downmixer 16 computes object level differences for each object i as OLDi (¶[0040]: Figure 1); downmixer 16 computes inter-object cross correlation parameters IOCi,j as a similarity measure of corresponding time/frequency tiles of audio objects i and j (¶[0041]: Figure 1); the individual objects 141 to 14N, then, represent at least a “first object” and a “second object”, where each of individual objects 141 to 14N has side information OLDi and IOCi,j for each audio object i = 1 to N;
“the audio decoder comprising: an object-specific time/frequency resolution determiner configured to determine the object-specific time/frequency resolution from the side information for the at least one audio object” – audio decoder 50 is dedicated for decoding a multi-audio-object signal having an audio signal of a first type and an audio signal of a second type encoded therein; a multi-audio-object signal consists of a downmix signal 56 and side information 58; the side information 58 comprises level information 60 describing spectral energies of the audio signal of the first type and the audio signal of the second type in a first predetermined time/frequency resolution 42 (¶[0054] - ¶[0055]: Figures 2 and 3); means for computing prediction coefficients based on level difference information 52 uses side information 58 to determine an object-specific time/frequency resolution; means for computing prediction coefficients is based on inter-correlation information comprised by side information 58 (¶[0056]: Figures 2 and 3); downmixer 16 computes SAOC-parameters from input audio signals 141 to 14N; downmixer 16 performs this computation in a time/frequency resolution which may be decreased relative to an original time/frequency resolution by a certain amount, where this certain amount is signaled to the decoder side within side information 20 by respective syntax elements bsFrameLength and bsFreqRes (¶[0039]: Figure 2); 
[…] SAOC decoder 12 comprises an upmixer 22 which receives the downmix signal 18 and side information 20 in order to recover and render the audio signals 141 to 14N (“using the object-specific side information”) (¶[0036]: Figure 1); means for upmixing 54 is configured to upmix the downmix signal 56 based on a time varying downmix prediction (¶[0057]: Figure 3); the side information 58 comprises level information 60 describing spectral energies of the audio signal of the first type and the audio signal of the second type in a first predetermined time/frequency resolution 42 (¶[0054] - ¶[0055]: Figures 2 and 3).
Concerning independent claims 1 and 14, Hellmuth et al. does not clearly disclose object-specific side information that includes a fine structure and a coarse structure in the limitations of “wherein the first object-specific side information is a fine structure object-specific side information for the at least one audio object in the at least one time/frequency region, and wherein the first side information further comprises coarse object-specific side information for the at least one audio object in the at least one time/frequency region, the coarse object-specific side information being constant within the at least one time/frequency region” and “wherein the fine structure object-specific side information describes a difference between the coarse object-specific side information and the at least one audio object”.  However, Hellmuth et al. discloses side information 58 for a residual signal 62 specifying the residual level values.  (¶[0055] - ¶[0056]: Figure 3)  Note that Figures 13D to 13E provide syntax of residual configurations and coarse quantization parameters.  (Figures 13D to 13E)
Concerning independent claims 1 and 14, even if these limitations of object-specific side information that includes a fine structure and a coarse structure are not disclosed by Hellmuth et al., they are taught by Disch.  Generally, Disch teaches an apparatus and method for converting an audio signal into a parameterized representation.  Specifically, Figures 3b and 3c illustrate a decomposition of information into coarse and fine structure information.  (¶[0052] - ¶[0053]: Figures 3b and 3c)  One embodiment provides a parameterized representation of the coarse structure and an energy value representing or derived from the fine structure, where this parameterized representation is transmitted from an analyzer to a synthesizer.  (¶[0096]: Figure 3b)  A signal has a coarse structure related to onset and offset of musical events, etc., and fine structure related to faster modulation frequencies.  Since this fine structure represents the roughness properties of an audio signal, auditory roughness can be modified by removing the fine structure and maintaining the coarse structure.  An envelope is decomposed into coarse and fine structure, where the fine structure (residual) is obtained as a difference between the original signal and the coarse envelope (“a difference between the coarse object-specific side information and the at least one audio object”).  (¶[0100] - ¶[0101])  Figure 3c illustrates extracting a coarse structure from a band pass signal.  The coarse structure of the band pass signal is […]  Disch, then, teaches extracting a coarse structure and a fine structure from an audio signal, where a fine structure represents a residual obtained by a difference between the coarse structure and the original signal.
Hellmuth et al.’s side information 58 for audio objects 141 to 14N includes level information 60 describing spectral energies of the audio signal as ‘coarse information’, and a residual signal 62 as ‘fine information’ that is a difference between an original audio signal and coarse information as taught by Disch.  Hellmuth et al.’s level information 60, corresponding to ‘coarse information’, comprises a normalized spectral energy scalar value per object and time/frequency tile.  (¶[0055] - ¶[0056]: Figures 3 to 4)  Inherently, Hellmuth et al.’s level information 60, corresponding to ‘coarse information’, “is constant within the at least one time/frequency region” corresponding to a time/frequency tile for any given one of audio objects 141 to 14N.  An objective is to provide extracted information that is perceptually meaningful and produces perceptually smooth results avoiding undesirable artifacts.  (¶[0040])  It would have been obvious to one having ordinary skill in the art to include fine structure and coarse structure as taught by Disch as side information of Hellmuth et al. for a purpose of extracting perceptually meaningful and smooth results that avoid undesirable artifacts.

Concerning claim 4, Hellmuth et al. discloses that downmixer 16 computes SAOC-parameters from input audio signals 141 to 14N as determined by filter bank time slots 34 and subband decomposition; each frame is divided up into time/frequency tiles illustrated by dashed lines 42 (¶[0039]: Figure 2); means 82 for spectrally decomposing […], i.e., each small square in Figure 2.
Concerning claim 6, Hellmuth et al. discloses: “a downmix signal time/frequency transformer configured to transform the downmix signal within the time/frequency region from a downmix signal time/frequency resolution to at least the object-specific time/frequency resolution of the at least one audio object to acquire a re-transformed downmix signal” – downmixer 16 computes SAOC-parameters in a time/frequency resolution which may be decreased relative to the original time/frequency resolution by a certain amount, where this certain amount is signaled to the decoder side within the side information by syntax elements (¶[0039]: Figures 1 to 2); audio decoder 50 decodes a multi-audio-object signal consisting of a downmix signal 56 and side information 58; side information 58 comprises level information 60 describing spectral energies of the audio signal of the first type and the audio signal of the second type in a first predetermined time/frequency resolution, e.g., time/frequency resolution 42; level […];
“wherein the object separator is configured to separate the at least one audio object from the downmix signal at the object-specific time/frequency resolution” – audio decoder 50 decodes a multi-object signal of a first type and an audio signal of a second type; an audio signal of a first type can be a background object (BGO) and an audio signal of the second type can be a foreground object (FGO) (¶[0054]: Figure 3);
“an inverse time-frequency transformer configured to time/frequency transform the at least one audio object with the time/frequency region from the object-specific time/frequency resolution back to a common t/f resolution or the downmix signal time/frequency resolution” – means 52 may use time varying downmix prescription information comprised by side information 58 (¶[0056]: Figure 3); means 54 for upmixing may use the time varying downmix prescription to upmix the downmix signal (¶[0057]: Figure 3); implicitly, upmixing using time varying side information provides audio objects with “a common t/f resolution or the downmix signal time/frequency resolution”; broadly, if an inverse time-frequency transformer only transforms an audio object to “the downmix signal time/frequency resolution”, then this does not appear to be different from a time/frequency resolution received by means 52 of audio decoder 50.
Concerning claim 18, Hellmuth et al. discloses a program having program code for executing when running on a processor (¶[0014] - ¶[0015]); an implementation may include software relating to a computer program that can be stored on a computer-readable medium (¶[0204]).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Hellmuth et al. (U.S. Patent Publication 2009/0125314) (“Hellmuth et al. (‘314)”) in view of Disch (U.S. Patent Publication 2011/0106529) as applied to independent claim 1 above, and further in view of Hellmuth et al. (U.S. Patent Publication 2012/0177204) (“Hellmuth et al. (‘204)”).
Hellmuth et al. (‘314) does not include an equation for an estimated covariance matrix as e_{i,j}^{η,κ} = (fsl_i^{η,κ} fsl_j^{η,κ})^{1/2} fsc_{i,j}^{η,κ}, wherein e_{i,j}^{η,κ} is the estimated covariance of audio objects i and j, fsl_i^{η,κ} and fsl_j^{η,κ} are the object-specific side information of the audio objects i and j for fine-structure time-slot η and fine-structure (hybrid) sub-band κ, and fsc_{i,j}^{η,κ} is an inter object correlation information of the audio objects i and j.  However, a similar equation for an estimated covariance matrix is taught for an audio decoder using object-related parametric information by Hellmuth et al. (‘204).  Specifically, Hellmuth et al. (‘204) teaches a known equation for a covariance matrix e_{i,j} = (OLD_i OLD_j)^{1/2} IOC_{i,j}, where OLD_i and IOC_{i,j} are object parameters obtained from parametric side information.  (¶[0178] - ¶[0179] and ¶[0189] - ¶[0190])  Here, covariance matrix e_{i,j} corresponds to covariance matrix e_{i,j}^{η,κ}, side information object parameters OLD_i and OLD_j correspond to fine structure side parameters fsl_i^{η,κ} and fsl_j^{η,κ}, and inter object correlation IOC_{i,j} corresponds to inter object correlation fsc_{i,j}^{η,κ}.  That is, Hellmuth et al. (‘204) uses what is generally the same known equation, but does not apply it to fine-structure side information.  An objective is to provide an upmix signal representation in dependence on a downmix signal and object-related parametric information to obtain a good tradeoff between audio quality and bitrate requirements to avoid excessive resource load.  It would have been obvious to one having ordinary skill in the art to apply the covariance equation of Hellmuth et al. (‘204) for fine structure side information of a residual signal of Hellmuth et al. (‘314) for a purpose of avoiding excessive resource load for object-related parametric information.
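For illustration only, the correspondence between the two covariance equations discussed above can be sketched as follows.  The function name and the numeric values are hypothetical and are not taken from either reference; the same routine accepts either OLD/IOC values or fine-structure level and correlation values for a single time/frequency tile.

```python
import numpy as np

def estimated_covariance(level, corr):
    """Return e[i, j] = sqrt(level[i] * level[j]) * corr[i, j].

    level: per-object level values (OLD_i, or fsl_i for one tile)
    corr:  symmetric inter-object correlation matrix (IOC_ij or fsc_ij)
    """
    level = np.asarray(level, dtype=float)
    corr = np.asarray(corr, dtype=float)
    # The outer product gives level[i] * level[j] for every object pair.
    return np.sqrt(np.outer(level, level)) * corr

# Hypothetical values for three objects in one time/frequency tile.
old = [1.0, 0.25, 0.04]
ioc = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.2],
       [0.0, 0.2, 1.0]]
e = estimated_covariance(old, ioc)
```

Because each object is fully correlated with itself, the diagonal of the result reproduces the per-object level values, and only the off-diagonal terms depend on the inter-object correlation.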

Response to Arguments
Applicants’ arguments filed 08 March 2022 have been fully considered but they are not persuasive. 
Applicants do not provide any amendments to the claims, but present arguments traversing the prior rejection of independent claims 16 to 17 as being anticipated under 35 U.S.C. §102(a)(1) by Hellmuth et al. (U.S. Patent Publication 2009/0125314) and of independent claims 1 and 14 as being obvious under 35 U.S.C. §103 over Hellmuth et al. (U.S. Patent Publication 2009/0125314) in view of Disch (U.S. Patent Publication 2011/0106529).  Claims 7 to 13, 15, and 19 remain withdrawn pursuant to a restriction requirement.
Applicants note that there is an objection to independent claims 1 and 14 for the limitation of “the fine structure object-specific side information” and “a fine structure first object-specific side information”, but maintain that these limitations are proper.  The examiner now agrees that these limitations are proper, and the objection is withdrawn.
Applicants present arguments directed against the rejection of independent claims 16 to 17 as being anticipated under 35 U.S.C. §102(a)(1) over Hellmuth et al.  There are a variety of problems with these arguments provided by Applicants.  
Firstly, Applicants argue that Hellmuth et al. fails to disclose two different side information for two different objects, wherein the side information for the first object comprises a fine and a coarse portion, and that these features are now part of claims 16 to 17.  Applicants, then, are implying that the limitations of independent claims 1 and 14, directed to a first object-specific side information being a fine structure object-specific side information and a coarse object-specific side information, are now incorporated by amendment into independent claims 16 to 17.  However, there is no amendment to independent claims 16 to 17 that incorporates any limitations directed to fine structure side information and coarse structure side information.  Applicants’ arguments, then, directed to the fine structure side information and coarse structure side information being absent from Hellmuth et al., are not relevant to a rejection of independent claims 16 to 17 under 35 U.S.C. §102(a)(1).  Applicants are attempting to read limitations into these independent claims that are simply not there, and are misrepresenting that any amendment was actually presented.
Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Secondly, Applicants appear to be simply reiterating arguments that were already considered in the prior Office Actions, and do not consider the response to those arguments in the prior Office Actions.  Specifically, Applicants’ main argument here is that Hellmuth et al. only provides side information that belongs to a multi-audio-object signal and that the side information is not object-specific side information.  Applicants consider level information for different objects disclosed by Hellmuth et al., but appear to discount Hellmuth et al.’s description of a normalized spectral energy scalar value per object, which would itself appear to be an object-specific side information.  Here, Applicants somewhat cryptically say that they still interpret this as one side information for object-specific ‘portions’.  However, Applicants continue to allege that nowhere in Hellmuth et al. is it described that a first side information should be present for the first object and a second side information for the second object.
This argument was already extensively considered in the prior Office Actions.  Specifically, one of the most telling disclosures in Hellmuth et al. for object-specific side information is at ¶[0040], which describes object level differences for each object i, given as OLDi.  Plainly, OLDi is disclosed as object-specific side information because there is an OLDi for each object i.  The prior rejections have pointed out that OLDi is object-specific side information simply because the mathematics of the subscript i is for each particular object.  Similarly, the prior rejections pointed out that Hellmuth et al. discloses side information of inter-object cross-correlation parameters IOCi,j at ¶[0041].  Again, IOCi,j is side information that is object-specific for each of objects i and j.  That is, if there are four objects i, j = 1, 2, 3, 4 then there are at least six IOC’s – IOC1,2, IOC1,3, IOC1,4, IOC2,3, IOC2,4, and IOC3,4, where IOC1,2, IOC1,3, and IOC1,4 are side information specific to object 1, IOC1,2, IOC2,3, and IOC2,4 are side information specific to object 2, IOC1,3, IOC2,3, and IOC3,4 are side information specific to object 3, and IOC1,4, IOC2,4, and IOC3,4 are side information specific to object 4.  Hellmuth et al., then, discloses object-specific side information at least for level information of OLDi, and for inter-object cross-correlation parameters IOCi,j.  Applicants have not specifically considered this argument.
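For illustration only, the counting of inter-object cross-correlation parameters for four objects can be sketched as follows.  The function names are hypothetical; the sketch merely enumerates the unordered object pairs and groups them by the object each pair involves.

```python
from itertools import combinations

def ioc_pairs(num_objects):
    """List every unordered pair (i, j), i < j, that has an IOC parameter."""
    return list(combinations(range(1, num_objects + 1), 2))

def pairs_for_object(pairs, obj):
    """Select the IOC index pairs that involve the given object."""
    return [p for p in pairs if obj in p]

pairs = ioc_pairs(4)
by_object = {obj: pairs_for_object(pairs, obj) for obj in range(1, 5)}
```

For four objects this yields six pairs in total, three of which involve any given object, matching the enumeration set forth above.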
Similarly, Hellmuth et al., ¶[0055], states: “In particular, the level information 60 may comprise a normalized spectral energy scalar value per object and time/frequency tile.”  So, this passage is clear, too: level information is side information that is specific per each object, where this level information corresponds to OLDi.
Generally, Hellmuth et al. discloses that there are N input objects representing audio signals 141 to 14N.  (¶[0034] - ¶[0037]: Figure 1)  Moreover, Hellmuth et al., ¶[0035], states, “In order to enable the SAOC decoder 12 to recover individual objects 141 to 14N, downmixer 16 provides the SAOC decoder 12 with side information including SAOC-parameters including object level differences (OLD), inter-object cross correlation parameters (IOC), downmix gain values (DMG), and downmix channel level differences (DCLD).”  Hellmuth et al., at ¶[0042] - ¶[0045], then represents downmix gain values as DMGi  and downmix channel level differences as DCLDi, where the subscript i again indicates that downmix gain values (DMG) and downmix channel level differences (DCLD) are object-specific side information for an object i.  
Then Applicants present arguments directed against independent claims 1 and 14 as being obvious under 35 U.S.C. §103 over Hellmuth et al. in view of Disch.  Applicants begin by arguing that these claims are allowable because there is nothing that discloses object-specific side information by Hellmuth et al.
However, the rejection has already dealt with this argument above at least because OLDi, IOCi,j, DMGi, and DCLDi are object-specific side information as indicated by the subscript i for each object i.  
Applicants then contend that the fine structure and coarse structure features are not disclosed by Hellmuth et al., and that the question is whether these features are taught by Disch and can be incorporated into Hellmuth et al.  Applicants state that Disch discloses AM information in the context of a coarse/fine structure, but contend that this reference fails to disclose that this information is object-specific.
This argument is not persuasive at least because a coarse structure and a fine structure are linked to a residual signal at ¶[0101] of Disch: 
[0101] To decompose the envelope into coarse and fine structure, nonlinear methods can be utilized. For example, to capture the coarse AM one can apply a piecewise fit of a (low order) polynomial. The fine structure (residual) is obtained as the difference of original and coarse envelope. The loss of AM fine structure can be perceptually compensated for--if desired--by adding band limited `grace` noise scaled by the energy of the residual and temporally shaped by the coarse AM envelope.
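For illustration only, the decomposition described in the quoted paragraph can be sketched as follows.  This is a simplified sketch under stated assumptions: a single low-order polynomial fit over the whole envelope is used in place of the piecewise fit mentioned by Disch, and the function name, polynomial order, and test envelope are hypothetical.

```python
import numpy as np

def decompose_envelope(envelope, order=2):
    """Split an envelope into a coarse polynomial trend and a fine residual."""
    envelope = np.asarray(envelope, dtype=float)
    t = np.arange(len(envelope), dtype=float)
    # Coarse structure: low-order polynomial fitted to the envelope.
    coeffs = np.polyfit(t, envelope, order)
    coarse = np.polyval(coeffs, t)
    # Fine structure (residual): difference of original and coarse envelope.
    fine = envelope - coarse
    return coarse, fine

# Hypothetical envelope: slow linear trend plus a faster modulation.
t = np.arange(64)
envelope = 1.0 + 0.01 * t + 0.05 * np.sin(2 * np.pi * t / 4)
coarse, fine = decompose_envelope(envelope)
residual_energy = float(np.sum(fine ** 2))
```

By construction, the coarse and fine parts sum back to the original envelope, and the residual energy is the kind of scalar that could scale the compensating noise mentioned in the quoted paragraph.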

Similarly, Hellmuth et al. links the time/frequency resolution to a residual signal in the Abstract:
An audio decoder for decoding a multi-audio-object signal having an audio signal of a first type and an audio signal of a second type encoded therein is described, the multi-audio-object signal having a downmix signal and side information, the side information having level information of the audio signals of the first and second types in a first predetermined time/frequency resolution, and a residual signal specifying residual level values in a second predetermined time/frequency resolution, the audio decoder having a processor for computing prediction coefficients based on the level information; and an up-mixer for up-mixing the downmix signal based on the prediction coefficients and the residual signal to obtain a first up-mix audio signal approximating the audio signal of the first type and/or a second up-mix audio signal approximating the audio signal of the second type.

One skilled in the art could understand that if there are first and second time/frequency resolutions in Hellmuth et al., then one of these time/frequency resolutions is going to be finer than the remaining time/frequency resolution.  That is, if TFR1 ≠ TFR2, then either TFR1 > TFR2 or TFR1 < TFR2.  The smaller of the two time/frequency resolutions is then equivalent to a fine structure time/frequency resolution and the larger of the two time/frequency resolutions is then equivalent to a coarse structure time/frequency resolution.  Disch teaches that the residual can be considered a fine structure in the residual signal of Hellmuth et al.
	The combination is proper under a rationale of KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007) as “combining prior art elements according to known methods to yield predictable results” or “use of known technique to improve similar devices (methods, or products) in the same way”.  Here, Disch teaches a general concept of obtaining a fine structure derived from a coarse structure using a residual signal, and it would be predictable to obtain a fine structure from a coarse structure using a residual signal according to this known technique for first and second time/frequency resolutions at ¶[0012] - ¶[0016] of Hellmuth et al.
	Applicants then argue that even if there are a background object and a foreground object in Hellmuth et al., the side information does not have a link to a background object or a foreground object.
	However, the examiner maintains that a background object and a foreground object are merely one conceivable embodiment of a plurality of objects in Hellmuth et al.  That is, there could be a foreground object and a background object as a special simple case of N objects in Hellmuth et al., where N = 2, and i = 1, 2, but in a general case there can be any number of objects N, i = 1, 2, 3, . . . , N in a multi-audio-object signal.
Applicants’ arguments are not persuasive.  There are no new grounds of rejection.  Accordingly, this rejection is properly FINAL.




Conclusion
THIS ACTION IS MADE FINAL.  Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
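For illustration only, the date arithmetic in the preceding paragraph can be sketched as follows.  The function names and the dates are hypothetical (the mailing date of this action is not assumed here), and the sketch deliberately ignores weekends, federal holidays, and any extension of time under 37 CFR 1.136(a).

```python
from datetime import date
import calendar

def add_months(d, months):
    """Add calendar months, clamping to the last day of a shorter month."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def shortened_period_end(mailing, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period under the rule stated above."""
    three = add_months(mailing, 3)
    six = add_months(mailing, 6)
    replied_in_two = (first_reply is not None
                      and first_reply <= add_months(mailing, 2))
    if replied_in_two and advisory_mailed is not None and advisory_mailed > three:
        # Period runs to the advisory action date, capped at six months.
        return min(advisory_mailed, six)
    return three

mailing = date(2022, 3, 14)  # hypothetical mailing date
no_reply = shortened_period_end(mailing)
late_advisory = shortened_period_end(mailing, date(2022, 5, 1), date(2022, 7, 1))
capped = shortened_period_end(mailing, date(2022, 5, 1), date(2022, 10, 1))
```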
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  






/MARTIN LERNER/Primary Examiner
Art Unit 2657           
March 14, 2022