DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Applicant’s amendment filed on September 11, 2021, in which the Applicant amended claims 10, 15, and 23 and added new claims 43-44. Claims 2-9, 35-39, and 42 remain withdrawn.
By virtue of this communication, claims 1-44 are currently pending in this Office Action.
With respect to the rejection of claims 10-34 under 35 USC §112(b), as set forth in the previous Office Action, the Applicant’s amendment, and argument, see paragraph 7 of page 19 in Remarks filed on September 11, 2021, have been fully considered and the argument is persuasive. Therefore, the rejection of claims 10-34 under 35 USC § 112(b), as set forth in the previous Office Action, has been withdrawn.
The Examiner appreciates the explanation of the amendment and the analysis of the prior art. However, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10-12, 25, 27-28, 30-34, 40, 43-44 are rejected under 35 U.S.C. 103 as being unpatentable over ISO23003-1 (“ISO/IEC 23003: 2006(E), Part 1: MPEG Surround”, 75. MPEG Meeting; 16-01-2006 -20-01-2006; Bangkok; no. N7947, 3 March 2006, XP030014439, ISSN:0000-0341, pages 1-289, provided in IDS filed on March 6, 2017) in view of ISO23003-2 (“ISO/IEC 23003-2, 1st edit, Part 2: Spatial Audio Object Coding SAOC”, 1st edition, 2010-10-01, pages 1-138).
Claim 1: ISO23003-1 teaches a multi-channel audio decoder (fig. 3 in page 10) for providing at least two output audio signals (reconstructed original in fig. 3) on the basis of an encoded representation (dashed line outputted from stereo coder in fig. 3),
wherein the multi-channel audio decoder comprises a renderer (part of SAC up-mix processing in fig. 3) configured to render a plurality of decoded audio signals (output from stereo decoder in fig. 3), which are acquired on the basis of the encoded representation (encoded signal outputted from stereo coder in fig. 3), to a multi-channel target scene (including a scene having Left channel area, Right channel area, center channel area and defined by signals Vn,kOTT1, Vn,kOTT2, and Vn,kTTTo, generated by M1 or TTT0 operation in fig. 26) in dependence on one or more rendering parameters (including ancillary data through SAC up-mix processing in fig. 3 or including CPC/CLDTTT, ICCTTT, etc., in fig. 26) which define a rendering matrix (e.g., at least pre-matrix M1, etc. and details in section 6.5, Calculation of pre-matrix M1), to acquire a plurality of rendered audio signals (intermediate audio signals outputted from applying the M1 to the input audio signals XL0 and XR0 in fig. 25 and implemented in TTT0 processing in fig. 26 and output the signals Vn,kOTT1, Vn,kOTT2, and Vn,kTTTo, to be decorrelated), and
1, d2, d3 from the decorrelators D1OOT, D2OOT, D0TTT and implemented in OOT1, OOT2, OTT0 by taking the intermediate audio signals Vn,kOTT1, Vn,kOTT2, and Vn,kTTTo, outputted from the TTT0 processing) from the rendered audio signals (the intermediate signals Vn,kOTT1, Vn,kOTT2, and Vn,kTTTo, by OTTx modules, x=0, 1, 2 in fig. 26 and by applying M1 to the xL0 and xR0 as input audio signals in fig. 25), and
wherein the multi-channel audio decoder comprises a combiner (other part of the SAC upmix processing in fig. 3 and within the OTTx modules, see “MPEG Surround – The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding” by Herre et al, AES, 122nd Convention 2007 May 5-8 Vienna, Austria, p.4, basic principle of the OTT module in fig. 3) configured to combine the rendered audio signals (inherently within the OTTx modules, mixing the input signal and the decorrelated signal of the input signal, see Herre’s fig. 3), or a scaled version thereof (within the OTTx modules having a decorrelator in fig. 26 and inherently energy distribution controlled by CLDx, ICCx, etc., and see Herre’s fig. 3, p.4, col 2, last two paragraphs and p.5, col 1, the first paragraph), with the one or more decorrelated audio signals (input signals including signals Vn,kOTT1, Vn,kOTT2, and Vn,kTTTo in figs. 25-26), to acquire the at least two output audio signals (including L/Ls, R/Rs, C/LFE in fig. 26);
wherein the multi-channel audio decoder is configured to acquire the decoded audio signals using a parametric reconstruction (ancillary data that used to perform stereo coding and reverse operation at the decoder in fig. 3 and details in fig. 4 and as the application of spatial parameters in fig. 10).
However, ISO23003-1 does not explicitly teach wherein the decoded audio signals are reconstructed object signals, and wherein the multi-channel audio decoder is configured to derive the reconstructed object signals from one or more downmix signals using a side information.
ISO23003-2 teaches an analogous field of endeavor by disclosing multi-channel audio decoder (subtitle: Spatial Audio Object coding SAOC and Scope in page 7 and 5.1 Introduction, decoder side, in p.13 and right side of fig. 2 in page 14) and wherein combining (through adders in fig. 15) a scaled version of the rendered audio signals (output from Gn,k by taking Xn,k in fig. 14) with the one or more decorrelated audio signals (output signal of [image: media_image1.png] which is post matrix processing of the decorrelator which outputs [image: media_image2.png] in fig. 15) and wherein the decoded audio signals are reconstructed object signals (Output from SAOC decoder in fig. 2), and wherein the multi-channel audio decoder is configured to derive the reconstructed object signals from one or more downmix signals (downmix signal as input to the SAOC decoder in fig. 2) using a side information (SAOC bitstream as ancillary or side information in fig. 2 of page 14 and 5.1 Introduction, p.13) for a very efficient coding scheme by lowering the transmission rate of additional parametric data and higher compatibility with the MPEG surround multi-channel situation (1. Scope in page 7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the reconstructed object signals and wherein the multi-channel audio decoder is configured to derive the reconstructed object signals from one or more downmix signals using a side information, as taught by ISO23003-2, to 
Claim 40 has been analyzed and rejected according to claim 1 above.
Claim 10: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 1 above, wherein the combiner is configured to combine the rendered audio signals [image: media_image3.png] with the one or more decorrelated audio signals W, to acquire the at least two output audio signals [image: media_image4.png] according to

[image: media_image5.png]

wherein P is a mixing matrix which is applied to the rendered audio signals [image: media_image3.png], and
wherein M is a mixing matrix which is applied to the one or more decorrelated audio signals W (ISO23003-2, [image: media_image6.png] and P is mapped to GMod above and M is mapped to P2 above and 7.6.3.3 Stereo Preprocessing, p.52-53).
Claim 11: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 10 above, wherein the multi-channel audio decoder is configured to adjust at least one out of the mixing matrix P and the mixing matrix M such that correlation characteristics or desired covariance characteristics (ISO23003-2, GMod is determined upon at least covariance matrix of the predicted signal and 7.6.3.3 Stereo Preprocessing, p.52-53).
Claim 12: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 10 above, wherein the multi-channel audio decoder is configured to jointly compute the mixing matrix P and the mixing matrix M (ISO23003-2, through covariance matrix [image: media_image7.png] and variable G, and 7.6.3.3 Stereo Preprocessing in p.53-54).
Claim 25: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 1 above, wherein the combiner is configured to combine the rendered audio signals [image: media_image3.png] with the one or more decorrelated audio signals W, to acquire the output audio signals [image: media_image4.png] according to

[image: media_image8.png]

wherein P=Pdry and M=Pwet, wherein

[image: media_image9.png]

wherein [image: media_image10.png] is a covariance matrix of the rendered audio signals [image: media_image4.png] and wherein [image: media_image11.png] is an estimated covariance matrix of the one or more decorrelated audio signals after the matrix Pwet has been applied (ISO23003-2, see the gain vector gvec in p.53).
Claim 27: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 10 above, wherein the multi-channel audio decoder is configured to set the mixing matrix P to be an identity matrix, or a multiple thereof, and to compute the mixing matrix M (ISO23003-1, 6.4.4.2 vector definitions for the 7-2-7 configurations and ISO23003-2, P2 is given in p.53).
Claim 28: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 27 above, wherein the multi-channel audio decoder is configured to determine the mixing matrix M such that a difference [image: media_image12.png] between a desired covariance matrix C and a covariance matrix [image: media_image13.png], which is defined as

[image: media_image14.png]

, is equal to, or approximates, a covariance (ISO23003-2, [image: media_image15.png] in p.53 and GD as M and G*D* as MH)

[image: media_image16.png]

, wherein the desired covariance matrix C is defined as

[image: media_image17.png]

(ISO23003-2, Fl,m,x in fig. 58 and Al,m mapped to R and also target covariance F in 7.6.3.2 Rendering of object energies F=MrenEM*ren in p.49) wherein R is a rendering matrix (ISO23003-2, Mren in p.49), wherein Ex is an object covariance matrix, and wherein Ew is a covariance matrix of the one or more decorrelated signals and wherein [image: media_image13.png] is a covariance matrix of the rendered audio signals (see the discussion of claim 10 above).
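The covariance relationships discussed for claim 28 can be illustrated numerically. The following is a minimal sketch only, assuming real-valued matrices (so the conjugate transpose reduces to a plain transpose) and assuming the desired covariance takes the form C = R·Ex·R^T, by analogy with the F=MrenEM*ren expression cited above. The helper names and all numeric values are hypothetical and are not taken from the claims or from the cited standards.

```python
# Illustrative sketch with hypothetical values; real-valued matrices as lists of rows.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix (stands in for the conjugate transpose in the real case)."""
    return [list(col) for col in zip(*a)]

def matsub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def target_covariance(R, E_x):
    """Assumed form of the desired covariance: C = R * E_x * R^T."""
    return matmul(matmul(R, E_x), transpose(R))

R = [[1.0, 1.0], [1.0, -1.0]]      # hypothetical rendering matrix
E_x = [[2.0, 0.0], [0.0, 1.0]]     # hypothetical object covariance matrix
E_dry = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical covariance of the rendered (dry) signals
E_w = [[4.0, 0.0], [0.0, 4.0]]     # hypothetical covariance of the decorrelated signals
M = [[0.5, 0.0], [0.0, 0.5]]       # hypothetical mixing matrix for the decorrelated signals

C = target_covariance(R, E_x)                    # desired covariance
delta = matsub(C, E_dry)                         # difference between desired and dry covariance
wet_cov = matmul(matmul(M, E_w), transpose(M))   # covariance contributed by the wet path
residual = matsub(delta, wet_cov)                # zero when the wet path fills the gap exactly
```

With these example values the residual is the zero matrix, i.e., the covariance of the mixed decorrelated signals equals the difference between the desired covariance and the dry covariance.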
Claim 30 has been analyzed and rejected according to claim 10 above.
Claim 31: the combination of ISO23003-1 and ISO23003-2 further teaches, according to claim 10 above, wherein the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals such that only autocorrelation values or autocovariance values of rendered audio signals are modified while cross-correlation values or cross-covariance values are left unchanged (ISO23003-1, only 
Claim 32 has been analyzed and rejected according to claims 10 and 27 above (about identity matrix).
Claim 33 has been analyzed and rejected according to claims 30 and 10, 12 above.
Claim 34 has been analyzed and rejected according to claims 33 and 25 above.
Claim 43 has been analyzed and rejected according to claim 1 above and the combination of ISO23003-1 and ISO23003-2 further teaches wherein one rendered audio signal is associated with each of a plurality of loudspeakers of the target scene, except for the one or more low frequency effect loudspeakers (ISO23003-1, fig. 26, p.92, the speaker configuration after M2 processing and ISO23003-2, similar in fig. 14, p.50).
Claim 44 has been analyzed and rejected according to claims 1 and 43 above.

Claims 13-24, 26, 29, 41 are rejected under 35 U.S.C. 103 as being unpatentable over ISO23003-1 (above) and in view of references ISO23003-2 (above) and Koppens et al (US 20110264456 A1, hereinafter Koppens).
Claim 41: the combination of ISO23003-1 and ISO23003-2 teaches all the elements of claim 41, according to claim 40 above, except a non-transitory digital storage medium comprising a computer program to perform a method when the computer program runs on a computer.
Koppens teaches an analogous field of endeavor by disclosing multi-channel audio decoder (title and abstract, ln 1-2 and element 12 in fig. 1) and a non-transitory digital storage medium 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the non-transitory digital storage medium and the computer program for performing the method when the computer program runs on the computer, as taught by Koppens, to the method, as taught by the combination of ISO23003-1 and ISO23003-2, for the benefits discussed above.
Claim 13: the combination of ISO23003-1, ISO23003-2, and Koppens teaches all the elements of claim 13, according to claim 10 above, including that a covariance matrix [image: media_image18.png] of the at least two output audio signals [image: media_image4.png] approximates or equals a desired covariance matrix C (ISO23003-1, through mix matrix M2 in fig. 25, signal Y including yL, yLs, yR, yRs, yC, yLFE in fig. 25 and ISO23003-2, a desired correlation characteristics or covariance characteristic, desired covariance matrix Fl,m, 7.7.2.1 Mono to binaural “x-1-b” processing mode in p.56, and matrix Al,m and p.58 for desired covariance matrix Fl,m and Koppens, target covariance matrix derived according to target ICC and p.8, para 86-87 and through setting mixing ratio between dry and wet rendering path, i.e., decorrelated signal path as wet rendering path, by changing rotator), except acquiring a combined mixing matrix F = [P M].
It would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to have recognized that choosing an equivalent format of a matrix operation, or a matrix combination or separation, is a matter of designer's choice. For example, one may use the matrix representation [image: media_image4.png] = [P M]*[ [image: media_image19.png] W]T, i.e., [image: media_image4.png] = F*Q with F = [P M] and Q = [ [image: media_image19.png] W]T, or the formula representation [image: media_image5.png]; the matrix representation is better suited to matrix operators, while the formula representation makes the relationships among the variables easier to understand.
Claim 14: the combination of ISO23003-1, ISO23003-2, and Koppens further teaches, according to claim 13 above, wherein the multi-channel audio decoder is configured to determine the combined mixing matrix F such that the covariance matrix

[image: media_image20.png]

(ISO23003-2, covariance matrix [image: media_image15.png] in p.53 and desired covariance matrix Fl,m,x, p.58, and Koppens, XX*, YX*, YY* in p.8, para 88) is equal to the desired covariance matrix

[image: media_image21.png]

(ISO23003-2, Cl,m, p.58, and Koppens, target coherence matrix F=AEA* and p.5, para 50 and leads to target covariance matrix YY*=ASS*A* and p.8, para 87) wherein Es is a covariance matrix of a signal S combining the rendered audio signals [image: media_image19.png] and the one or more decorrelated audio signals W (ISO23003-2, El,m is a covariance matrix, and Koppens, SS* is covariance matrix of a signal S and p.8, para 87), which is defined as

[image: media_image22.png]

wherein Ex is an object covariance matrix (ISO23003-2, signals applied to the matrix Al,m and p.58, and Koppens, E is defined in p.6, para 53 and the discussion in claim 13 above).
Claim 15: the combination of ISO23003-1, ISO23003-2, and Koppens further teaches, according to claim 10 above,
wherein the combiner is configured to combine the rendered audio signals [image: media_image19.png] (ISO23003-1, L, Ls, R, Rs, C, LFE outputted from predecorrelator matrix M1 in e.g., 6.4.5.1 Introduction, fig. 30 in p.106, and ISO23003-2, output from Gn,k in fig. 15, p. 56 and Koppens, inputted to Gn,k in fig. 4) with the one or more decorrelated audio signals W (ISO23003-1, output signals from decorrelator in fig. 30, p.106 and ISO23003-2, [image: media_image23.png] in fig. 15 of p. 56 and Koppens, output signals from decorrelator in fig. 2), to acquire the at least two output audio signals [image: media_image4.png] (ISO23003-2, signal [image: media_image24.png] in fig. 15 of p.56 and Koppens, [image: media_image24.png] in fig. 2) according to

[image: media_image25.png]

or according to

[image: media_image26.png]

or according to

[image: media_image27.png]

wherein P is a mixing matrix which is applied to the rendered audio signals [image: media_image19.png], and
wherein M is a mixing matrix which is applied to the one or more decorrelated audio signals W (ISO23003-1, M1 is structured by R1l,R(k), G1l,R(k), Hl,R(k), p.115-116, and M2 is structured by an equation having element R2, p.131, section 6.5.3 and M3 applied to all signals including decorrelated audio signals and rendered audio signals L, R, C, and M3 is structured by R3l,m, p.138, section 6.5.4.2 Calculation of R3 for an arbitrary configuration and see the discussion in claim 10 above),
wherein Adry is a first correction matrix or a first adjustment matrix (ISO23003-1, M1xM3 and M2xM3 in fig. 25 and thus, M3 becomes correction matrix for direct signal and decorrelated signals and ISO23003-2, Gn,k in fig. 15 of p.56 and Koppens, Gn,k in fig. 4), wherein Awet is a second correction matrix or a second adjustment matrix (P2n,k in fig. 4).
Claim 16 has been analyzed and rejected according to claims 15 and 11 above and the combination of ISO23003-1, ISO23003-2, and Koppens further teaches wherein the multi-channel audio decoder is configured to adjust at least one out of the mixing matrix P and the mixing matrix M such that correlation characteristics or covariance characteristics of the at least two output audio signals Z or of audio signals acquired by a mixing of Z and W using P and M approximate or equal the desired correlation characteristics or desired covariance characteristics (discussion of claim 11 above and Koppens, through the Gn,k and/or P2n,k through output from the SAOC parameter processing unit 42 in fig. 4).
Claim 17 has been analyzed and rejected according to claims 15 and 12 above.
Claim 18 has been analyzed and rejected according to claims 15 and 13 above.
Claim 19 has been analyzed and rejected according to claims 18 and 14 above.
Claim 20: the combination of ISO23003-1, ISO23003-2, and Koppens further teaches, according to claim 15 above, wherein the multi-channel audio decoder is configured to determine the first correction matrix such that a contribution of the rendered audio signals onto the at least two output audio signals is limited (Koppens, Gn,k is obtained according to first rendering prescription Gl,m depending on the inter-object cross correlation information, the P2l,m depending on the inter-object cross correlation information, the object level information, the downmix information, the rendering information and the HRTF parameters and p.11, para 139).
Claim 21 has been analyzed and rejected according to claims 15 and 20 above.
Claim 22: the combination of ISO23003-1, ISO23003-2, and Koppens further teaches, according to claim 21 above, wherein the properties of the rendered audio signals, and/or of the decorrelated audio signals, and/or of the desired output audio signals, and/or of the mixed rendered audio signals, and/or the mixed decorrelated audio signals are energy properties, or correlation properties, or covariance properties (see the discussion in claims 15 and 20 above and the matrix is related to energy inherency).
Claim 23 has been analyzed and rejected according to claim 15 above and the combination of ISO23003-1, ISO23003-2, and Koppens further teaches a threshold value (Koppens, threshold for coherence and p.4, para 64).
Claim 24 has been analyzed and rejected according to claim 23 above (Koppens, constant threshold value).
Claim 26 has been analyzed and rejected according to claims 15 and 25 above (also see ISO23003-2, about d36 in p.50).
Claim 29 has been analyzed and rejected according to claims 28 and 26 above.

Claims 1, 40-41, 43-44 are rejected under 35 U.S.C. 103 as being unpatentable over Engdegard et al. (“Spatial Audio Object Coding SAOC – The Upcoming MPEG Standard on Parametric Object Based Audio Coding”, Audio Engineering Society Convention Paper 7377, the 124th Convention, Amsterdam, The Netherlands, May 17-20, 2008, pages 1-15) and in view of reference Seefeldt et al (US 20080126104 A1, hereinafter Seefeldt).
Claim 1: Engdegard teaches a multi-channel audio decoder (title and abstract, 1-17 and fig. 1(a) and fig. 1(b) and fig.2(a) and fig.2(b)) for providing at least two output audio signals (M channel audio signals in fig.1(a)/(b) and outputted to multiple speakers in figs.2(a)/(b)) on the basis of an encoded representation (generated and outputted downmix signals from the object encoder in figs.1(a)/(b) and downmix signals with SAOC bitstream in figs.2(a)/(b)),
wherein the multi-channel audio decoder comprises a renderer (including a part of renderer and mixer combination in fig. 1a/b and details in figs. 2a/2b) configured to render a plurality of decoded audio signals (through the mixer/renderer by taking decoded objects 0, 1, …, N in fig.1(a) and through the downmix transcoder in fig.2(b)), which are acquired on the basis of the encoded representation (downmix signals in figs.1(a)/(b) and figs.2(a)/(b)), to a multi-channel target scene (through scene rendering engine in figs.2(a)/(b)) in dependence on one or more rendering parameters which define a rendering matrix (the rendering matrix generated in figs.2(a)/(b), with vocal at a center position and background music at the left and right position in fig. 3, p.8, col 1, last paragraph and p.8, col 2, the first paragraph), to acquire a plurality of rendered audio signals (output from the downmix transcoder in fig.2(b)),

wherein the decoded audio signals are reconstructed object signals (reconstructed audio objects 1, 2, …, N compared to the inputted audio objects Obj 1, 2, …, N in figs. 1(a)/(b)), and
wherein the multi-channel audio decoder is configured to derive the reconstructed object signals from one or more downmix signals using a side information (the decoded audio objects 1, 2, …, N based on the downmix signals and object meta data as side information).
However, Engdegard does not explicitly teach wherein the multi-channel audio decoder comprises a decorrelator configured to derive one or more decorrelated audio signals from the rendered audio signals, and does not explicitly teach wherein the multi-channel audio decoder comprises a combiner configured to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to acquire the at least two output audio signals.
Seefeldt teaches an analogous field of endeavor by disclosing a multi-channel audio decoder (title and abstract, ln 1-12 and decoder in fig. 3 and p.2, para 15) and wherein the multi-channel audio decoder comprises a renderer (including upmixer 36 in fig. 3) configured to render a plurality of decoded audio signals (signal zi i=1, …, n and representing left, center, right, etc. and p.4, para 28), which are acquired on the basis of the encoded representation (audio information from the bitstream unpacker 32 in fig. 3), to a multi-channel target scene in dependence on one or more rendering parameters (including spatial parameters in fig. 3), to i and above discussion and p.4, para 28), wherein the multi-channel audio decoder comprises a decorrelator (including Decorrelation Filter 38 in fig. 3) configured to derive one or more decorrelated audio signals from the rendered audio signals (by taking the reconstructed multichannel signal from the upmixer 36 in fig. 3), and wherein the multi-channel audio decoder comprises a combiner (including the adder 46 in fig. 3) configured to combine the rendered audio signals (reconstructed multichannel signal from the upmixer 36 and combined with the scaled decorrelated audio signals outputted from the Decorrelation Filters 38 in fig. 3 and via the combining function 46 in fig. 3), or a scaled version thereof (scaled by multiplier 44 in fig. 3), with the one or more decorrelated audio signals (Zj, j=1, 2, …, n, as outputs from the Upmixer 36 in fig. 3), to acquire the at least two output audio signals (the output signals from the combining function 46 in fig. 3) for benefits of achieving an improvement of perception of the original interchannel correlation by performing adjustable covariance of the multichannel audio signals and adjustable decorrelated audio signals for a desired perception of the original interchannel correlation (fig. 3 and p.1, para 5-6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the multi-channel audio decoder is configured to derive one or more decorrelated audio signals from the rendered audio signals, and wherein the multi-channel audio decoder is configured to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to acquire the at least two output audio signals, as taught by Seefeldt, to the rendering audio signals in the multi-channel audio decoder, as taught by Engdegard, for the benefits discussed above.
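The signal flow relied upon in this combination (rendered channels fed to a decorrelator, the decorrelated signals scaled, and the result added back to the rendered channels) can be sketched as follows. This is a hedged illustration only: the one-sample delay is a hypothetical stand-in for a decorrelation filter and is not the filter of Seefeldt or of any cited standard, and the function names and values are assumptions made for the example.

```python
# Illustrative sketch with hypothetical values; each channel is a list of samples.

def toy_decorrelate(signal, delay=1):
    """Stand-in decorrelator: a pure delay (hypothetical; real decoders use other filters)."""
    return [0.0] * delay + signal[:len(signal) - delay]

def combine(rendered, gain):
    """For each rendered channel, add a scaled decorrelated copy of that same channel."""
    outputs = []
    for channel in rendered:
        wet = toy_decorrelate(channel)  # decorrelated signal derived from the rendered signal
        outputs.append([dry + gain * w for dry, w in zip(channel, wet)])
    return outputs

# Two rendered channels, four samples each.
rendered = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 2.0, 0.0, 0.0]]
out = combine(rendered, gain=0.5)
```

In this sketch each output channel depends only on its own rendered channel and the decorrelated version of that channel, which mirrors the per-channel combination described for the adder and multiplier above.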
Claim 40 has been analyzed and rejected according to claim 1 above.
Claim 41: the combination of Engdegard and Seefeldt further teaches, according to claim 40 above, a non-transitory digital storage medium comprising a computer program to perform the method of claim 40 by the computer program runs on a computer (a computer-readable storage medium with computer program and computer system to run the program and perform the method and p.5, para 36).
Claim 43 has been analyzed and rejected according to claim 1 above and the combination of Engdegard and Seefeldt further teaches wherein one rendered audio signal is associated with each of a plurality of loudspeakers of the target scene, except for the one or more low frequency effect loudspeakers (Engdegard, mixer/renderer in fig. 1a outputs channel signals to loudspeakers in fig. 2a/2b and Seefeldt, upmixer outputs the left, the right, and the center channel signals, i.e., reconstructed original left, right, and center channel signals, para [0028]).
Claim 44 has been analyzed and rejected according to claims 1 and 43 above.

Response to Arguments

Applicant's arguments filed on September 11, 2021 have been fully considered, but are not persuasive because: 
As to the argument about the claimed decorrelation and combined with the input of the decorrelation and the applicant argued: “ISO/IEC23003-1 fails to disclose that a decorrelator derives one or more decorrelated audio signals from the rendered audio signals and that a combiner combines the rendered audio signals, or a scaled version thereof, with the one or 1 outputs “VL, VR, VC, VOOT1, …, VOOT2, VOOT0. Accordingly, the mix matrix M2 actually combines VL, VR, VC and decorrelated versions of VOOT2, VOTT1, and VTTTO”, other than, combining VOOT2, VOOT1, VTTTO and the decorrelated versions of VOTT2, VOTT1, and VTTTO, as asserted in paragraphs 5-6 of page 21 and paragraph 1 of page 22 in Remarks filed on September 11, 2021. 
In response to the argument above, the Office respectfully disagrees because (1) it is well-known in the art that decorrelation is purported to widen the audio signal in space, with the decorrelated signal added to the original signal that was decorrelated (e.g., “Spatial Audio Object Coding SAOC – The Upcoming MPEG Standard on Parametric Object Based Audio Coding” in the reference list, by Engdegard, fig. 3; US 20110194712 A1 by Potard, fig. 2, the decorrelation filters A/B applied to left/right and combined with the left/right, respectively; etc.), and it is rare to combine an unrelated decorrelated signal with an original audio signal, which would not yield clear channels or objects but the same sounds throughout the space, and it is also rare to decorrelate a residual signal, since widening a residual signal does not make sense; and (2) in the instant OTT (One-to-Two) blocks of ISO/23003-1, as the opposite of a TTO (Two-to-One) block, each OTT inherently has its own decorrelation followed by mixing or combining (e.g., US 20110106543 A1 by Jaillet et al, OTT (TTO-1) in fig. 5). Fig. 25 of ISO/23003-1 can be detailed in the tree structure of fig. 26, e.g., TTT0 can represent M1, while OTT1, OTT2, OTT0 can represent M2 operations, and the output from TTT0 must be represented as a direct L signal and/or indirect L signal for later L and Ls, an R direct signal and indirect signal for later R and Rs, and a center signal C for later C and LFE, and thus the 5-2-5 tree structure. Within OTT1, OTT2, OTT0, there are decorrelation operations for each of the input signals such as indirect L, indirect R, and indirect signal C, represented by each of VOTT0, VOTT1, and VOTT2, or VTTT0, VOTTres1, and VOTTres2, as inputs to individual decorrelators (ISO/23003-1, last paragraph of page 92) and combining thereafter through the W expression in page 93 (e.g., D1(VOTT1) as output of a decorrelator and then combined with scaled VOOTres1, etc., in the w expression of page 93; also indicated by Jaillet above and also Seefeldt’s OTT box, as indicated in a figure in the previous office action, p.24), i.e., the decorrelator combines the decorrelator’s output with the decorrelator’s input with scaling factors. Thus, the assumption that VOOT2, VOTT1, and VTTTO, as inputs to the decorrelator, have nothing to do with the decorrelator combination above is not persuasive.

[Figure: media_image28.png]

The applicant further challenged the second prior art ISO23003-2, and argued “in the concept of ISO23003-2, there is no derivation of any decorrelated signals from a rendered audio signal, wherein the rendered audio signal is obtained by a renderer on the basis of a plurality of decoded audio signals which are reconstructed object signals. Rather, in the concept 
In response to the argument above, the Office further respectfully disagrees because (1) the claimed “reconstructed object signals” in the feature “the decoded audio signals are reconstructed object signals” is merely alternative wording for the “decoded audio signals,” because claim 1 fails to recite what an “object” is and fails to recite any processing related to an “object”; thus, “reconstructed object signals” is given no different weight from the “decoded audio signals”; and (2) ISO23003-2 concerns SAOC, i.e., Spatial Audio Object Coding, and thus the “downmix signal” and the “downmix signal” processing are audio-object related (figs. 1-2 of page 14). The MPS decoder outputs the downmix and the SAOC bitstream for further processing, i.e., rendering that includes downmix processing based on SAOC parameter processing (including the rendering matrix and HRTF parameters in fig. 2), and the downmix processing is detailed further as including decorrelation processing (fig. 15). Thus, the rendered signals, as downmix signals, are decorrelated and combined (fig. 15), which is consistent with the argued feature above, and the argument above is therefore also not persuasive.
As to the prior art rejection of claim 1 according to prior arts Engdegard and Seefeldt under 35 USC 103(a), the applicant further argued “According to Engdegard, the rendering follows the object decoding, see for example, Fig. 1a, such that there is no derivation of any decorrelated signals from the rendered audio signals”, no “one or more rendering parameters which define a rendering matrix” because Engdegard’s “the object metadata is used in the object decoder. In other words, the information about the object level differences and about 
In response to the argument above, the Office further respectfully disagrees because Engdegard clearly teaches a renderer/mixer (within MPEG Surround in fig. 2a/2b, and also including an object decoder in fig. 1a), wherein the renderer/mixer operates on the further rendering information (fig. 1a) provided by the generated rendering matrix (fig. 2a/2b). This is consistent with the claimed feature “a renderer configured to render a plurality of decoded audio signals, which are …, to a multi-channel target scene in dependence on one or more rendering parameters which define a rendering matrix” (e.g., the rendering matrix is based on the inputted object position information and the inputted playback configuration information in fig. 2a/2b), although other parameters, such as OLD, IOC, DLD, downmix gains, and object energies, may be used in the object decoder. Thus, the argument above is not persuasive.
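For illustration only, rendering a plurality of decoded signals to a multi-channel target scene in dependence on a rendering matrix can be sketched as a matrix multiplication. This is a minimal sketch under stated assumptions and is not drawn from Engdegard or any other cited reference: the function name, the two-object input, and the three-channel matrix values are hypothetical.

```python
def render(objects, rendering_matrix):
    """Map decoded object signals to output channels.

    objects: list of object signals, each a list of samples.
    rendering_matrix: one row per output channel, one gain per object
    (hypothetical values; a real system would derive them from, e.g.,
    object position and playback configuration).
    """
    n_samples = len(objects[0])
    return [
        [
            sum(gain * obj[n] for gain, obj in zip(row, objects))
            for n in range(n_samples)
        ]
        for row in rendering_matrix
    ]


# Usage: two objects rendered to three channels (left, center, right).
decoded_objects = [[1.0, 2.0], [3.0, 4.0]]
matrix = [
    [1.0, 0.0],  # left: object 0 only
    [0.5, 0.5],  # center: equal mix of both objects
    [0.0, 1.0],  # right: object 1 only
]
channels = render(decoded_objects, matrix)
```

Each output channel is a weighted sum of the object signals, so changing the matrix alone changes where each object appears in the rendered scene.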
The applicant further challenged the combination of Engdegard and Seefeldt because Engdegard’s “embodiment of Fig. 1a, a decorrelation is only considered in the object decoder, but not in the mixer/renderer”, because “Seefeldt discloses that there is first an upmixer 36 and that there are also decorrelation filters 38. However, this concept of Seefeldt is completely 
In response to the argument above, the Office respectfully disagrees because, as discussed in the office action above, modifying Engdegard’s rendering processing with Seefeldt’s decorrelation and combining has nothing to do with Engdegard’s decorrelation applied to the downmix signal of the encoder (Engdegard, fig. 3). In fact, the left, right, and center channel signals are formed after upmixing (recovered left, right, and center channel signals, para [0028]), and it would have been obvious for one having ordinary skill in the art to have modified Engdegard’s rendering with the decorrelation and combining, because the rendering information to be realized at the decoder side, including object position information and playback configuration, differs from the downmix information at the encoder side, although the decoded signal has some “intrinsic decorrelation” from the processing performed at the encoder side, and this is 
The applicant further challenged the Office’s previous responses to the applicant’s arguments and argued that ISO/IEC 23003-1’s “VL, VR, and Vc are not signals which are rendered to a multi-channel target scene. Rather, the … the target scene will comprise signals YL, YLS, YR, YRS, and Yc, as well as the low frequency effect signal YLFE”, paragraph 2 of page 26 in Remarks filed on September 11, 2021.
In response to the argument above, the Office disagrees because claim 1 fails to recite what the “multi-channel target scene” is, and thus VL, VR, and VC are characterized as a “target scene”: as discussed above, VL, VR, and VC are essentially the same signals as L, R, and C for decorrelation processing; at the least, VL contains most of the left channel signal, VR comprises the right channel signal, and VC contains most of the center channel signal; and it is well known in the art that left, right, and center channel signals are spatially meaningful signals at the left, right, and center locations, i.e., a “target scene.” Thus, the Office maintains the position stated in the previous office action.
Regarding Engdegard’s L0 and R0 signals, the applicant further argued that the signals of “Engdegard are only downmix signals which are obtained by the stereo transcoding …, it should be noted that the stereo based transcoding has no relevance with respect to the present invention, since the stereo based transcoding does not reconstruct any object signals”, as asserted in paragraph 2 of page 27 in Remarks filed on September 11, 2021.

Regarding the identification of “VOTT2, VOTT1, VTTT0” with VL, VR, VC (see paragraph 5 of page 27 and paragraph 1 of page 28 in Remarks filed on September 11, 2021), upon further reading of ISO/23003-1, they are different: ISO/23003-1 clearly states that VL, VR, and VC are the direct left, direct right, and direct center signals, while VOTT2, VOTT1, and VTTT0 correspond to the diffuse or indirect left, right, and center channel signals, which are input to the associated individual decorrelators and then combined, with weighting, with VOTT2, VOTT1, VTTT0 as represented by VOTTres2, VOTTres1, VTTT0 in the w expression (page 29), as discussed above.
Returning to Engdegard and Seefeldt, the applicant interpreted section 3.3.2 and fig. 2b of Engdegard to mean that “Engdegard processes downmix signals, rather than rendered signals rendered to a target scene”. In response to the interpretation above, fig. 2a/2b should be read in combination with Engdegard’s fig. 1a, wherein the “MPEG Surround” in fig. 2a/2b should be interpreted as a whole as the object decoder plus mixer/renderer and wherein “metadata” in fig. 1a is equivalent to 

[Figure: media_image29.png]

[Figure: media_image30.png]

[Figure: media_image31.png]


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached on 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654