DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Applicant’s amendment filed on May 23, 2022, in which the Applicant amended claims 1-11 and 17-20, cancelled claims 14-16 and 22, and added new dependent claims 23-24.
By virtue of this communication, claims 1-13, 17-21, and 23-24 are currently pending in this Office Action.
With respect to the objection to the specification due to formality issues regarding the claimed “signal ratio”, as set forth in the previous Office Action, the Applicant’s argument (see the last two paragraphs of page 15 and paragraphs 1-3 of page 16 of the Remarks filed on May 23, 2022), in which the Applicant indicated that the amended claims recite “energy ratio” in claims 4-7, 10-11, and 21, and that this term is well known in the art as disclosed in the background section of the specification, para [0002], has been fully considered and is persuasive. Therefore, the objection to the specification due to the formality issues regarding the claimed “signal ratio”, as set forth in the previous Office Action, has been withdrawn.
With respect to the objection to claim 11 due to a formality issue, as set forth in the previous Office Action, the Applicant’s amendment and argument (see paragraph 5 of page 16 of the Remarks filed on May 23, 2022) have been fully considered and the argument is persuasive. Therefore, the objection to claim 11 due to the formality issue, as set forth in the previous Office Action, has been withdrawn.
With respect to the rejection of claims 2-8, 10-11, 18-19, and 21 under 35 U.S.C. § 112(b), as set forth in the previous Office Action, the Applicant’s amendment and argument (see paragraph 6 of page 16 of the Remarks filed on May 23, 2022) have been fully considered and the argument is persuasive. Therefore, the rejection of claims 2-8, 10-11, 18-19, and 21 under 35 U.S.C. § 112(b), as set forth in the previous Office Action, has been withdrawn.
The Examiner appreciates the explanation of the amendment and the analyses of the prior art. However, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.

Claim Objections
Claims 7-11 are objected to because of the following informalities: 
Claim 7 recites “the at least one memory and the computer program code”, which should be -- the at least one non-transitory memory and the computer program code -- because the “memory” here refers back to the “at least one non-transitory memory” recited in parent claim 1. Claims 8-11 are objected to due to their dependency on claim 7.
Claims 8-11 are further objected to for at least the similar reason as described for claim 7 above because claims 8-11 recite the similar deficient feature as recited in claim 7.
Claim 8 further recites “use the at least one metadata parameter associated with the at least one further audio signal …” which should be -- use the at least one further metadata parameter associated with the at least one further audio signal …--.
Claim 10 further recites “based on the at least one first user input” which should be -- based on the at least one first signal user input--.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 7-11 are rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 7 recites “generate the metadata based on: determining at least one first energy ratio …; generating at least one first signal weight based on the at least one first energy ratio; determining at least one further energy ratio …; generating at least one further signal weight based on the at least one further energy ratio; comparing the at least one first signal weight and the at least one further signal weight; and generating the metadata based on the comparing …”, and parent claim 1 recites “generate a metadata based on the comparing” and “compare the at least one value and the at least one further value to control combining of the at least one metadata parameter with the at least one further metadata parameter”. That is, in claim 7, “generate the metadata” is based on both “comparing the at least one first signal weight and the at least one further signal weight” as recited in claim 7 and “compare the at least one value and the at least one further value to control combining of …” as recited in parent claim 1. However, the original disclosure, including the original specification, claims, and drawings, nowhere discloses a sufficiently definite structure or a written description in sufficient detail for performing the claimed feature above. For example, the disclosure only discloses that generating the metadata is based on a weighted average of the metadata from both streams (635 in fig. 6 and 735 in fig. 7, para [0020]-[0021], para [0033]-[0034]), with no disclosure of comparing the weights and also comparing “the first value” and “the further value”. Therefore, the disclosure does not reasonably convey to one skilled in the relevant art that the inventors had possession of the claimed invention at the time the application was filed. Claims 8-11 are rejected due to their dependency on claim 7.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 2, 6-11, and 18-19 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 2 recites “to determine at the at least one metadata parameter” and “to determine at the at least one further metadata parameter”, which are confusing because it is unclear whether each phrase should read “at least one metadata parameter” or “at the metadata parameter”, and thus renders the claim indefinite. Claim 2 further recites “cause the apparatus to at least one of: analyse …; or decode …; and to determine the at least one further … is configured to cause the apparatus to at least one of: analyse …; or decode …”, which is further confusing because the claimed “at least one” also covers the case of more than one, whereas the claimed “analyse … or decode …” covers only one of “analyse …” and “decode”; it is therefore unclear what “more than one” in the claimed “at least one” would be, which renders the claim indefinite. The same applies to the claimed “to determine the at least one further metadata parameter is configured to cause the apparatus to at least one of: analyse …; or decode …”. Note: the phrase “at least one of” (a group) means one or more within the group.
Claim 6 recites “at least one primary metadata block comprising at least one spatial audio parameter associated with at least one of the at least one first audio signal or the at least one further audio signal, the at least one spatial audio parameter associated with at least one further audio signal, the at least one spatial audio parameter comprising …”, which is confusing because it is unclear whether “the at least one spatial audio parameter associated with at least one further audio signal” is further comprised in the “at least one primary metadata block” or further limits “the at least one spatial audio parameter comprising …”, and thus renders the claim indefinite. Claim 6 is further rejected for at least the similar reason as described for claim 2 above because claim 6 recites the similar deficient features as recited in claim 2. The same applies to the claimed “the common metadata block, the common metadata block comprising …”.
Claim 7 further recites “generate the metadata based on: determining at least one first energy ratio …; generating at least one first signal weight based on the at least one first energy ratio; determining at least one further energy ratio …; generating at least one further signal weight based on the at least one further energy ratio; comparing the at least one first signal weight and the at least one further signal weight; and generating the metadata based on the comparing …”, and parent claim 1 recites “generate a metadata based on the comparing” and “compare the at least one value and the at least one further value to control combining of the at least one metadata parameter with the at least one further metadata parameter”. As discussed in the rejection under 35 U.S.C. 112(a) above, there is no disclosure of the claimed features above, and thus it is unclear how the claimed functions are performed; e.g., it is unclear how “comparing the at least one first signal weight and the at least one further signal weight” and “compare the at least one value and the at least one further value to control combining of …” are performed, which renders the claim indefinite. Claims 8-11 are rejected due to their dependency on claim 7.
Claim 18 is rejected for at least the similar reason as described for claim 2 above because claim 18 recites the similar deficient features as recited in claim 2.
Claim 19 further recites “adding the extracted metadata block as a secondary metadata block within the combined metadata”. There is insufficient antecedent basis for the limitation “the combined metadata” in claim 19, which is confusing because it is unclear what “the combined metadata” is and how “adding … within the combined metadata” is performed, and thus renders the claim indefinite.

Claim Rejections - 35 USC § 112(d)
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 20 is rejected under 35 U.S.C. 112(d) as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.
Claim 20 recites “wherein generating the metadata further comprises at least one of: determining at least one first signal user input associated with the at least one audio signal; determining at least one further signal user input associated with the at least one further audio signal; generating at least one first signal weight based on the at least one first user input and at least further signal weight based on the at least one further signal user input; determining at least one first signal server input associated with the at least one metadata parameter and the at least one first audio signal; determining at least one further signal server input associated with the at least one further metadata parameter and the at least one further audio signal; generating at least one first signal weight based on the at least one first server input; generating at least one further signal weight based on the at least one further signal server input”, which are not further limitations of the claimed subject matter as recited in parent claim 17, and thus renders the claim of improper dependent form. For example, generating the “at least one first signal weight” and the “at least one further signal weight” is not a further limitation of “generating the metadata” of parent claim 17.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 17, 19, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson et al. (US 20170251321 A1, hereinafter Samuelsson) in view of Laitinen et al. (“Converting 5.1 Audio Recordings to B-Format for Directional Audio Coding Reproduction”, IEEE ICASSP, May 2011, pp.61-64, IDS by May 24, 2022, hereinafter Laitinen).
Claim 1: Samuelsson teaches an apparatus (title and a method in abstract ln 1-16, fig. 1, and a system having a processor executing the method as software stored in storage medium, para [0010]-[0011]) comprising: 
at least one processor (a processor, para [0010]); and 
at least one non-transitory memory including a computer program code (storage medium comprising a software program adapted for execution on the processor, para [0011]), the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to (the software program stored in the storage medium to be executed by the processor, para [0010]-[0011]):
determine for at least one first audio signal of an audio signal format (receiving an audio program having audio signal 110 of an audio object type received as an audio program, para [0020]-[0021], and as spatially diverse audio signals, para [0025], generating downmix signal 111 in fig. 1), at least one metadata parameter (the received audio program also comprising object audio metadata OAMD, and then the bitstream metadata 121 is determined by the encoder 101 in fig. 1, para [0021]-[0022]; or upmix/JOC metadata 221 determined to reconstruct or decode 5.1 channel audio signals from the two channel downmix signal 111, para [0024]);
determine, for at least one further audio signal of a further audio signal format (audio content or system sound 130 of a set-top box STB, para [0022], as the claimed first audio signal 130, para [0055]), at least one further metadata parameter (the 130 mixed with or replacing a center channel of the modified downmix signal 112, or mixed equally with all of the audio channels of the downmix signal 111, para [0055], e.g., mapping table of an audio object to a channel L, R, C, … para [0041] and a loudspeaker position x-y-z in tables 1, 2, para [0052]; receiving a STB flag dealing with insertion of the system sound, para [0058], e.g., after the flag is set, the bitstream metadata 121 is fade-weighted to a target bitstream metadata 122, para [0058], i.e., the flag is data about the system sound to be inserted, or metadata for the STB system sound inherently, and weights for modifying the metadata from the bitstream metadata 121 to the modified bitstream metadata 122 are inherently determined for performing the fade-weighted change) and generate a metadata (combination of modified upmix metadata 223 and the modified object metadata 224 in fig. 2).
However, Samuelsson does not explicitly teach 
determine at least one value associated with the determined at least one metadata parameter;
determine at least one further value associated with the determined at least one further metadata parameter; 
compare the at least one value and the at least one further value to control combining of the at least one metadata parameter with the at least one further metadata parameter; and 
the generating the metadata is based on the comparing, wherein the generated metadata is configured to be associated with a combined audio signal formed from the at least one first audio signal and the at least one further audio signal in such a way that the generated metadata comprises at least one spatial audio parameter.
Laitinen teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-7, a system in fig. 4) and wherein 
determine for at least one first audio signal of an audio signal format (e.g., XYZeven signals after modified by a gain parameter for merging in fig. 4), at least one metadata parameter (e.g., parameter ψ applied to the XYZeven under even arrangement of loudspeakers, in fig. 4, 0°, ±72°, ±144°);
determine, for at least one further audio signal of a further audio signal format (e.g., XYZstandard signals after modified by a gain parameter for merging in fig. 4), at least one further metadata parameter (e.g., parameter 1- ψ applied to the XYZstandard under standard arrangement of loudspeakers in fig. 4, 0°, ±30°, ±110°);
determine at least one value associated with the determined at least one metadata parameter (via the equation 1, p.61, i.e., [equation image: media_image1.png]);
determine at least one further value associated with the determined at least one further metadata parameter (via the equation 1, p.61 and wherein 1-ψ is equivalent to ψ’= [equation image: media_image2.png]);
compare the at least one value and the at least one further value (through ψ = 1 – ψ’, diffuseness ψ decreased while ψ’ increased and not greater than a constant one, represented by change of ψ value or ψ + ψ’ = 1 inherently from the formula 1 and ψ’=1- ψ) to control combining of the at least one metadata parameter with the at least one further metadata parameter (ψ*(XYZeven)+ ψ’*(XYZstandard) being used for calculating ψmerged by which μ is calculated in fig. 4, e.g., complete diffuseness field ψ=1, and thus, ψ’=0, while direct sound only ψ=0, and ψ’=1, through formula 1, p.61); and 
generate a metadata (the calculated μ as the claimed metadata) based on the comparing (based on the formula 1 having ψ and ψ’, i.e., comparison with respect to the constant one in the formula 1), wherein the generated metadata is configured to be associated with a combined audio signal formed from the at least one first audio signal and the at least one further audio signal (XYZout = μ*XYZmerged in fig. 4, i.e., μ is associated with XYZmerged or XYZout in fig. 4) in such a way that the generated metadata comprises at least one spatial audio parameter (∆ψ = (ψeven – ψmerged) in fig. 6, and directivity parameter β in formula 8 for calculating μ, p.64) for benefits of maintaining unaltered spatial sound qualities after a conversion from one audio signal format to another in different loudspeaker arrangements (abstract and section 1, INTRODUCTION).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein determining the at least one value associated with the determined at least one metadata parameter; determining the at least one further value associated with the determined at least one further metadata parameter; comparing the at least one value and the at least one further value to control combining of the at least one metadata parameter with the at least one further metadata parameter; and generating a metadata based on the comparing, wherein the generated metadata is configured to be associated with a combined audio signal formed from the at least one first audio signal and the at least one further audio signal in such a way that the generated metadata comprises at least one spatial audio parameter, as taught by Laitinen, to the at least one metadata parameter and the at least one further metadata parameter in the apparatus, as taught by Samuelsson, for the benefits discussed above.
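For illustration only, the complementary weighting relied upon above (ψ applied to XYZeven and ψ’ = 1 - ψ applied to XYZstandard before merging) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `merge_complementary`, the representation of each signal frame as a list of floats, and the range check on ψ are hypothetical and are not taken from Laitinen or Samuelsson.

```python
def merge_complementary(xyz_even, xyz_standard, psi):
    """Blend two equally long signal frames with weights psi and 1 - psi.

    Hypothetical helper: psi plays the role of the diffuseness weight and
    psi_prime = 1 - psi is its complement, so the two weights always sum to one.
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must lie in [0, 1]")
    if len(xyz_even) != len(xyz_standard):
        raise ValueError("frames must have equal length")
    psi_prime = 1.0 - psi
    # Sample-wise weighted sum of the two frames.
    return [psi * e + psi_prime * s for e, s in zip(xyz_even, xyz_standard)]
```

With ψ = 1 only the first frame contributes, and with ψ = 0 only the second frame contributes, which corresponds to the fully diffuse and direct-sound-only cases noted above.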
Claim 17 has been analyzed and rejected according to claim 1 above.
Claim 3: the combination of Samuelsson and Laitinen further teaches, according to claim 1 above, wherein the generated metadata is configured to be generated based on: extracting at least one of the at least one metadata parameter and the at least one further metadata parameter (Samuelsson, extracting the bitstream metadata 121) as a metadata block (Samuelsson, extracting the flag from the STB, para [0058], and the channel information for the inserted first audio signal or STB system sound signal, para [0056] and within the insertion unit 102 in fig. 1); and adding the extracted metadata block as a secondary metadata block within the generated metadata (Samuelsson, inserting or adding the information indicative of the flag to modify the bitstream metadata with the channel information and the object position information about the first audio signal or STB system sound signal to form the modified JOC metadata 223 and OAMD metadata 224, including the information of table 1 and table 2, para [0040]-[0041], para [0051]-[0052] and encoded in metadata encoder 208 in fig. 2).
Claim 19 has been analyzed and rejected according to claims 17 and 3 above, and the combination of Samuelsson and Laitinen further teaches, according to claim 17 above, wherein generating further comprises: extracting at least one of the at least one metadata parameter and the at least one further metadata, as a metadata block; and adding the extracted metadata block as a secondary metadata block within the combined metadata (Laitinen, extracting ψ and 1- ψ in fig. 2 and applying the ψ and 1- ψ to the created virtual microphone signals in fig. 2).
Claim 23: the combination of Samuelsson and Laitinen further teaches, according to claim 1 above, wherein the generating the metadata further comprises: in response to the at least one first value being larger than the at least one further value by a first provided measure, generating the generated metadata using the at least one metadata parameter (Laitinen, ψ = 1, in all diffuseness sound field, and thus, 1- ψ = 0, p.61, and thus, the μ is calculated via the formula 5, 7-9, p.63-64); and in response to the at least one further value being larger than the at least one first value by a second provided measure, generating the generated metadata using the at least one further metadata parameter (Laitinen, ψ = 0, in all direct sound field with no diffuseness sound field, and thus, 1- ψ = 1, p.61, and thus, the μ is calculated via the formula 5, 7-9, p.63-64).
Claim 24: the combination of Samuelsson and Laitinen further teaches, according to claim 23 above, in response to the at least one first value not being larger than the at least one further value by the first provided measure and the at least one further value not being larger than the at least one first value by the second provided measure, generating the generated metadata using a weighted average of the at least one metadata parameter and the at least one further metadata parameter (Laitinen, formula 1 for ψ which is not equal to zero and not equal to one, and then, via the formula 5, 7-9, p.63-64).
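For illustration only, the three-way selection recited in claims 23-24 (use the first parameter, use the further parameter, or use a weighted average) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the Applicant’s or any reference’s implementation: the function name `combine_parameter` is hypothetical, “larger by a provided measure” is assumed to mean that the difference exceeds the measure, and the weighted average is assumed to use the two values themselves as weights.

```python
def combine_parameter(value, further_value, param, further_param,
                      first_measure, second_measure):
    """Select or blend two metadata parameters based on a comparison of values.

    Assumption: "larger by a provided measure" is read as a difference test;
    the claims do not state whether a difference or a ratio is meant.
    """
    if value - further_value > first_measure:
        # First value dominates: use the first metadata parameter alone.
        return param
    if further_value - value > second_measure:
        # Further value dominates: use the further metadata parameter alone.
        return further_param
    total = value + further_value
    if total == 0:
        # Assumed fallback when both weights are zero: plain average.
        return 0.5 * (param + further_param)
    # Neither dominates: weighted average using the values as weights.
    return (value * param + further_value * further_param) / total
```

For example, with both measures set to 0.5, values of 0.9 and 0.1 select the first parameter, values of 0.1 and 0.9 select the further parameter, and equal values of 0.5 yield the midpoint of the two parameters.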

Claims 2, 4-5, 12-13, 18, 21 are rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson et al. (above) and in view of references Laitinen (above) and Galdo et al (US 20110216908 A1, hereinafter Galdo).
Claim 2: the combination of Samuelsson and Laitinen teaches all the elements of claim 2, according to claim 1 above, including wherein: to determine the at least one metadata parameter is configured to cause the apparatus to at least one of: analyse or decode the at least one first audio signal to determine at the at least one metadata parameter (Laitinen, via diffuseness analysis to derive ψ in fig. 4); except analyse or decode the at least one further audio signal to determine at the at least one further metadata parameter.
Galdo teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-20 and a system in fig. 1A) and wherein analyzing or decoding at least one further audio signal (via a means for determining 110 in fig. 1A) to determine at least one further metadata parameter (determining second diffuseness parameters and second DOAs in fig. 1A, para [0035]-[0036]) for benefits of achieving an efficient audio coding by bandwidth saving and simplifying front-end processing (merging multiple information to a single one for encoding and transmission, para [0013]) and by utilizing efficiency of the encoding/rendering scheme (DirAC for correcting ILD, ITD, and IC if the diffuseness is reproduced accurately, para [0003]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein analysing or decoding the at least one further audio signal to determine at least one further metadata parameter, as taught by Galdo, to the determining of at least one further metadata parameter in the apparatus, as taught by the combination of Samuelsson and Laitinen, for the benefits discussed above.
Claim 4: the combination of Samuelsson, Laitinen, and Galdo further teaches, according to claim 3 above, wherein the extracted metadata block (Samuelsson, within the insertion unit 102 in figs. 1-2 and Galdo, input to the estimator) comprises: at least one direction parameter (Galdo, DOA for one of the first and the second spatial audio stream, one of first/second wave direction measures and one of first/second wave field measures i.e., magnitude or energy related, para [0025]); at least one energy ratio parameter (Laitinen, in the formula 1, p.61, including |V|² and wherein V is |X, Y, Z|, section 2, p.61); or at least one coherence parameter associated with at least one of the at least one first audio signal or the at least one further audio signal (Galdo, IC or interaural coherence for DirAC, para [0003]).
Claim 5: the combination of Samuelsson, Laitinen, and Galdo further teaches, according to claim 3 above, wherein the apparatus is configured, based upon the adding of the secondary metadata block (Samuelsson, within the insertion unit 102 in figs. 1-2 and Galdo, input to the estimator), to cause the apparatus to add at least one of: the at least one direction parameter (the discussion in claim 4 above, e.g., Galdo, DOA for other one of the first and the second spatial audio stream, other one of first/second wave direction measures and other one of first/second wave field measures i.e., magnitude or energy related, para [0025]); at least one energy ratio parameter; or at least one coherence parameter associated with at least one of the at least one first audio signal or the at least one further audio signal (Galdo, IC or interaural coherence for DirAC, para [0003]).
Claim 12: the combination of Samuelsson, Laitinen, and Galdo further teaches, according to claim 1 above, wherein the at least one first audio signal of the audio signal format is at least one of: 2-N channels of a spatial microphone array (Galdo, stereo or surround audio data, para [0073]); 2-N channels of multi-channel audio signal (Samuelsson, 2-channel or a 5.1 channel or 7.1 channel downmix signal, para [0021]; 5.1 channel specified in the modified upmix metadata 223, para [0041], and Galdo, first audio channel stream and second spatial audio stream as stereo DirAC stream in fig. 1A, para [0074]); a first order Ambisonics signal (figure-of-eight audio pickup pattern, i.e., B-format pickup audio signals X, Y, Z, and W, para [0070]); a higher order ambisonics signal; or a spatial audio signal (Samuelsson, 5.1 configuration in the modified upmix metadata 223, para [0041] and Galdo, first audio channel stream and second spatial audio stream as inputs in fig. 1A).
Claim 13: the combination of Samuelsson, Laitinen, and Galdo further teaches, according to claim 1 above, wherein the at least one further audio signal of the further audio signal format is at least one of: 2-N channels of a spatial microphone array (Galdo, stereo or surround audio data, para [0073]); 2-N channels of multi-channel audio signal (Samuelsson, including left and right channels in STB sound signals, para [0056] and Galdo, first audio channel stream and second spatial audio stream as stereo DirAC stream in fig. 1A, para [0074]); a first order ambisonics signal (figure-of-eight audio pickup pattern, i.e., B-format pickup audio signals X, Y, Z, and W, para [0070]); a higher order ambisonics signal; or a spatial audio signal (Samuelsson, one or more audio signals 130 from STB, and rendered within a 3D rendering environment, para [0022], including left and right channels, para [0056]).
Claim 18 has been analyzed and rejected according to claims 17, 2 above.
Claim 21 has been analyzed and rejected according to claims 19, 4 above.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson et al. (above) and in view of references, Laitinen (above), Galdo (above) and Takahashi et al (CA 2967249 A1, hereinafter Takahashi).
Claim 6: the combination of Samuelsson and Laitinen teaches all the elements of claim 6, according to claim 3 above, including the generated metadata comprising at least one primary metadata block (Samuelsson, including modified object metadata 224 in fig. 2) comprising at least one spatial audio parameter associated with the at least one of the at least one first audio signal or the at least one further audio signal, the at least one spatial audio parameter associated with at least one of the at least one first audio signal or the at least one further audio signal (Samuelsson, including position of the audio object 113, 123, para [0048], loudspeaker position para [0050], object metadata for the first modified audio object para [0063]), and a common metadata block (Samuelsson, including upmix coefficient metadata or modified JOC metadata 223, para [0023] and Laitinen, ψ in the formula 1, p.61, fig. 4) associated with the at least one first audio signal or the at least one further audio signal comprising at least one non-spatial audio related parameter (Samuelsson, upmix matrix, e.g., table 1, to mute or not mute channels, para [0039]-[0040]; upmix coefficients for reconstructing the first modified audio object being added to the modified upmix metadata 223, para [0063], and Laitinen, |W|², etc., in the formula 1, p.61) and wherein the common metadata block comprises at least one energy ratio parameter (Laitinen, V=|X, Y, Z| and |V|² representing energy ratio of each of directional signals X, Y, and Z in the formula 1, p.61), except wherein the at least one spatial audio parameter comprises at least one of: at least one direction parameter; or at least one coherence parameter associated with the at least one first audio signal or the at least one further audio signal; and the at least one non-spatial audio related parameter comprises at least one of: a version identifier; a time-frequency resolution identifier; or a number of directions identifier.
Galdo teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-20 and a system in fig. 1A) and wherein at least one primary metadata block is disclosed (received stream by the processor 130 in fig. 1A) comprising at least one spatial audio parameter associated with the at least one first audio signal and the at least one further audio signal (including DOA and wave direction of the first and the second spatial audio streams and the wave direction is included in wave representation in fig. 1A, para [0008]), the at least one spatial audio parameter comprising at least one of: at least one direction parameter (the DOA and the wave direction of the first and the second spatial audio streams and the discussion in claim 4 above); or at least one coherence parameter associated with the at least one first audio signal and the at least one further audio signal (IC or interaural coherence for DirAC, para [0003]) for the similar benefit discussed in claim 4 above.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the at least one spatial audio parameter and wherein the at least one spatial audio parameter comprising at least one of: at least one direction parameter; or at least one coherence parameter associated with the at least one first audio signal or the at least one further audio signal, as taught by Galdo, to the at least one spatial audio parameter and the at least one primary metadata block in the apparatus, as taught by the combination of Samuelsson and Laitinen, for the benefits discussed above.
However, the combination of Samuelsson, Laitinen, and Galdo does not explicitly teach wherein the at least one non-spatial audio related parameter comprising at least one of: a version identifier; a time-frequency resolution identifier; or a number of directions identifier.
Takahashi teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-6 and a system in fig. 6) and wherein a common metadata block is disclosed (included in META1, META2, etc., in fig. 6) to be associated with the at least one first/further audio signal (related to the object sound source SA1 and SA2, para [0106]) comprising at least one non-spatial audio related parameter (including component group descriptor of fig. 11 in the Syntax, some fields in transport stream in fig. 12), the at least one non-spatial audio related parameter comprising at least one of: a version identifier (including stream type, elementary PID, component_tag, etc., in fig. 12); a time-frequency resolution identifier; or a number of directions identifier (number of audio in the Syntax block in fig. 11; each audio corresponding to a sound source having a direction defined by r’, θ’, Φ’ in fig. 5 and indicated in fig. 4) for benefits of achieving an improvement in sound presentation and sound quality (e.g., 3D audio, para [0005]) with dynamically moved video image (sound object moving with movement of  video image, para [0004]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the at least one non-spatial audio related parameter comprising at least one of: a version identifier; a time-frequency resolution identifier; or a number of directions identifier, as taught by Takahashi, to the at least one non-spatial audio related parameter in the apparatus, as taught by the combination of Samuelsson, Laitinen, and Galdo, for the benefits discussed above.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson et al. (above) and in view of reference Oh et al (US 20090262957 A1, hereinafter Oh).
Claim 20: the combination of Samuelsson and Laitinen teaches all the elements of claim 20, according to claim 17 above, including wherein determining at least one further signal user input associated with the at least one further audio signal (Samuelsson, user is enabled to select particular video/audio content from a database of the distributer, para [0018]), except wherein generating the metadata further comprises at least one of: determining at least one first signal user input associated with the at least one audio signal; or generating at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input; determining at least one first signal server input associated with the at least one metadata parameter and the at least one first audio signal; determining at least one further signal server input associated with the at least one further metadata parameter and the at least one further audio signal; generating at least one first signal weight based on the at least one first server input; or generating at least one further signal weight based on the at least one further signal server input.
Oh teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-22 and a system in fig. 17) and wherein determining at least one first signal user input associated with at least one audio signal is disclosed (a user being able to select the preset metadata via the input unit 1530 to a control unit 1550 in fig. 15) and generating at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input (the preset metadata is applied to all data regions of a downmix signal according to the characteristic of sound source, para [0010], the preset metadata including a level or gain setting and a position of an audio object in the downmix signal, para [0011] and multiple audio objects in the downmix being controlled, para [0007]) for benefits of achieving an improvement in satisfying the user’s expectation (para [0006]-[0008]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein determining at least one first signal user input associated with at least one audio signal and generating at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input, as taught by Oh, to the generating the combined metadata in the apparatus, as taught by the combination of Samuelsson and Laitinen, for the benefits discussed above.

Claims 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson et al. (above) and in view of references Laitinen (above) and Groeschel et al (US 20130170672 A1, hereinafter Groeschel).
Claim 7: the combination of Samuelsson and Laitinen teaches all the elements of claim 7, according to claim 1 above, including wherein generating the metadata (Samuelsson, JOC and OAMD metadata in fig. 1 and the discussion in claim 1 above and Laitinen, the calculated μ and the discussion in claim 1 above), and 
determining at least one first signal energy ratio associated with the at least one metadata parameter associated with the at least one first audio signal (through the formula 1, p.61, and wherein [formula image: media_image2.png] is the energy ratio including V in the formula about diffuseness parameter ψ);
generating at least one first signal weight based on the at least one first signal energy ratio (the formula 1, p.61, for calculating ψ to be applied to the three signals XYZeven in fig. 4);
generating at least one further signal weight (ψ’ = 1- ψ and applied to the three signals XYZstandard in fig. 4);
comparing the at least one first signal weight and the at least one further signal weight (comparing between direct sound represented by (1- ψ) and non-direction sound represented by ψ, e.g., ψ=0 and ψ’=1 at direct sound only, and ψ=1 and thus, ψ’=0 at diffuseness sound only, according to ψ+ ψ’=1, formula 1, section 2, p.61 and the discussion in claim 1 above); and
generating the metadata based on the comparing the at least one first signal weight and the at least one further signal weight (the μ is calculated in formulas 8-9, p.64, and the discussion in claim 1 above), except determining at least one further energy ratio associated with the at least one further metadata parameter associated with the at least one further audio signal and generating at least one further signal weight based on the at least one further signal energy ratio.  
Groeschel teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-16 and a system in fig. 2) and wherein the generating combined metadata is disclosed (scale factor is determined and applied to the mixed signal at 405-406 in fig. 4) to be based on: determining at least one first signal ratio associated with at least one metadata parameter associated with the at least one first audio signal (between mixing balance control and metadata scale factor inputted at step 401); generating at least one first signal weight based on the at least one first signal ratio (e.g. main metadata scale factor); determining at least one further ratio associated with at least one further metadata parameter associated with the at least one further audio signal; generating at least one further signal weight based on the at least one further signal ratio (including associated metadata scale factor); comparing the at least one first signal weight and the at least one further signal weight (comparing the balance and metadata scale factor selected at step 402 in fig. 4); and generating the combined metadata based on the comparing the at least one first signal weight and the at least one further signal weight (selected scaler of either main scale factor or associated metadata scale factor according to comparing the balance control and metadata scale factor at step 402) for the benefit of achieving an improvement in trade-off between the audio sound effects and sound quality by flexibly balancing sound level of each component of the mixed audio signals (para [0007]-[0009]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein determining at least one further energy ratio associated with at least one further metadata parameter associated with the at least one further audio signal; generating at least one further signal weight based on the at least one further signal ratio, as taught by Groeschel, to the generating the metadata in the apparatus, as taught by the combination of Samuelsson and Laitinen, for the benefits discussed above.
Claim 8 has been analyzed and rejected according to claim 7 above.
Claim 9 has been analyzed and rejected according to claim 7 above.
Claim 10 has been analyzed and rejected according to claim 7 above.
Claim 11 has been analyzed and rejected according to claim 7 above.
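
The complementary weighting relied on in the mapping of claim 7 above (ψ applied to the diffuse part, ψ’ = 1 - ψ applied to the direct part, with ψ + ψ’ = 1) can be illustrated by the following minimal sketch. It is an illustration only and is not taken from Laitinen, Samuelsson, or Groeschel; the function names `direct_diffuse_weights` and `dominant_component` are hypothetical, and the sketch assumes ψ is already available as a number between 0 and 1 rather than computing it from formula 1.

```python
def direct_diffuse_weights(psi: float) -> tuple[float, float]:
    """Return (diffuse_weight, direct_weight) for a diffuseness value psi.

    Assumption: psi lies in [0, 1]; the direct weight is its complement,
    so the two weights always sum to 1.
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must lie in [0, 1]")
    return psi, 1.0 - psi


def dominant_component(psi: float) -> str:
    """Compare the two weights and name the larger one (hypothetical helper)."""
    diffuse, direct = direct_diffuse_weights(psi)
    if direct > diffuse:
        return "direct"
    if diffuse > direct:
        return "diffuse"
    return "balanced"
```

Under these assumptions, ψ = 0 yields weights (0, 1), i.e., direct sound only, and ψ = 1 yields weights (1, 0), i.e., diffuse sound only, which matches the two limiting cases noted in the comparing step above.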

Response to Arguments

Applicant's arguments filed on May 23, 2022 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by the Applicant's amendment. The Examiner has thoroughly reviewed Applicant's arguments but firmly believes that the cited references reasonably and properly meet the claimed limitations.
In the response to this office action, the examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
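
The timing rule in the preceding paragraph can be sketched as a small date calculation. This is an informal illustration only, not an official USPTO computation: the function names `add_months` and `reply_period_end` are hypothetical, the mailing date is assumed to be the same as the date of the final action, and the sketch ignores weekends, federal holidays, and extension fees under 37 CFR 1.136(a).

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months to a date, clamping to the last day of the month."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailing: date,
                     first_reply: Optional[date] = None,
                     advisory_mailed: Optional[date] = None) -> date:
    """Expiry of the shortened statutory period as described in the text.

    Default: three months from mailing. If a first reply was filed within
    two months and the advisory action was mailed after the three-month
    date, the period ends on the advisory mailing date. The result is
    capped at six months from mailing.
    """
    two = add_months(mailing, 2)
    three = add_months(mailing, 3)
    six = add_months(mailing, 6)
    end = three
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= two and advisory_mailed > three):
        end = advisory_mailed
    return min(end, six)
```

For a hypothetical mailing date of August 1, 2022, a first reply filed on September 15, 2022 followed by an advisory action mailed on November 10, 2022 would move the end of the period from November 1, 2022 to November 10, 2022 under this sketch.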
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached on 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654