DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the applicant's preliminary amendment filed on November 15, 2020, in which the applicant amended claims 1-14 and 17-20, canceled claims 15-16, and added new dependent claims 21-22.
By virtue of this communication, claims 1-14 and 17-22 are currently pending in this Office Action.
In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Specification
The specification fails to disclose any embodiment showing what the “ratio” is in the claimed features “at least one first signal ratio associated with at least one metadata parameter associated with the at least one first audio signal” and “at least one further ratio associated with at least one further metadata parameter associated with the at least one further audio signal” as recited in claims 7-11, and “at least one energy ratio parameter” as recited in claims 4-5 and 21. It is well known in the art that a “ratio” relates two variables, but the specification fails to disclose what the other variable of the “ratio” is with respect to “at least one first audio signal”, “at 
Appropriate correction is required.

Claim Objections
Claim 11 is objected to because of the following informalities: 
Claim 11 recites “wherein the the at least one memory and …” which should be -- wherein the --.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 2-8, 10-11, 18-19, 21 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 2 recites “the determined at least one of the at least one first and further audio signal is configured to cause the apparatus to at least one of: …”, wherein the terms “the determined at least one of …” and “the at least one first and further audio signal” have insufficient antecedent basis, which causes confusion because it is unclear. Claim 2 further recites “cause the apparatus to at least one of: receive at least one of the at least one metadata parameter and the at least further metadata parameter”, while parent claim 1 recites “cause the apparatus at least to … determine … at least one metadata parameter”, which is further confusing because it is unclear whether “the at least one metadata parameter” is determined by “the apparatus” or “received” by “the apparatus”, and thus renders the claim indefinite. In addition, “the at least further metadata parameter” has insufficient antecedent basis in claim 2.
Claim 3 is rejected for at least reasons similar to those described for claim 2 above because claim 3 recites a deficient feature similar to that recited in claim 2; for example, claim 3 recites “the at least one first and further audio signal”, which has insufficient antecedent basis in claim 3. Claims 4-6 are rejected due to their dependency on claim 3.
Claim 4 is further rejected for at least reasons similar to those described for claim 2 above because claim 4 recites a deficient feature similar to that recited in claim 2; for example, claim 4 recites “the at least one first and further audio signal”, which has insufficient antecedent basis in claim 4.
Claim 5 is further rejected for at least reasons similar to those described for claim 2 above because claim 5 recites a deficient feature similar to that recited in claim 2; for example, claim 5 recites “the at least one first and further audio signal” which has an insufficient antecedent 
Claim 6 is further rejected for at least reasons similar to those described for claim 2 above because claim 6 recites a deficient feature similar to that recited in claim 2; for example, claim 6 recites “the at least one first and further audio signal” and “the at least one first/further audio signal”, which have insufficient antecedent basis in claim 6. Claim 6 further recites “a number of directions identifier”, which is confusing because it is unclear whether “identifier” refers to “a number of directions” or to “a number of … identifier” for “directions”, and thus further renders the claim indefinite.
Claim 7 recites “generating at least one further signal weight based on the at least one further signal ratio”, wherein “the at least one further signal ratio” has insufficient antecedent basis in claim 7, which causes confusion because it is unclear what “the at least one further signal ratio” is and what “generating at least one further signal weight” is based on, and thus renders the claim indefinite. Claims 8-11 are rejected due to their dependency on claim 7.
Claim 8 further recites “use the at least one metadata parameter associated with the at least one first audio signal” and “use the at least one metadata parameter associated with the at least one further audio signal”, which is confusing because it is unclear whether “the at least one metadata parameter” is associated with “the at least one first audio signal” or with “the at least one further audio signal”, and thus renders the claim indefinite.
Claim 10 recites “generate the at least one first signal weight based on the at least one first signal ratio further based on the at least one first user input energy”, wherein “the at least one first user input energy” has insufficient antecedent basis in claim 10, which causes confusion because it is unclear what “the at least one first user input energy” is and what generating “the at least one first signal weight” is further based on, and thus renders the claim indefinite.
Claim 11 recites “generate the at least one first signal weight based on the at least one first signal ratio further based on the at least one first server input energy”, wherein “the at least one first server input energy” has insufficient antecedent basis in claim 11, which causes confusion because it is unclear what “the at least one first server input energy” is and what generating “the at least one first signal weight” is further based on, and thus renders the claim indefinite.
Claim 18 is rejected for at least reasons similar to those described for claim 2 above because claim 18 recites a deficient feature similar to that recited in claim 2.
Claim 19 is rejected for at least reasons similar to those described for claim 3 above because claim 19 recites a deficient feature similar to that recited in claim 3. Claim 21 is rejected due to its dependency on claim 19.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –


Claims 1, 3, 14, 17, 19, and 22 are rejected under 35 U.S.C. 102(a)(1)/(2) as being anticipated by Samuelsson et al. (US 20170251321 A1, hereinafter Samuelsson).
Claim 1: Samuelsson teaches an apparatus (title and a method in abstract ln 1-16, fig. 1, and a system having a processor executing the method as software stored in storage medium, para [0010]-[0011]) comprising: 
at least one processor (a processor, para [0010]); and 
at least one non-transitory memory including a computer program code (storage medium comprising a software program adapted for execution on the processor, para [0011]), the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to (the software program stored in the storage medium to be executed by the processor, para [0010]-[0011]):
determine for at least one first audio signal of an audio signal format (receiving an audio program having audio signal 110 of an audio object type, received as an audio program, para [0020]-[0021], and as spatially diverse audio signals, para [0025], generating downmix signal 111 in fig. 1), at least one metadata parameter (the received audio program also comprising object audio metadata OAMD, and then the bitstream metadata 121 is determined by the encoder 101 in fig. 1, para [0021]-[0022]; or upmix/JOC metadata 221 determined to reconstruct or decode 5.1 channel audio signals from the two channel downmix signal 111, para [0024]);
determine, for at least one further audio signal of a further audio signal format (audio content or system sound 130 of a set-top box STB, para [0022], as called as a first audio signal 
control a combination of the at least one metadata parameter with the at least one further metadata parameter (a mix or a replacement of the center channel with the STB sound signal to a center channel or to all channels, performed within the metadata modification unit 204 in fig. 2, para [0055], including application of the weights to implement cross-fading of the bitstream metadata 121 over a pre-determined time interval into the target bitstream metadata 223; reverting to bitstream 111 and bitstream metadata 121 based on a detection of termination of the insertion of the STB system sound, para [0059], i.e., control of the combination) to generate a combined metadata (encoded and modified metadata 122, including the modified object metadata 224 or modified OAMD, para [0047], indicative of a position of the modified audio object 113, 123, para [0048]; indicative of loudspeaker position such as table 2, para [0050]; the modified object metadata 224 for insertion of the modified object signal 130 added to the modified bitstream metadata 122 and upmix coefficients for 
Claim 17 has been analyzed and rejected according to claim 1 above.
Claim 3: Samuelsson further teaches, according to claim 1 above, wherein the combined metadata is configured to be generated based on: extracting at least one of the at least one metadata parameter and the at least one further metadata parameter associated with at least one of the at least one first (extracting the bitstream metadata 121) and further audio signal as a metadata block (extracting the flag from the STB, para [0058], and the channel information for the inserted first audio signal or STB system sound signal, para [0056] and within the insertion unit 102 in fig. 1); and adding the extracted at least one of the at least one metadata parameter and the at least one further metadata parameter associated with at least one of the at least one first and further audio signal as a secondary metadata block within the combined metadata (inserting or adding the information indicative of the flag to modify the bitstream metadata with the channel information and the object position information about the first 
Claim 14: Samuelsson further teaches, according to claim 1 above, wherein the combined metadata comprises at least one spatial audio parameter (including position of the audio objects in the object audio metadata OAMD 120, and modified OAMD 224 in fig. 2, para [0020], para [0031], para [0037]) and at least one non-spatial audio related parameter (upmix JOC metadata, including upmix parameters or JOC metadata and modified JOC metadata 223 in fig. 2).
Claim 19 has been analyzed and rejected according to claims 17, 3 above.
Claim 22 has been analyzed and rejected according to claims 17, 14 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 4-5, 12, 13, 18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson (above) in view of Galdo et al. (US 20110216908 A1, hereinafter Galdo).
Claim 2: Samuelsson teaches all the elements of claim 2, according to claim 1 above, including wherein the determined at least one of the at least one first and further audio signal is configured to cause the apparatus to at least one of: 
receive at least one of the at least one metadata parameter and the at least further metadata parameter (Samuelsson, receiving the metadata 120 and encoded metadata 121 and receiving the flag indicating the insertion or termination of the insertion of the STB system sound signal via signal 130 in fig. 1 and the discussion in claim 1 above); 
or decode at least one of the at least one first and further audio signal to determine at least one of the at least one metadata parameter and the at least one further metadata parameter (decoding the encoded bitstream including the encoded modified downmix signal 112 and encoded modified bitstream metadata 122 via the decoder 103 in fig. 1), except 
analyse at least one of the at least one first and further audio signal to determine at least one of the at least one metadata parameter and the at least one further metadata parameter.
Galdo teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-20 and a system in fig. 1A) and wherein analyzing at least one of the at least one first and further audio signal (via a means for determining 110 in fig. 1A) to determine at least one of the at least one metadata parameter and the at least one further metadata parameter (determining first/second diffuseness parameters and first/second DOAs in fig. 1A, para [0035]-[0036]) for benefits of achieving an efficient audio coding by bandwidth saving and simplifying front-end processing (merging multiple information to a single one for encoding and transmission, para [0013]) and by utilizing efficiency of the encoding/rendering scheme (DirAC for correcting ILD, ITD, and IC if the diffuseness is reproduced accurately, para [0003]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have added the analyzing of at least one of the at least one first and further audio signal to determine at least one of the at least one metadata parameter and the at least one further metadata parameter, as taught by Galdo, to the determining of the at least one metadata parameter and the at least one further metadata parameter in the apparatus, as taught by Samuelsson, for the benefits discussed above.
Claim 4: the combination of Samuelsson and Galdo further teaches, according to claim 3 above, wherein the extracted metadata block (Samuelsson, within the insertion unit 102 in figs. 1-2 and Galdo, input to the estimator) is configured to cause the apparatus to extract at least one of: at least one direction parameter (Galdo, DOA for one of the first and the second spatial audio stream, one of first/second wave direction measures and one of first/second wave field measures i.e., magnitude or energy related, para [0025]); at least one energy ratio parameter; or at least one coherence parameter associated with at least one of the at least one first and further audio signal as the metadata block (Galdo, IC or interaural coherence for DirAC, para [0003]).
Claim 5: the combination of Samuelsson and Galdo further teaches, according to claim 3 above, wherein the apparatus is configured, based upon the adding of the secondary metadata block (Samuelsson, within the insertion unit 102 in figs. 1-2 and Galdo, input to the estimator), to cause the apparatus to add at least one of: the at least one direction parameter (the discussion in claim 4 above, e.g., Galdo, DOA for other one of the first and the second spatial audio stream, other one of first/second wave direction measures and other one of first/second wave field measures i.e., magnitude or energy related, para [0025]); at least one energy ratio 
Claim 12: the combination of Samuelsson and Galdo further teaches, according to claim 1 above, wherein the at least one first audio signal of the audio signal format is at least one of: 2-N channels of a spatial microphone array (Galdo, stereo or surround audio data, para [0073]); 2-N channels of multi-channel audio signal (Samuelsson, 2-channel or a 5.1 channel or 7.1 channel downmix signal, para [0021]; 5.1 channel specified in the modified upmix metadata 223, para [0041], and Galdo, first audio channel stream and second spatial audio stream as stereo DirAC stream in fig. 1A, para [0074]); a first order ambisonics signal (figure-of-eight audio pickup pattern, i.e., B-format pickup audio signals X, Y, Z, and W, para [0070]); a higher order ambisonics signal; or a spatial audio signal (Samuelsson, 5.1 configuration in the modified upmix metadata 223, para [0041] and Galdo, first audio channel stream and second spatial audio stream as inputs in fig. 1A).
Claim 13: the combination of Samuelsson and Galdo further teaches, according to claim 1 above, wherein the at least one further audio signal of the further audio signal format is at least one of: 2-N channels of a spatial microphone array (Galdo, stereo or surround audio data, para [0073]); 2-N channels of multi-channel audio signal (Samuelsson, including left and right channels in STB sound signals, para [0056] and Galdo, first audio channel stream and second spatial audio stream as stereo DirAC stream in fig. 1A, para [0074]); a first order ambisonics signal (figure-of-eight audio pickup pattern, i.e., B-format pickup audio signals X, Y, Z, and W, para [0070]); a higher order ambisonics signal; or a spatial audio signal (Samuelsson, one or 
Claim 18 has been analyzed and rejected according to claims 17, 2 above.
Claim 21 has been analyzed and rejected according to claims 19, 4 above.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson (above) in view of Galdo (above) and Takahashi et al. (CA 2967249 A1, hereinafter Takahashi).
Claim 6: Samuelsson teaches all the elements of claim 6, according to claim 3 above,  the combined metadata associated with the at least one of the first and further audio signals (combined via the metadata encoder 208 and output the encoded metadata 122 in fig. 2) comprising at least one primary metadata block (including modified object metadata 224 in fig. 2) comprising at least one spatial audio parameter associated with the at least one further/first audio signal (including position of the audio object 113, 123, para [0048], loudspeaker position para [0050], object metadata for the first modified audio object para [0063]), and a common metadata block (including upmix coefficient metadata or modified JOC metadata 223, para [0023]) associated with the at least one first/further audio signal comprising at least one non-spatial audio related parameter (upmix matrix, e.g., table 1, to mute or not mute channels, para [0039]-[0040]; upmix coefficients for reconstructing the first modified audio object being added to the modified upmix metadata 223, para [0063]), except wherein the at least one spatial audio parameter associated with the at least one first/further audio signals comprising at least one of: at least one direction parameter; at least one energy ratio parameter; or at least one 
Galdo teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-20 and a system in fig. 1A) and wherein at least one primary metadata block is disclosed (received stream by the processor 130 in fig. 1A) comprising at least one spatial audio parameter associated with the at least one further/first audio signal (including DOA and wave direction of the first and the second spatial audio streams and the wave direction is included in wave representation in fig. 1A, para [0008]), the at least one spatial audio parameter comprising at least one of: at least one direction parameter (the DOA and the wave direction of the first and the second spatial audio streams and the discussion in claim 4 above); at least one energy ratio parameter; or at least one coherence parameter associated with the at least one first/further audio signal (IC or interaural coherence for DirAC, para [0003]) for the similar benefit discussed in claim 4 above.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the at least one spatial audio parameter and wherein the at least one spatial audio parameter associated with the at least one first/further audio signals comprising at least one of: at least one direction parameter; at least one energy ratio parameter; or at least one coherence parameter associated with the at least one first/further audio signal, as taught by Galdo, to the at least one spatial audio parameter and the at least one primary metadata block in the apparatus, as taught by Samuelsson, for the benefits discussed above.
However, the combination of Samuelsson and Galdo does not explicitly teach wherein the at least one non-spatial audio related parameter comprising at least one of: a version identifier; a time-frequency resolution identifier; or a number of directions identifier.
Takahashi teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-6 and a system in fig. 6) and wherein a common metadata block is disclosed (included in META1, META2, etc., in fig. 6) to be associated with the at least one first/further audio signal (related to the object sound source SA1 and SA2, para [0106]) comprising at least one non-spatial audio related parameter (including component group descriptor of fig. 11 in the Syntax, some fields in transport stream in fig. 12), the at least one non-spatial audio related parameter comprising at least one of: a version identifier (including stream type, elementary PID, component_tag, etc., in fig. 12); a time-frequency resolution identifier; or a number of directions identifier (number of audio in the Syntax block in fig. 11; each audio corresponding to a sound source having a direction defined by r’, θ’, Φ’ in fig. 5 and indicated in fig. 4) for the benefits of achieving an improvement in sound presentation and sound quality (e.g., 3D audio, para [0005]) with a dynamically moving video image (sound object moving with movement of the video image, para [0004]).
Therefore.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson (above) in view of Oh et al. (US 20090262957 A1, hereinafter Oh).
Claim 20: Samuelsson teaches all the elements of claim 20, according to claim 17 above, including wherein determining at least one further signal user input associated with the at least one further audio signal (user is enabled to select particular video/audio content from a database of the distributer, para [0018]), except wherein generating the combined metadata further comprises at least one of: determining at least one first signal user input associated with the at least one audio signal; or generating at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input; determining at least one first signal server input associated with the at least one metadata parameter and the at least one first audio signal; determining at least one further signal server input associated with the at least one further metadata parameter and the at least one further audio signal; generating at least one first signal weight based on the at least one first server input; or generating at least one further signal weight based on the at least one further signal server input.
Oh teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-22 and a system in fig. 17) and wherein determining at least one first signal user input associated with at least one audio signal is disclosed (a user being able to select the preset metadata via the input unit 1530 to a control unit 1550 in fig. 15) and generating at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input (the preset metadata is applied to all data regions of a downmix signal according to the characteristic of sound source, para [0010], the present 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the determining of at least one first signal user input associated with at least one audio signal and the generating of at least one first signal weight based on the at least one first user input and at least one further signal weight based on the at least one further signal user input, as taught by Oh, to the generating of the combined metadata in the apparatus, as taught by Samuelsson, for the benefits discussed above.

Claims 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Samuelsson (above) in view of Groeschel et al. (US 20130170672 A1, hereinafter Groeschel).
Claim 7: Samuelsson teaches all the elements of claim 7, according to claim 1 above, including wherein the apparatus is configured to generate the combined metadata (Samuelsson, JOC and OAMD metadata in fig. 1 and the discussion in claim 1 above), except the generating the combined metadata is based on: determining at least one first signal ratio associated with at least one metadata parameter associated with the at least one first audio signal; generating at least one first signal weight based on the at least one first signal ratio; determining at least one further ratio associated with at least one further metadata parameter associated with the at least one further audio signal; generating at least one further signal weight based on the at least one further signal ratio; comparing the at least one first signal 
Groeschel teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-16 and a system in fig. 2) and wherein the generating combined metadata is disclosed (scale factor is determined and applied to the mixed signal at 405-406 in fig. 4) to be based on: determining at least one first signal ratio associated with at least one metadata parameter associated with the at least one first audio signal (between mixing balance control and metadata scale factor inputted at step 401); generating at least one first signal weight based on the at least one first signal ratio (e.g. main metadata scale factor); determining at least one further ratio associated with at least one further metadata parameter associated with the at least one further audio signal; generating at least one further signal weight based on the at least one further signal ratio (including associated metadata scale factor); comparing the at least one first signal weight and the at least one further signal weight (comparing the balance and metadata scale factor selected at step 402 in fig. 4); and generating the combined metadata based on the comparing the at least one first signal weight and the at least one further signal weight (selected scaler of either main scale factor or associated metadata scale factor according to comparing the balance control and metadata scale factor at step 402) for the benefit of achieving an improvement in the trade-off between the audio sound effects and sound quality by flexibly balancing the sound level of each component of the mixed audio signals (para [0007]-[0009]). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the generating the combined metadata is based on: determining at least one first signal ratio associated with at least one metadata parameter associated with the at least one first audio signal; generating at least one first signal weight based on the at least one first signal ratio; determining at least one further ratio associated with at least one further metadata parameter associated with the at least one further audio signal; generating at least one further signal weight based on the at least one further signal ratio; comparing the at least one first signal weight and the at least one further signal weight; and generating the combined metadata based on the comparing the at least one first signal weight and the at least one further signal weight, as taught by Groeschel, to the generating the combined metadata in the apparatus, as taught by Samuelsson, for the benefits discussed above.
Claim 8 has been analyzed and rejected according to claim 7 above.
Claim 9 has been analyzed and rejected according to claim 7 above.
Claim 10 has been analyzed and rejected according to claim 7 above.
Claim 11 has been analyzed and rejected according to claim 7 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654