Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
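3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.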


Claims 1-5, 8, 11-14, 17, 18, 20, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ojala: 20110002469 in view of Oh: 20090222118.
Regarding claims 1, 22, and 23
Oj teaches:
An apparatus, method and medium bearing coded instructions comprising at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor (Oj: Abstract; ¶ 42-46; Fig 3: a processor in concert with memory operative of program code), cause the apparatus at least to perform: 
assign one or more audio representations to one or more audio scene entities in an audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: encoding and authoring of a scene comprising one or more channels, objects, etc. is disclosed as well known; further, an audio representation comprises an audio signal, channel, bitstream, etc., and consequently an entity must be considered no more than a particular channel. In accord with ¶ 86 of the instant specification, audio scene entities include, but are not limited to, a diegetic sound, an ambience sound, an audio source, a non-diegetic sound, and a combination thereof; the microphones representative of sources in figure 2 comprise represented entities), 
wherein the one or more audio scene entities each represent a respective source of audio within the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: a microphone provides a representation of the sound assigned to a channel, entity, etc.);
including a first audio scene entity combination and a second audio scene entity combination (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: the microphone configuration and selection modules operate to determine and combine a plurality of channels, entities, etc. and downmix same for transmission, operation of a controller of the system to move the listener position dynamically combines a selected set of microphones and updates the same thereby generating plural first, second, etc. subsequent combinations of channels, entities, etc.) 
based on the one or more audio scene entities and the one or more audio representations (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: coding and packaging of an audio scene comprising one or more combinations of channels, objects, etc. for broadcast disclosed as well known), the first audio scene entity combination and the second audio scene entity combination each comprising the one or more audio scene entities having been assigned the one or more audio representations in a first manner (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: microphone channels, entities, etc. selectably and dynamically grouped based on user operation of the system controller); and 
signal the plurality of audio scene entity combinations to a client (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: transmission of an audio scene comprising one or more combinations of channels, objects, etc. disclosed as well known) 
wherein the one or more audio representations assigned to the one or more audio scene entities cause the client to select an audio scene entity combination from the audio scene entity combinations to render the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: client selects and renders audio scene(s), entity(s), object(s), channel(s) among available audio scene(s), entity(s), object(s), channel(s) in concert with user interaction and scene reconstruction data).

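Oj does not explicitly teach the selection of an audio scene entity from the plurality of audio scene entities based on one or more requirements of the client nor the assigning of first, second, etc. entity combinations in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner.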

In a related field of endeavor, consider Oh, which teaches a method and apparatus for processing a received audio stream using well-known preset data operable within a client to require particular automatic or user-entered output audio settings (see Oh: Abstract; ¶ 116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels, objects, etc. in the form of preset data, said preset data operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters; please see also group presets and switch groups in the 3D audio specification provided by Applicant alongside the IDS filed 11/16/20). Further, the system groups particular channels, entities, etc. for output in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner (Oh: Abstract; ¶ 116-124; Fig 13, 14: a user is delivered a plurality of scene entity combinations, which are grouped according to metadata transmitted with a container or according to preset metadata at the listening system; a user may operatively group scene entities in a user-directed manner by entering user data, such as upon the user interface of figure 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to render to a user first and second scene entity combinations, utilizing first preset metadata and second, subsequently client-selected rendering matrices, as taught or suggested by Oh, within the Oj system and method. The average skilled practitioner would have been motivated to do so for the purpose of allowing a user to visualize and operate upon audio channels, objects, etc. in a client interface and to thereby direct output configurations, parameters, etc., and would have expected predictable results therefrom.
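For illustration only, the following Python sketch models the "first manner" and "second manner" limitation as two assignments of audio representations (objects, channels, higher order ambisonics) to the same set of audio scene entities, differing for at least one entity. The entity names, the `Rep` enumeration, and the helper functions are hypothetical and are not drawn from Oj, Oh, or the instant specification; the sketch is offered only as an aid to reading the claim language.

```python
from enum import Enum


class Rep(Enum):
    """Audio representation types named in claim 2: objects, channels, higher order ambisonics."""
    OBJECT = "object"
    CHANNEL = "channel"
    HOA = "hoa"


# Hypothetical audio scene entities, each standing for one source of audio in the scene.
ENTITIES = ["speech", "ambience", "music"]


def make_combination(assignment: dict) -> dict:
    """Build one audio scene entity combination: every entity gets exactly one representation."""
    missing = set(ENTITIES) - set(assignment)
    if missing:
        raise ValueError(f"unassigned entities: {sorted(missing)}")
    return dict(assignment)


def differs(a: dict, b: dict) -> list:
    """Entities whose assigned representation differs between two combinations."""
    return sorted(e for e in ENTITIES if a[e] != b[e])


# First manner: ambience carried as HOA; second manner: ambience carried as channels.
first = make_combination({"speech": Rep.OBJECT, "ambience": Rep.HOA, "music": Rep.CHANNEL})
second = make_combination({"speech": Rep.OBJECT, "ambience": Rep.CHANNEL, "music": Rep.CHANNEL})

print(differs(first, second))  # ['ambience']
```

Under this reading, two combinations are assigned "differently" whenever `differs` returns a non-empty list.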

Regarding claim 2
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the one or more audio representations comprise one or more of objects, channels, or higher order ambisonics (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: audio representations comprise objects and/or channels; Oj further teaches ambisonic encoding as well known); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14).

Regarding claim 3
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to group audio scene entities in an audio scene entity combination to form an audio scene entity group, wherein the audio scene entities that are assigned same audio representation are grouped together (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: grouping of objects, channels, etc. into a downmix transmitted to a decoder, a downmix comprises a “same audio representation” of a group of channels; further any grouping of channels within a particular disclosed MPEG, SAOC, etc. codec must comprise assignment of “same audio representation”, channels within a stream are grouped by the stream); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14: preset metadata groups together sets of channels, objects etc., for automated and/or user selection and control of output parameters thereby).

Regarding claim 4
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to generate an independent stream for each audio scene entity combination (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: one or more input channels encoded as mono or as a variety of grouped channels, objects, etc. for transmission to a decoder as a plurality of multichannel downmix signal(s)); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14)

Regarding claim 5
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to: provide description of the audio scene entities in the audio scene entity group; and provide information about the audio representation assigned to the audio scene entity group (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: side information comprises descriptive metadata about the audio data, including rendering information such as channel, object, etc. loudness, direction, etc. of the one or more downmix signal(s)); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14)

Regarding claim 8
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to pack the audio scene entities as MPEG-H audio streaming (MHAS) packet, wherein the MHAS packet comprises a scene description packet. Examiner has taken official notice of this feature; because Applicant failed to timely and specifically traverse the official notice, it is accepted as Applicant’s Admitted Prior Art.

Regarding claim 11
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to define information for identifying the audio scene (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: audio object, channels, etc. transmitted to a client in concert with side information identifying the audio scene; provision of listener coordinates to the encoder operates to identify particular audio scene data to be selectively transmitted to a particular decoder operable to deliver a desired audio scene to a user); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14)

Regarding claim 12
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to define number of audio scene entity combinations in the audio scene (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: provision of downmix signal(s) and side information operates to define a particular grouping of audio scene entities for delivery to a user; thus the side information defines a number of audio objects, channels, etc. as well as any number of the available audio objects, channels, etc.); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14).

Regarding claim 13
Oj in view of Oh teaches or suggests:


Regarding claim 14
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to define information for indicating a priority of the audio scene entity (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: inclusion of virtual listener coordinates uniquely identifies particular downmix signal(s) and side information for delivery of selected audio channels to a user, wherein particular microphones are selected as priorities in a microphone lattice delivered based upon requirements transmitted by the client to the apparatus); (Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14)

Regarding claim 17
Oj teaches or suggests:
An apparatus, method and medium comprising at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor (Oj: Abstract; ¶ 42-46; Fig 3: a processor in concert with memory operative of program code), cause the apparatus at least to perform: 
receive one or more streams comprising one or more audio scene entity combinations (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: encoding and authoring of a scene comprising one or more channels, objects, etc. disclosed as well known), including a first audio scene entity combination and a second audio scene entity combination (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: the microphone configuration and selection modules operate to determine and combine a plurality of channels, entities, etc. and downmix same for transmission; operation of a controller of the system to move the listener position dynamically combines a selected set of microphones and updates the same, thereby generating plural first, second, etc. subsequent combinations of channels, entities, etc.), 
wherein the one or more audio scene entity combinations comprise one or more audio scene entities from an audio scene wherein the one or more audio scene entities each represent a respective source of audio within the audio scene  (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: a microphone provides a representation of the sound assigned to a channel, entity, etc.),
wherein one or more audio representations are assigned to the one or more audio scene entities (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: coding and packaging of an audio scene comprising one or more combinations of channels, objects, etc. for broadcast disclosed as well known), and 
wherein the one or more audio scene entity combinations are generated based on the one or more audio scene entities and the one or more audio representations (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: decoding and rendering of an audio scene comprising one or more combinations of channels, objects, etc. disclosed as well known), the first audio scene entity combination and the second audio scene entity combination each comprising the one or more audio scene entities having been assigned the one or more audio representations in a first manner (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: microphone channels, entities, etc. selectably and dynamically grouped based on user operation of the system controller);
select, based on the one or more audio representations, at least a first stream of the one or more streams that matches requirements for rendering the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6, 7: user selects among available scenes in concert with user interaction and scene reconstruction data); and 
perform at least one of retrieve, buffer, or render the first stream (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: decoding and rendering of an audio scene comprising one or more combinations of channels, objects, etc. disclosed as well known).
Oj does not explicitly teach the selection of an audio scene entity from the plurality of audio scene entities based on one or more requirements of the client nor the assigning of first, second, etc. entity combinations in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner.

In a related field of endeavor, consider Oh, which teaches a method and apparatus for processing a received audio stream using well-known preset data operable within a client to require particular automatic or user-entered output audio settings (see Oh: Abstract; ¶ 116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels, objects, etc. in the form of preset data, said preset data operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters; please see also group presets and switch groups in the 3D audio specification provided by Applicant alongside the IDS filed 11/16/20). Further, the system groups particular channels, entities, etc. for output in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner (Oh: Abstract; ¶ 116-124; Fig 13, 14: a user is delivered a plurality of scene entity combinations, which are grouped according to metadata transmitted with a container or according to preset metadata at the listening system; a user may operatively group scene entities in a user-directed manner by entering user data, such as upon the user interface of figure 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to render to a user first and second scene entity combinations, utilizing first preset metadata and second, subsequently client-selected rendering matrices, as taught or suggested by Oh, within the Oj system and method. The average skilled practitioner would have been motivated to do so for the purpose of allowing a user to visualize and operate upon audio channels, objects, etc. in a client interface and to thereby direct output configurations, parameters, etc., and would have expected predictable results therefrom.
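As a reading aid for the client-side selection step of claim 17 ("select, based on the one or more audio representations, at least a first stream ... that matches requirements for rendering the audio scene"), the following Python sketch assumes that a client's requirements reduce to the set of representation types it can render. The `Stream` model, `select_stream`, and the stream identifiers are hypothetical; actual client requirements may involve further criteria not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One received stream carrying a single audio scene entity combination (hypothetical model)."""
    stream_id: str
    combination: dict = field(default_factory=dict)  # entity name -> representation, e.g. "hoa"


def select_stream(streams, supported):
    """Return the first stream whose entities all use representations the client can render,
    or None if no stream matches the client's requirements."""
    for stream in streams:
        if all(rep in supported for rep in stream.combination.values()):
            return stream
    return None


streams = [
    Stream("s1", {"speech": "object", "ambience": "hoa"}),
    Stream("s2", {"speech": "object", "ambience": "channel"}),
]
chosen = select_stream(streams, supported={"object", "channel"})
print(chosen.stream_id if chosen else "no match")  # s2
```

A client lacking an ambisonics renderer skips `s1` and selects `s2`; the selected stream would then be retrieved, buffered, or rendered.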

Regarding claim 18
Oj in view of Oh teaches or suggests:
An apparatus, method and medium, wherein the one or more audio representations comprise objects, channels, or higher order ambisonics (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: audio representations comprise objects and/or channels; Oj further teaches ambisonic encoding as well known).

Regarding claim 20
Oj teaches:

determine at least one of a position or an orientation of a user (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: user interactions with the system change a virtual listener position and direct the subsequent selection of audio channels for delivery, decoding and rendering to a user); 
determine one or more audio scene entities that are relevant for an audio scene based on the at least one of the position or the orientation of the user (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: user interactions with the system determine one or more audio objects, channels, etc. by transmitting a virtual listener position from the client to the apparatus, thereby directing a subsequent selection of audio channels for delivery, decoding and rendering to a user), wherein the one or more audio scene entities each represent a respective source of audio within the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: a microphone provides a representation of the sound assigned to a channel, entity, etc., said microphones, channels, entities, etc. mapped to a virtual listener position);  
select audio representations that match requirements to render the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6, 7: client directs selection among available audio object, channels, etc. in concert with user interaction and scene reconstruction data such that it is only necessary to transmit required signals); 
select, from a plurality of media streams each comprising a respective audio scene entity combination of a plurality of audio scene entity combinations, a media stream comprising selected audio scene entities combinations represented in required audio representations (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6, 7: client directs selection among available audio objects, channels, etc.), selected audio scene entity combinations comprising audio scene entities represented in required audio representations matching the selected audio representations (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: the microphone configuration and selection modules operate to determine and combine a plurality of channels, entities, etc. and downmix same for transmission; operation of a controller of the system to move the listener position dynamically combines a selected set of microphones and updates the same, thereby generating plural first, second, etc. subsequent combinations of channels, entities, etc.), wherein the plurality of audio scene entity combinations include a first audio scene entity combination and a second audio scene entity combination (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: the microphone configuration and selection modules operate to determine and combine a plurality of channels, entities, etc. and downmix same for transmission; operation of a controller of the system to move the listener position dynamically combines a selected set of microphones and updates the same, thereby generating plural first, second, etc. subsequent combinations of channels, entities, etc.), the first audio scene entity combination and the second audio scene entity combination each comprising the one or more audio scene entities having been assigned the one or more audio representations in a first manner (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: microphone channels, entities, etc. selectably and dynamically grouped based on user operation of the system controller); and 
perform at least one of retrieve, buffer, or render of the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: decoding and rendering of an audio scene comprising one or more combinations of channels, objects, etc. disclosed as well known).

Oj does not explicitly teach the selection of an audio scene entity from the plurality of audio scene entities based on one or more requirements of the client nor the assigning of first, second, etc. entity combinations in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner.

In a related field of endeavor, consider Oh, which teaches a method and apparatus for processing a received audio stream using well-known preset data operable within a client to require particular automatic or user-entered output audio settings (see Oh: Abstract; ¶ 116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels, objects, etc. in the form of preset data, said preset data operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters; please see also group presets and switch groups in the 3D audio specification provided by Applicant alongside the IDS filed 11/16/20). Further, the system groups particular channels, entities, etc. for output in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner (Oh: Abstract; ¶ 116-124; Fig 13, 14: a user is delivered a plurality of scene entity combinations, which are grouped according to metadata transmitted with a container or according to preset metadata at the listening system; a user may operatively group scene entities in a user-directed manner by entering user data, such as upon the user interface of figure 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to render to a user first and second scene entity combinations, utilizing first preset metadata and second, subsequently client-selected rendering matrices, as taught or suggested by Oh, within the Oj system and method. The average skilled practitioner would have been motivated to do so for the purpose of allowing a user to visualize and operate upon audio channels, objects, etc. in a client interface and to thereby direct output configurations, parameters, etc., and would have expected predictable results therefrom.
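As a reading aid for the claim 20 sequence (determine relevant audio scene entities from the user's position, then select a media stream whose entity combination carries those entities in required representations), the following Python sketch assumes that relevance is a simple distance threshold around the listener position, loosely in the manner of the virtual listener position in Oj. Orientation is not modeled. The microphone names, coordinates, radius, and both helper functions are hypothetical.

```python
import math


def relevant_entities(user_pos, entity_positions, radius):
    """Entities within `radius` of the listener position, nearest first."""
    dists = {name: math.dist(user_pos, pos) for name, pos in entity_positions.items()}
    return sorted((n for n, d in dists.items() if d <= radius), key=lambda n: dists[n])


def pick_stream(streams, needed, supported):
    """First stream id whose combination contains every needed entity in a supported representation."""
    for stream_id, combo in streams.items():
        if all(e in combo and combo[e] in supported for e in needed):
            return stream_id
    return None


positions = {"mic_a": (0.0, 0.0), "mic_b": (3.0, 4.0), "mic_c": (10.0, 0.0)}
needed = relevant_entities((0.0, 0.0), positions, radius=6.0)  # ['mic_a', 'mic_b']
streams = {
    "s1": {"mic_a": "channel", "mic_c": "channel"},
    "s2": {"mic_a": "object", "mic_b": "channel"},
}
print(pick_stream(streams, needed, supported={"object", "channel"}))  # s2
```

Moving the listener position changes `needed` and may therefore change which stream is selected, which parallels the dynamic microphone selection relied upon in Oj.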

Claims 6, 7, 9, 10, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ojala: 20110002469 in view of Oh: 20090222118 as applied to claims 1-5, 8, 11-14, 17, 18, 20, 22, and 23 supra, and further in view of Herre: 20160142846.

Regarding claims 6 and 19
Oj in view of Oh teaches:
An apparatus, method and medium, wherein the apparatus is further caused to: assign a label to each audio scene entity combination (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: a mixer renderer stage receives a plurality of audio channels, objects, etc. and reads and/or determines a matrix of output values in the form of channel parameters comprising at least loudness, direction, etc. values for each/any of the audio channels, objects, etc.; while Oj does not explicitly disclose these values as a matrix, the matrix operations for output are considered inherent to the mixing and rendering of the plural audio channels, objects, etc.); (Oh: Abstract; ¶ 116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels, objects, etc. in the form of preset data, said preset data operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters); and 
generate a matrix based on the label assigned to the each audio scene entity combination, wherein the matrix causes the client to determine audio data corresponding to which labels can be mixed together and audio data corresponding to which labels cannot be mixed together (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: a value of zero in a mix matrix signifies the unavailability of a channel, object, etc. mixed for output in a particular rendering configuration(s)); (Oh: 
Oj in view of Oh thus strongly suggests, but does not explicitly teach, a mix matrix with which to direct which audio channels, objects, etc. may and may not be combined together.
In a related field of endeavor, Herre teaches the utility of a matrix for calculation of output information, using a mixing and/or rendering matrix to output a plurality of received audio channels, objects, etc. to a particular output configuration (Herre: ¶ 73-97; Fig 7, 9), the matrix operable for determining which audio channels, objects, etc. are mixed together and which are kept separate (Herre: ¶ 73-97; 151-189; Fig 7, 9). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Herre-taught mix matrix data to determine availability of a particular audio channel, object, etc. for combination with other channels in the Oj in view of Oh system and method. The average skilled practitioner would have been motivated to do so for the purpose of specifying mixing and rendering of audio channels, objects, etc. using user- or producer-determined combinations of particular channels, objects, etc. with regard to particular output configurations, and would have expected predictable results therefrom. (please see also Oj: ¶ 6-21, 53-64; Fig 1, 2, 6; Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14; 3D audio specification provided by Applicant alongside IDS filed 11/16/20)
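To illustrate the label-and-matrix limitation of claims 6, 19, and 21 as read against a mix matrix in which a zero entry marks unavailability, the following Python sketch assumes a symmetric 0/1 matrix indexed by combination labels, where 1 means the corresponding audio data may be mixed together and 0 means it may not. The labels, the forbidden pair, and both functions are hypothetical and are not taken from Herre, Oj, Oh, or the instant specification.

```python
from itertools import combinations


def build_mix_matrix(labels, forbidden_pairs):
    """Symmetric 0/1 matrix over combination labels: 1 = may be mixed, 0 = must not be mixed."""
    idx = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[1] * n for _ in range(n)]
    for a, b in forbidden_pairs:
        i, j = idx[a], idx[b]
        matrix[i][j] = matrix[j][i] = 0
    return matrix


def can_mix(labels, matrix, selected):
    """Client-side check: True only if every pair of selected labels is allowed."""
    idx = {label: i for i, label in enumerate(labels)}
    return all(matrix[idx[a]][idx[b]] == 1 for a, b in combinations(selected, 2))


labels = ["A", "B", "C"]
# Hypothetical rule: A and B carry the same entities in different representations,
# so mixing both would duplicate those sources.
mix = build_mix_matrix(labels, forbidden_pairs=[("A", "B")])
print(mix)                               # [[1, 0, 1], [0, 1, 1], [1, 1, 1]]
print(can_mix(labels, mix, ["A", "C"]))  # True
print(can_mix(labels, mix, ["A", "B"]))  # False
```

Signaling the labels together with such a matrix would allow a client to reject a selection that mixes mutually exclusive combinations.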

Regarding claim 7
Oj in view of Oh in view of Herre teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to signal the matrix and the label for the each audio scene entity combination to the client in a MPEG-

Regarding claim 9
Oj in view of Oh in view of Herre teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to define information for indicating overlapping content between audio scene entities (Herre: ¶ 73-97; 151-189; Fig 7, 9: a mix matrix determines the amount of overlap by the direction of particular audio channels, objects, etc. to each/any of a plurality of output channels in a mono, stereo, etc. reproduction environment); (please see also Oj: ¶ 6-21, 53-64; Fig 1, 2, 6; Oh: Abstract; 48-51, 75-78, 116-124; Fig 1, 5, 13, 14; 3D audio specification provided by Applicant alongside IDS filed 11/16/20)
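The reading of claim 9 above treats a rendering matrix as directing inputs to output channels, with overlap arising where two inputs feed the same output. The following Python sketch assumes a plain gain matrix (rows are output channels, columns are input channels or objects) and defines overlap, for illustration only, as the output channels receiving non-zero gain from both inputs. The matrix values and helper names are hypothetical.

```python
def render(matrix, inputs):
    """out[k] = sum_j matrix[k][j] * inputs[j]; rows are output channels, columns are inputs."""
    return [sum(g * x for g, x in zip(row, inputs)) for row in matrix]


def overlap(matrix, j1, j2):
    """Output channels that receive non-zero gain from both input j1 and input j2."""
    return [k for k, row in enumerate(matrix) if row[j1] != 0 and row[j2] != 0]


# Stereo (L, R) rendering of three inputs; input 1 is panned to the centre.
stereo = [
    [1.0, 0.5, 0.0],  # L
    [0.0, 0.5, 1.0],  # R
]
print(render(stereo, [1.0, 2.0, 3.0]))  # [2.0, 4.0]
print(overlap(stereo, 0, 1))            # [0]
print(overlap(stereo, 0, 2))            # []
```

In a mono reproduction environment (a single row of non-zero gains), every pair of inputs would overlap in the one output channel.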

Regarding claim 10
Oj in view of Oh in view of Herre teaches or suggests:
An apparatus, method and medium, wherein the apparatus is further caused to define information for indicating overlapping content between different audio scenes (Herre: ¶ 73-97; 151-189; Fig 7, 9: a mix matrix determines the amount of overlap by the direction of particular audio channels, objects, etc. to each/any of a plurality of output channels in a mono, stereo, etc. reproduction environment); (please see also Oj: ¶ 6-21, 53-64; Fig 1, 2, 6; Oh: Abstract; 48-51, 

Regarding claim 21
Oj teaches:
An apparatus, method and medium comprising at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: 
generate a label for each audio scene entity combination of one or more audio scene entity combinations (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: a mixer renderer stage receives a plurality of audio channels, objects, etc. and reads and/or determines a matrix of output values in the form of channel parameters comprising at least loudness, direction, etc. values for each/any of the audio channels, objects, etc.; while Oj does not explicitly disclose these values as a matrix the matrix operations for output are considered inherent to the mixing and rendering of the plural audio channels, objects, etc.), the plurality of audio scene entity combinations including a first audio scene entity combination and a second audio scene entity combination, (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: the microphone configuration and selection modules operate to determine and combine a plurality of channels, entities, etc. and downmix same for transmission, operation of a controller of the system to move the listener position dynamically combines a selected set of microphones and updates the same thereby generating plural first, second, etc. subsequent combinations of channels, entities, etc.)
wherein the one or more audio scene entity combinations comprise one or more audio scene entities from an audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: coding and packaging of an audio scene comprising one or more combinations of channels, objects, etc. for broadcast disclosed as well known), wherein the one or more audio scene entities each represent a respective source of audio within the audio scene (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: a microphone provides a representation of the sound assigned to a channel, entity, etc.), and 
wherein one or more audio representations are assigned to the one or more audio scene entities (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: encoding and authoring of a scene comprising one or more channels, objects, etc. disclosed as well known; further, an audio representation comprises an audio signal, channel, bitstream, etc. consequently an entity must be considered no more than a particular channel), and 
wherein the one or more audio scene entity combinations are generated based on the one or more audio scene entities and the one or more audio representations (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: coding and packaging  of an audio scene comprising one or more combinations of channels, objects, etc. for broadcast disclosed as well known), the first audio scene entity combination and the second audio scene entity combination each comprising the one or more audio scene entities having been assigned the one or more audio representations in a first manner (Oj: ¶ 9-21, 47-64; Fig 1, 2, 6: microphone channel, entities selectably and dynamically grouped based on user operation of the system controller); and 
generate a matrix based on the label assigned to the each combination, wherein the matrix indicates audio data corresponding to which labels can mixed together and audio data corresponding to which labels cannot be mixed together (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6: a value of zero in a mix matrix signifies the unavailability of a channel, object, etc. mixed for output in a particular rendering configuration(s)).
first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner.
In a related field of endeavor consider Oh which teaches a method and apparatus for processing a received audio stream comprising using well-known preset data operable within a client to require particular automatic or user entered output audio settings (see Oh: Abstract;  116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels objects etc. in the form of preset data said preset data operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters; please see also group presets and switch groups in the 3D audio specification provided by Applicant alongside IDS filed 11/16/20), further the system groups particular channels, entities, etc. for output in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner (Oh: Abstract;  116-124; Fig 13, 14: a user is delivered a plurality of scene entity combinations which are grouped according to metadata transmitted with a container or according to preset metadata at the listening system, a user may operatively group scene entities in a user directed manner by entering user data such as upon the user interface of figure 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to render to a user using a first and second scene entity combinations utilizing a first preset metadata and a second subsequent client selected rendering matrices as taught or suggested by Oh within the Oj system and method. The average skilled practitioner would have been motivated to do so for the purpose of allowing a user to 
Oj in view of Oh thus strongly suggests, but does not explicitly teach, a mix matrix with which to direct which audio channels, objects, etc. may and may not be combined together.
In a related field of endeavor, Herre teaches the utility of a mixing and/or rendering matrix for calculating output information so as to output a plurality of received audio channels, objects, etc. to a particular output configuration (Herre: ¶ 73-97; Fig 7, 9), the matrix being operable for determining which audio channels, objects, etc. are mixed together and which are kept separate (Herre: ¶ 73-97, 151-189; Fig 7, 9). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Herre-taught mix matrix data to determine the availability of a particular audio channel, object, etc. for combination with other channels in the Oj in view of Oh system and method. The average skilled practitioner would have been motivated to do so for the purpose of specifying the mixing and rendering of audio channels, objects, etc. using user- or producer-determined combinations of particular channels, objects, etc. with regard to particular output configurations, and would have expected predictable results therefrom.

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ojala: 20110002469 in view of Oh: 20090222118 as applied to claims 1-5, 8, 11-14, 17, 18, 20, 22, 23 supra, and further in view of Sen: 202000013426.

Regarding claim 15

An apparatus, method and medium, wherein the apparatus is further caused to define information for identifying a format of an audio representation (Oj: ¶ 9-21, 53-64; Fig 1, 2, 6: the mixer/renderer converts output channels into an identified format including mono, beamforming, binaural, 5.1, etc.; further, the information for identifying a format must be considered to read on any transmitted formatted signal; that is, the disclosed MPEG, etc. formatting effectively identifies the format of an audio representation inasmuch as the downstream client reads and interprets the data). Oj in view of Oh thus strongly suggests, but does not explicitly teach, encoding identifiers of a particular format with which to direct the rendering of audio channels, objects, etc. 
In a related field of endeavor, Sen teaches a system and method for encoding and decoding an audio signal comprising transmitting formatting information in a header of an audio frame (Sen: ¶ 111-120; Fig 7A, 7B: the header includes a type of split bit field). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize any among the plurality of encoding format options taught by Sen and to signal such formatting to a decoder such as that of Oj and Oh. The average skilled practitioner would have been motivated to do so for the purpose of encoding audio as a channel(s), object(s) or HOA stream and providing type metadata to a decoder in a header to direct decoding, and would have expected only predictable results from the well-known, routine and conventional use of type metadata indicating an audio stream format.

Regarding claim 16
Oj in view of Oh in view of Sen teaches or suggests:


Response to Arguments

Applicant's arguments filed 11/9/21 have been fully considered but they are not persuasive. Rather than meet Applicant’s burden of providing evidence of the manner in which the instant invention overcomes the prior art combination, Applicant has chosen to argue the references piecemeal. Examiner is not persuaded by such arguments. Nevertheless, Examiner appreciates that Applicant has argued, and must argue, a preferred construal of the claim language. Particularly, Applicant alleges that Oj does not meet the claimed “first audio scene entity combination and second audio scene entity combination each comprising the one or more audio scene entities having been assigned the one or more audio representations in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner.” Applicant argues that: “in Ojala there is only a single possible audio scene entity combination for . 

Examiner respectfully disagrees. Myriad construals of the claim language can be considered, of which Applicant’s preferred construal and Examiner's broadest reasonable interpretation (made in light of the specification without unduly importing limitations therefrom) are merely two. 

Under Examiner's broadest reasonable interpretation of the claim language, Oj teaches a system, method, etc. operable to select and combine a plurality of audio sources (Oj: Abstract) using the well-known properties of a coder, such as in Oj: Figure 1, to assign one or more audio representations such as channels, objects, etc. to an audio scene entity (Oj: ¶ 9-21, etc.: an entity must be considered no more than a particular channel; in accord with ¶ 86 of the instant specification, audio scene entities include, but are not limited to, a diegetic sound, an ambience sound, an audio source, a non-diegetic sound, and a combination thereof; the microphones representative of sources in Oj: figure 2, 4, etc. comprise the recited entities). Thus, first and second combinations of particular microphones can be considered first and second audio scene entity combinations assigned in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner. 
Oh, however, teaches that a received audio stream comprises well-known preset data operable within a client to require particular automatic or user-entered output audio settings (see Oh: Abstract; ¶ 116-124; Fig 13, 14: user input to a client application directs selection of particular groupings of channels, objects, etc. in the form of preset data, said preset data being operable as a requirement of the client to adjust a matrix directive of output channel, object, etc. rendering parameters). Oh further teaches that the system groups particular channels, entities, etc. for output in first and second manners respectively, wherein the first manner comprises assigning the one or more audio representations to the one or more audio scene entities differently than the second manner (Oh: Abstract; ¶ 116-124; Fig 13, 14: a user is delivered a plurality of scene entity combinations which are grouped in a first manner according to metadata transmitted with a container or according to preset metadata at the listening system; a user may operatively group scene entities in a user-directed second manner by entering user data, such as upon the user interface of figure 14). As such, Oj in view of Oh meets the claims under Examiner's broadest reasonable interpretation, Applicant's arguments are not persuasive, and the independent claims are not currently in condition for allowance.
With regard to claim 3, Applicant again argues a particular preferred construal of the claim language. Under Examiner's broadest reasonable construal, Oj in view of Oh teaches the claimed subject matter, particularly the grouping of scene entities based on assigned representations, such as in Oj, wherein objects are grouped based on virtual coordinates as well as, more generally, by the encoding particulars necessary to fit within a particular coding specification (Oj: ¶ 6-21, 53-64; Fig 1, 2, 6), and Oh further teaches that the audio scene entities are grouped based on preset metadata corresponding thereto. As such, Oj in view of Oh .

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 7:30-6:30, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached on (571)272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/PAUL C MCCORD/Primary Examiner, Art Unit 2654