DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4, 6-14, 16, 18, 19 and 21-23 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Delamont (US 20200368616 A1).

As per claim 1, Delamont discloses an apparatus comprising:
at least one processor (Fig. 1b, processor units, memory and storage) and at least one non-transitory memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:
obtain at least one spatial audio signal comprising at least one audio signal (the combination of inputs from the cameras 7 and other sensors, including sensors 9 of the device of fig. 1b, in order to determine the mesh and 3d model, para. 81; also the audio obtained by encoder 15 as per para. 138), wherein the at least one spatial audio signal defines an audio scene (the mesh and model upon which the virtual objects are placed, where the objects include audio objects including the simulated effect of the audio source based on tracking from sensors 9, para. 106) forming at least in part an immersive media content (the terminals are part of a game and AR, para. 1, each of which is immersive media content), where the spatial audio signal is configured to be rendered consistent with a content consumer user movement (the tracked movement of any of the users in the AR game, where the parameters are any representation of the readings from any of the tracked motion cited in paras. 85, 86 to perform the rendering performed by rendering module 27 in fig. 2);
render the at least one spatial audio signal to be at least partially consistent with the content consumer user movement and obtain at least one first rendered audio signal (the output of the rendering module cited directly above, when rendering a game object as per para. 89);
obtain at least one augmentation audio signal (any of the generated/obtained game objects cited in paras. 1308-1310), wherein the at least one augmentation audio signal has a different format than a format of the at least one spatial audio signal (para. 1103, the different technologies and techniques can all be used by the game server to present the 3d sound effects, where the technologies and techniques are the different formats);
render at least a part of the at least one augmentation audio signal to obtain at least one augmentation rendered audio signal (the audio received for the first, second and third augmented game objects cited in paras. 1308-1310 comprises the augmentation audio signal for each of the objects as per para. 1342, which is rendered as per rendering module 27 in fig. 2);

mix the at least one first rendered audio signal and the at least one augmentation rendered audio signal to generate at least one output audio signal (since the terminals in fig. 1a, 1b comprise speakers 6 to output audio, the output from the rendered game objects, including their respective associated augmented 3d audio objects, requires a mixing stage in order to be combined into a format that can be output by a single set of speakers 6);
wherein the audio scene comprises a virtual six degrees of freedom audio scene
(the scene comprises a virtual scene comprising virtual game objects, the mesh and model upon which the virtual objects are placed as cited above to implement a 6dof scene as per para. 149, where the mesh and model also comprise the audio scene which provides audio per para. 158 for the virtual game objects as per para. 183 and as per para. 197: “so too does the user's perspective of the 3D positional sound effects which are relational to the augmented virtual game objects scenes in terms of their position in the three dimensional space of the game”).


As per claim 13, the apparatus of the claim 1 rejection performs a method comprising: 

obtaining at least one spatial audio signal which can be rendered consistent with a content consumer user movement, the at least one spatial audio signal comprising at least one audio signal and at least one spatial parameter associated with the at least one audio signal, wherein the at least one audio signal defines an audio scene; 
rendering the at least one spatial audio signal to be at least partially consistent with a content consumer user movement and obtain at least one first rendered audio signal; 
obtaining at least one augmentation audio signal, wherein the at least one augmentation audio signal has a different format than a format of the at least one spatial audio signal; 
rendering at least a part of the at least one augmentation audio signal to obtain at least one augmentation rendered audio signal; and 
mixing the at least one first rendered audio signal and the at least one augmentation rendered audio signal to generate at least one output audio signal (as per the claim 1 rejection);
wherein the audio scene comprises a virtual six degrees of freedom audio scene (as per the claim 1 rejection).



As per claims 2, 14, the apparatus claimed in claim 1, where the obtained at least one spatial audio signal comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to:
decode from a first bit stream the at least one spatial audio signal and the at least one spatial parameter (decoder 16 recovers the spatial audio signal and the augmentation control parameter from the received bitstream carried over the network connections between multiple terminals 1 as shown in fig. 7, noting that the local terminals can transmit rendering information directly between each other as per para. 142).

As per claims 4, 16, the apparatus of claim 1, wherein the obtained at least one augmentation audio signal comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to:
decode from a second bit stream the at least one augmentation audio signal (audio signals and parameters/data for rendering can be received from separate sources, where each source requires its own bitstream to a particular terminal, including a second bitstream, such as from the server and from multiple other terminals 1 as shown in the network shown in fig. 7).

As per claims 6, 18, the apparatus as claimed in claim 1, where the apparatus is further caused to: obtain a mapping from a spatial part of the at least one augmentation audio signal to the audio scene (the mesh and model upon which the virtual objects are placed, where the objects include audio objects including the simulated effect of the audio source based on tracking, para. 81, where the mesh and 3d model is a mapping); and
control the mixing of at least one first rendered audio signal and the at least one augmentation rendered audio signal based on the mapping (the mesh and model are used in the rendering as per para. 48, and para. 81 the generated mesh data and 3D models used to render the virtual game objects which have associated 3d audio).

As per claims 7, 19, the apparatus as claimed in claim 6, wherein the controlled mixing of the at least one first rendered audio signal and the at least one augmentation rendered audio signal is further configured to cause the apparatus to: determine a mixing mode for the mixing of the at least one first rendered audio signal and the at least one augmentation rendered audio signal (since the terminals 1 can receive and render multiple virtual objects including virtual 3d audio objects as per the above claim 1 rejection, each device must perform a particular mixing of audio objects in order to output them from the same set of speakers 6 (Fig. 1b), where the number of audio objects being mixed into speaker output signals at a given point in time defines a particular determined mixing mode (i.e., how many channels are being mixed at a given time)).

As per claims 8,21, the apparatus as claimed in claim 7, wherein the mixing mode for the at least one first rendered audio signal and the at least one augmentation rendered audio signal is at least one of: 
a world-locked mixing wherein an audio object associated with the at least one augmentation audio signal is fixed at a position within the audio scene (game objects, which include the 3d audio rendered from the augmentation audio signal, may be attached to real world objects as per para. 97: render the virtual game objects and scenery accurately over current real-world objects and surrounding surfaces); or
an object-locked mixing wherein an audio object associated with the at least one augmentation audio signal is fixed relative to a content consumer user position and/or rotation within the audio scene (not mapped as recited in the alternative).
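
For illustration only, and not as part of the grounds of rejection, the distinction between the two recited mixing modes can be sketched as follows. This is a minimal two-dimensional sketch under stated assumptions: the function names, the mode labels and the coordinate convention (x forward, y to the left of the listener) are hypothetical and are taken from neither the claims nor Delamont.

```python
import math

def to_listener_frame(source_xy, listener_xy, listener_yaw):
    """Express a scene-fixed position in the listener's own frame (assumed: x forward, y left)."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    # Rotate the offset by the inverse of the listener's yaw.
    c, s = math.cos(-listener_yaw), math.sin(-listener_yaw)
    return (c * dx - s * dy, s * dx + c * dy)

def rendered_position(mode, anchor_xy, listener_xy, listener_yaw):
    """Return where the augmentation audio object appears relative to the listener."""
    if mode == "world_locked":
        # Anchor is fixed in the audio scene, so it shifts as the user moves or turns.
        return to_listener_frame(anchor_xy, listener_xy, listener_yaw)
    if mode == "object_locked":
        # Anchor is fixed relative to the user, so user movement does not change it.
        return anchor_xy
    raise ValueError(f"unknown mixing mode: {mode}")
```

Under these assumptions, a world-locked object straight ahead of the user moves to the user's right when the user turns 90 degrees to the left, whereas an object fixed relative to the user stays straight ahead.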

As per claims 9, 22, the apparatus as claimed in claim 6, wherein the controlled mixing of the at least one first rendered audio signal and the at least one augmentation rendered audio signal is configured to cause the apparatus to:
determine a gain based on a content consumer user position and/or rotation and a position associated with an audio object associated with the at least one augmentation audio signal; and 
apply the gain to the at least one augmentation rendered audio signal before mixing the at least one first rendered audio signal and the at least one augmentation rendered audio signal.
(since the apparatus uses ILD to localize the audio per para. 186, the tracking of the user position/rotation/orientation per para. 197 requires a determined and applied gain in order to create an ILD between speaker channels in order to create the 3d audio, where said applied gain occurs respectively per audio object, which occurs before the mixing of all the concurrent audio objects being rendered by a user at a given moment because the objects are rendered at the rendering module before they are output by the speakers 6).
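
For illustration only, and not as part of the grounds of rejection, the order of operations relied on above (a gain determined from user position/rotation and object position, applied per audio object before the mix) can be sketched as follows. This is a minimal sketch under stated assumptions: the constant-power panning law, the inverse-distance attenuation and all function names are hypothetical choices made for the example and are not taken from Delamont.

```python
import math

def ild_gains(listener_xy, listener_yaw, source_xy, ref_dist=1.0):
    """Left/right gains producing an interaural level difference (assumed panning law)."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    dist = max(math.hypot(dx, dy), ref_dist)       # clamp to avoid boosting very near sources
    azimuth = math.atan2(dy, dx) - listener_yaw    # positive azimuth = source to the left
    pan = math.sin(azimuth)                        # +1 fully left, -1 fully right
    theta = (1.0 - pan) * math.pi / 4              # 0 -> left only, pi/2 -> right only
    atten = ref_dist / dist                        # simple inverse-distance attenuation
    return atten * math.cos(theta), atten * math.sin(theta)

def mix(first_rendered, augmentation_mono, gains):
    """Apply the per-object gains to the augmentation signal, then sum with the first rendered signal."""
    gain_left, gain_right = gains
    return [
        (left + gain_left * sample, right + gain_right * sample)
        for (left, right), sample in zip(first_rendered, augmentation_mono)
    ]
```

Here `first_rendered` is assumed to be a list of (left, right) sample pairs and `augmentation_mono` a list of mono samples of equal length; the gain is computed once per object and applied before the summation.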

As per claims 10, 23, the apparatus of claim 6, where the obtained mapping causes the apparatus to perform at least one of:
obtain the mapping from the spatial part of the at least one augmentation audio signal to the audio scene based on the at least one augmentation audio signal (the decoder receives audio and data/metadata per para. 138, including the mapping cited in the claim 1 rejection), or
obtain the mapping from the spatial part of the at least one augmentation audio signal to the audio scene based on a user input (not mapped as recited in the alternative).

As per claim 11, the audio scene is a six degrees of freedom scene (para. 149).

As per claim 12, the apparatus as claimed in claim 1, wherein the spatial part of the at least one augmentation audio signal defines one of:
a three degrees of freedom scene (the portion of the 6DOF processing that is defined by the movement along the x, y, and z axes, para. 149); and
a three degrees of rotational freedom with limited translational freedom scene (not mapped as recited in the alternative).




Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Delamont (US 20200368616 A1).

As per claim 3, Delamont discloses network connections/first bitstream to carry audio and video data between terminals and servers as per the claim 1 and 2 rejections, but does not specify the particular protocol for the network communication.
The examiner takes Official Notice that it is well known in the art to use existing signaling protocols, including the well-known MPEG standards such as MPEG-1, for the purpose of compatibility with existing standards.

As per claim 5, wherein the second bit stream is a low-delay path bit stream (the MPEG-1 protocol is a low-delay path protocol).

Response to Arguments
The submitted arguments have been considered but are moot in view of the new grounds of rejection.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER KRZYSTAN whose telephone number is 571-272-7498, and whose email address is alexander.krzystan@uspto.gov.

The examiner can usually be reached M-F, 7:30-4:00 EST.
If attempts to reach the examiner by telephone or email are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached on (571) 272-7547.

The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications.
/ALEXANDER KRZYSTAN/Primary Examiner, Art Unit 2653
Examiner Alexander Krzystan
December 12, 2022