DETAILED ACTION
Claims 1-17 are pending in the present application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d).  The certified copy of European patent application number EP1817357.8 filed on 05/18/2018 has been received and made of record.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/20/2020 and 11/24/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Regarding claims 1 and 13-15, the phrase "and/or" renders the claim indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention. Please select “and” or “or”.
Regarding claims 2 and 9, the phrase "for example" renders the claim indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention.  See MPEP § 2173.05(d).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7, and 9-17 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. PGPub 2014/0233917 to Xiang in view of U.S. PGPub 2012/0206452 to Geisner et al.

Regarding claim 1, Xiang teaches a method of adapting an acoustic rendering of an audio source to a visual rendering of an object in a scene, wherein the visual rendering is provided to a user (par 0005-0008, “a method comprises analyzing audio data captured with a device to identify one or more audio objects and analyzing video data captured with the device concurrent to the capture of the audio data to identify one or more video objects. The method further comprises associating at least one of the one or more audio objects with at least one of the one or more video objects, and generating multi-channel audio data from the audio data based on the association of the at least one of the one or more audio objects with the at least one of the one or more video objects”),
wherein the visual rendering is one of: a virtual-reality rendering of an image-based representation of the object in a virtual-reality scene, and an augmented-reality rendering of an image-based representation of the object with respect to a real-life scene (par 0044-0045, par 0056, par 0066, “Augmented reality audio rendering unit 28C may "augment reality" in the sense that rendering unit 28C may access an audio library (located either internal to or externally from device 10) or other audio repository to retrieve an audio object corresponding to the unmatched or unassociated video objects 32' and render audio data 38C to augment audio data 38A and 38B which reflects audio data 20 captured by microphones 16. Augmented reality audio rendering unit 28C may render audio data in the foreground given that unit 28C processes video objects 32' that are detected in the scene captured by camera 14 as video data 18”), the method comprising: 
- generating metadata associated with the image-based representation of the object, the metadata representing a modelling of the object (par 0005, par 0029, par 0039-0040, par 0049-0054, par 0062-0069, par 0071-0080, “Object 
- establishing the acoustic rendering of the audio source by (Fig 3, par 0070-0072, “FIG. 3 is a block diagram illustrating assisted audio rendering unit 28A of FIG. 1B in more detail. In the example of FIG. 3, assisted audio rendering unit 28A includes a number of spatial audio rendering units 60A-60N ("spatial audio rendering 
providing the audio source as a spatial audio source in an acoustic scene (par 0044-0046, par 0070-0072, “Each of rendering units 28 may render audio data 38A-38C in a spatialized manner. In other words, rendering units 28 may produce spatialized audio data 38A-38C, where each of audio objects 34', 34'' and 34''' (where audio objects 34''' refer to augmented reality audio objects 34''' retrieved by augmented reality audio rendering unit 28C) are allocated and rendered assuming a certain speaker configuration for playback. Rendering unit 28 may render audio objects 34', 34'' and 34''' using head-related transfer functions (HRTF) and other rendering algorithms commonly used when rendering spatialized audio data”), the acoustic scene being geometrically aligned with the visual rendering of the object (Fig 3, par 0009, par 0039-0040, par 0042, par 0052, par 0054-0055, par 0062-0067, par 0075, “With respect to those of audio objects 34 determined to belong to the first class, object association unit 26 may determine a level of correlation between the audio metadata of the one of audio objects 34 and the video metadata of the associated one video objects 32, generating combined metadata for the one of audio objects 34 to which the one video objects 32 is associated based on the determined level of correlation”, “FIGS. 2A-2D are diagrams illustrating operations performed by video capture device 10 of FIG. 1 in associating video objects 32 with audio objects 34 in accordance with the techniques described in this disclosure. In the above FIG. 2A, one of audio objects 34 (denoted 
on the basis of the metadata, establishing the object as an audio object in the acoustic scene (par 0062-0071, par 0075-0078, “video-capture device 10 may determine that a location of the object that emitted the sound specified in audio metadata 54A correlates to a high degree (e.g., which may be defined by some confidence threshold, often expressed as a percentage) with a location of the corresponding object defined by video metadata 52A. Video-capture device 10 may then render and mix the audio object to generate multi-channel audio data 40 with high confidence”, “video object 32A has moved from the first location to the second location and then to the third location. This video metadata 52A may, when associated with a corresponding one of audio objects 34 (e.g., audio object 34A), enable object association unit 26 to augment audio metadata 54A to specify the location of the object that emits audio data identified as audio object 34A more accurately (given that visual scene analysis is commonly more accurate than auditory scene analysis). Object 
rendering the acoustic scene using a spatial audio rendering technique while adapting the rendering of the acoustic scene to the audio object (par 0044-0046, par 0070-0072, “In the example of FIG. 3, each of spatial audio rendering units 60 may represent a separate audio rendering process that performs spatial audio rendering with respect to audio objects 34A'-34N' ("audio objects 34'", which are shown in the example FIG. 1B) to generate audio objects 38A. Spatial audio rendering may refer to various algorithms or processes for rendering audio data and may include, as a few examples, ambisonics, wave field synthesis (WFS) and vector-based amplitude panning (VBAP). Spatial audio rendering units 60 may process respective ones of audio objects 34' based on augmented metadata 56A-56N ("augmented metadata 56"). That is, spatial audio rendering units 60 may render audio objects 34' using augmented metadata 56 to further refine or otherwise more accurately locate the corresponding one of audio objects 34' so that this one of audio objects 34' can be more accurately reproduced when multi-channel audio data 40 is played. Spatial audio rendering units 60 may output rendered audio data 38A to audio mixing unit 30, which may then mix rendered audio data 38A to produce multi-channel audio data 40. In some instances, audio data 38A corresponding to a given audio object 34' may be mixed across two or more channels of multi-channel audio data 40”).  
However, Xiang is silent as to the audio object having a reverberant and/or absorbent acoustic property, and as to rendering the acoustic scene using a spatial audio rendering technique while adapting the rendering of the acoustic scene to the reverberant and/or absorbent acoustic property of the audio object.
In related endeavor, Geisner et al. teach establishing the object as an audio object in the acoustic scene, the audio object having a reverberant and/or absorbent acoustic property (par 0106-0112, “The physical properties including type of material of an object are used to determine its one or more effects on audio data. A sound occlusion model 316 may include rules for representing the one or more effects which the 3D audio engine 304 can implement. For example, one type of material may be primarily a sound absorber in which the amplitude of the sound wave is dampened and the sound energy is turned to heat energy. Absorbers are good for sound proofing. A sound occlusion model may for example indicate a damping coefficient to the amplitude of the audio data to represent an aborption effect. Another type of material may act to reflect sound waves such that the angle of incidence is the angle of reflection for a pre-defined percentage of waves hitting the material. Echo and Doppler effects may be output by the 3D audio engine as a result. A third type of material acts as a sound diffuser reflecting incident sound waves in all directions. A sound occlusion model associated with the object having this type of material has rules for generating reflection signals of audio data in random directions off the size and shape of the occluding object which the 3D audio engine implements. Within these general categories of sound characteristics, there may be more specific cases like a resonant absorber which dampens the amplitude of a sound wave as it is reflected. 
3D audio engines, such as may be used in interactive gaming with all artificial display environments, have techniques for modifying sound waves to create echos, Doppler ; and rendering the acoustic scene using a spatial audio rendering technique while adapting the rendering of the acoustic scene to the reverberant and/or absorbent acoustic property of the audio object (par 0008, par 0074-0075, par 0103-0112, par 0163-0165, “An occlusion engine 302 executing in the display device system 8 or the hub 12 can identify the occlusions. Although not seen, such an occlusion with respect to the user may cause audio data associated with the occluded object to be modified based on the physical properties of the occluding object. … The 3D audio engine 304 is a positional 3D audio engine which receives input audio data and outputs audio data for the earphones 130. The received input audio data may be for a virtual object or be that generated by a real object. Audio data for virtual objects generated by an application can be output to the earphones to sound as if coming from the direction of the virtual object projected into the user field of view “, “a virtual brick wall 410 appears to users Bob 406 and George 408 in their respective head mounted display devices 2 while executing a quest type of game which they are both playing, and which an action of George triggered to appear. In this example, to provide a realistic experience, neither George 408 nor Bob 406 should be able to hear each other due to the sound absorption characteristic of a thick brick wall (e.g. 18 inches) 410 between them if it were real. In FIG. 4B, audio data generated by George is blocked, e.g. his cries for help, or removed from audio received via Bob's microphone and sent to Bob's earphones. Likewise, George's 3D audio engine, modifies audio data received at George's earphones to remove audio data generated by Bob”).
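For illustration only, and not drawn from either reference: the absorption effect described in the cited passage of Geisner et al. (a damping coefficient applied to the amplitude of the audio data) might be sketched as follows. The function name `apply_absorption`, the coefficient values, and the sample values are all assumptions introduced here; neither reference specifies this formula.

```python
def apply_absorption(samples, damping_coefficient):
    """Scale each audio sample by (1 - damping_coefficient).

    Hypothetical model: 0.0 leaves the audio unchanged and 1.0 removes it
    entirely (e.g. a fully absorbing wall between source and listener).
    """
    if not 0.0 <= damping_coefficient <= 1.0:
        raise ValueError("damping_coefficient must be in [0, 1]")
    gain = 1.0 - damping_coefficient
    return [sample * gain for sample in samples]


# Assumed sample values, chosen only to show the scaling.
dry = [0.5, -0.25, 1.0]
partially_absorbed = apply_absorption(dry, 0.5)
fully_blocked = apply_absorption(dry, 1.0)
```

Under this assumed model, a coefficient of 1.0 corresponds to the brick-wall example in the quoted passage, in which the audio data is removed altogether.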


Regarding claim 2, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and Geisner et al. further teach wherein the object is a room having at least one wall, and wherein the metadata defines at least part of a geometry of the room, for example, by defining a box model representing the room (par 0043, par 0049, par 0054, par 0082-0084, par 0110, par 0145, “An example of an environment is a 360 degree visible portion of a real location in which the user is situated. A user may only be looking at a subset of his environment which is his field of view. For example, a room is an environment. A person may be in a house and be in the kitchen looking at the top shelf of the refrigerator. The top shelf of the refrigerator is within his field of view, the kitchen is his environment, but his upstairs bedroom is not part of his current environment as walls and a ceiling block his view of the upstairs bedroom. Of course, as he moves, his environment changes. Some other examples of an environment may be a ball field, a street location, a section of a store, a customer section of a coffee shop and the like. A

Regarding claim 3, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and Geisner et al. further teach wherein the virtual-reality scene is an omnidirectional image (par 0054, “An example of an environment is a 360 degree visible portion of a real location in which the user is situated. A user may only be looking at a subset of his environment which is his field of view”).

Regarding claim 4, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and further teaches wherein the virtual-reality scene is associated with a first axis system, wherein the virtual-reality scene has a default orientation in the first axis system (Xiang: par 0039, par 0049, par 0075-0078; Geisner et al.: par 0053, par 0076, par 0079, par 0082-0084, par 0095-0096, rendering an AR scene based on a coordinate system (location)), wherein the metadata comprises one or more coordinates defining at

Regarding claim 7, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and further teaches wherein generating the metadata comprises analyzing one or more of: - an image-based representation of the scene; - the image-based representation of the object; and - depth information associated with either image-based representation (Xiang: par 0005, par 0029, par 0035-0038, par 0073-0078, providing an image or video scene and an image-based object; Geisner et al.: par 0007, par 0052-0053, par 0070, par 0076-0077, par 0083-0085, providing an image scene, an image-based object, and depth information for 3D); using an image analysis technique or a computer vision technique to obtain a modelling of the object (Xiang: par 0029, par 0035-0038, par 0073-0078; Geisner et al.: par 0042, par 0083-0084, par 0087, par 0097-0099, generating a 3D model of the object).

Regarding claim 9, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and further teaches wherein generating the metadata comprises indicating the

Regarding claim 10, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and Xiang further teaches generating the metadata at a server and providing the metadata to a receiver configured to establish the acoustic rendering of the audio source (par 0036-0039, par 0075, a server may be used to transmit the audio and video data together with the metadata for acoustic rendering and video rendering).

Regarding claim 11, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, and Geisner et al. further teach wherein the audio source represents audio of a multiuser communication session, and wherein the virtual-reality scene represents a virtual setting of the multiuser communication session (Fig 4B, par 0085, par 0110-0111, rendering an AR visual scene for two users and generating audio data between the two users).

Regarding claim 12, Xiang teaches a non-transitory computer-readable medium comprising a computer program, the computer program comprising instructions for causing a processor system to perform the method according to claim 1 (par 0121). The remaining limitations of the claim are similar in scope to claim 1 and rejected under the same rationale.

Regarding claim 13, Xiang teaches a non-transitory computer-readable medium (par 0121) comprising metadata associated with an image-based representation of an object (par 0005, par 0029, par 0039-0040, par 0049-0054, par 0062-0069, par 0071-0080, render audio and video scene based on metadata), the metadata defining at least part of a geometry of the object (par 0005, par 0029, par 0039-0040, par 0049-0054, par 0062-0069, par 0071-0080, “Object association unit 26 represents hardware or a combination of hardware and software that attempts to associate video objects 32 with audio objects 34. Video objects 32 and audio objects 34 may each be defined in accordance with a compatible or common format, meaning that video objects 32 and audio objects 34 are both defined in a manner that facilitates associations between objects 32 and objects 34. Each of objects 32 and 34 may include metadata defining one or more of a predicted location (e.g., an x, y, z coordinate) of the corresponding object, a size (or predicted size) of the corresponding object, a shape (or predicted shape) of the corresponding object, a speed (or a predicted speed) of the corresponding object, a location confidence level, and whether the object is in focus, or whether the object belongs to the near foreground, far foreground or the near background or the far 
In related endeavor, Geisner et al. teach the metadata defining at least part of a geometry of the object (par 0082, par 0090-0091, par 0137-0138) and indicating a reverberant and/or absorbent acoustic property of the object (par 0106-0112, “The physical properties including type of material of an object are used to determine its one or more effects on audio data. A sound occlusion model 316 may include rules for representing the one or more effects which the 3D audio engine 304 can implement. For example, one type of material may be primarily a sound absorber in which the amplitude of the sound wave is dampened and the sound energy is turned to heat energy. Absorbers are good for sound proofing. A sound occlusion model may for example indicate a damping coefficient to the amplitude of the audio data to represent an aborption effect. Another type of material may act to reflect sound waves such that the angle of incidence is the angle of reflection for a pre-defined percentage of waves hitting the material. Echo and Doppler effects may be output by the 3D audio engine as a result. A third type of material acts as a sound diffuser reflecting incident sound waves in all directions. A sound occlusion model associated with the object having this type of material has rules for generating reflection signals of audio data in random directions off the size and shape of the occluding object which the 3D audio engine implements. Within these general categories of sound characteristics, there may be more specific cases like a resonant absorber which dampens the amplitude of a sound wave as it is reflected. 3D audio engines, such as may be used in interactive gaming with all artificial 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiang to include the metadata indicating a reverberant and/or absorbent acoustic property of the object, as taught by Geisner et al., in order to generate or modify audio effects based on the physical properties (e.g., shape, color, size, texture) of virtual objects, so that the virtual objects, and their position and movement with respect to real objects, are displayed realistically in the user field of view provided by the head mounted display device.

Regarding claim 14, Xiang teaches a processor system for generating metadata for use in adapting an acoustic rendering of an audio source to a visual rendering of an object in a scene (par 0008-0009), the processor system comprising: - a communication interface configured to communicate with a receiver which is configured to establish the acoustic rendering of the audio source by providing the audio source as a spatial audio source in an acoustic scene (par 0036-0038). The remaining limitations of the claim are similar in scope to claim 1 and rejected under the same rationale.

Regarding claim 15, Xiang teaches a processor system for adapting an acoustic rendering of an audio source to a visual rendering of an object (par 0008-0009). The remaining limitations of the claim are similar in scope to claim 1 and rejected under the same rationale.

Regarding claim 16, Xiang as modified by Geisner et al. teaches all the limitations of claim 15, and further teaches a video processor configured to establish the visual rendering of the scene by providing one of the virtual-reality rendering and the augmented-reality rendering to the user (Xiang: Fig 4, par 0044-0045, par 0056, par 0066, par 0075-0078, rendering augmented reality in the scene; Geisner et al.: abstract, par 0007, rendering an AR or mixed reality scene in an HMD device).

Regarding claim 17, Xiang as modified by Geisner et al. teaches all the limitations of claim 15. Claim 17 is similar in scope to claim 11 and is rejected under the same rationale.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. PGPub 2014/0233917 to Xiang in view of U.S. PGPub 2012/0206452 to Geisner et al., further in view of U.S. PGPub 2012/0154402 to Mital et al.

Regarding claim 8, Xiang as modified by Geisner et al. teaches all the limitations of claim 1, but does not explicitly teach wherein generating the metadata comprises obtaining user input indicative of a geometry of the object via a user interface from a user.
In a related endeavor, Mital et al. teach wherein generating the metadata comprises obtaining user input indicative of a geometry of the object via a user interface from a user (par 0028-0033, par 0039-0046, par 0047-0051, par 0058-0062, “The computerized tool may analyze metadata of graphical objects in the library of graphical
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiang, as modified by Geisner et al., to include obtaining user input indicative of a geometry of the object via a user interface from a user when generating the metadata, as taught by Mital et al., in order to present a suggested graphical object on a user interface for representing the data and to receive user input indicating a change in the representation of the data set, so that the visual characteristics of the graphical object are modified to reflect the change and the user can gain an understanding of an aspect of the data set.

Allowable Subject Matter
Claims 5-6 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The cited prior art fails to teach the combination of elements recited in claim 5, including "wherein generating the metadata comprises: - defining at least part of the geometry of the object as coordinates in a second axis system which is different from the first axis system; - determining the spatial correspondence between the first axis system and the second axis system; and - generating the metadata, or generating further metadata associated with the metadata, to indicate the spatial correspondence".
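For illustration only: one way to picture the recited "spatial correspondence" between two axis systems is as a rotation plus a translation carried alongside the geometry. The sketch below is an assumption made for clarity and is not taken from the application or the cited art; the function name `to_first_axis_system`, the metadata keys, and the choice of a yaw rotation about the vertical axis are all hypothetical.

```python
import math


def to_first_axis_system(point, yaw_degrees, offset):
    """Map an (x, y, z) point from the second axis system into the first.

    Assumed correspondence: rotate by yaw_degrees about the z axis,
    then translate by offset.
    """
    x, y, z = point
    angle = math.radians(yaw_degrees)
    x_rot = x * math.cos(angle) - y * math.sin(angle)
    y_rot = x * math.sin(angle) + y * math.cos(angle)
    dx, dy, dz = offset
    return (x_rot + dx, y_rot + dy, z + dz)


# Hypothetical metadata: geometry in the second axis system, plus further
# metadata indicating how that system relates to the first.
metadata = {
    "geometry": [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)],
    "correspondence": {"yaw_degrees": 90.0, "offset": (0.0, 0.0, 0.0)},
}

corr = metadata["correspondence"]
aligned = [
    to_first_axis_system(p, corr["yaw_degrees"], corr["offset"])
    for p in metadata["geometry"]
]
```

In this sketch a receiver would apply the indicated correspondence before using the geometry, so that the coordinates agree with the orientation of the visual rendering.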

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge, whose telephone number is (571)272-5556. The examiner can normally be reached from 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona Faulk, can be reached at (571)272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



JIN GE
Examiner
Art Unit 2616



/JIN GE/ Primary Examiner, Art Unit 2616