Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This is in response to Applicant’s request for continued examination which was filed on 11/23/2020 and has been entered. Claims 1, 23, 27 and 30 have been amended. No claims have been cancelled. No claims have been added. Claims 1-30 are pending in this application.
	
	
Response to Arguments
Claim 1 is amended to additionally recite “generate audio from a source different than the one or more media signals to playback as virtual sounds that originate from the one or more candidate spatial locations”. Independent claims 23, 27 and 30 are similarly amended. Applicant’s arguments with respect to the claims have been considered but are moot because the arguments do not apply to the combination of references being used in the current rejection.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-8, 11, 16-19 and 23-30 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Patent No. 9,654,644 (“Spittle et al.”).
Regarding claim 1, Spittle et al. discloses an apparatus (conference server 110) comprising: a processor configured to: 
	obtain one or more media signals (audio signals 123 of conference participants 120) associated with a scene (Fig. 2 audio conference scene in GUI 200);
	identify a spatial location in the scene for each source of the one or more media signals;
	identify audio content for each media signal of the one or more media signals (col. 8, lines 11-19, the central conference controller 111 may be configured to place the plurality of upstream audio signals 123 at particular locations within a 2D or 3D conference scene and generate information regarding the arrangement (i.e. the locations) of the plurality of upstream audio signals 123 within the conference scene);
	determine, based on the identified spatial locations, one or more candidate spatial locations in the scene that are not associated with an audio source (col. 14, lines 29-33, the GUI 200 may allow to indicate and/or to modify additional sound locations 214 (which are illustrated as filled dots 214 in FIG. 2). Such sound locations 214 may be used to render sounds other than audio signals of the participants, e.g. conference notifications and/or music); and 
	generate audio from a source different than the one or more media signals to playback as virtual sounds that originate from the one or more candidate spatial locations (col. 27, lines 11-15, the scene manager can interpret the categorization tag provided alongside the audio signal and determine the appropriate processing based on the categorization tag by making use of a predefined X-point scene and to use one of the X talker locations of the predefined X-point scene for rendering the sound signal).

Claim 23 recites a method performed by the apparatus disclosed in claim 1; thus, claim 23 is rejected in view of Spittle et al. in the same manner as discussed in reference to claim 1 above.

Claim 27 recites a non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations of the apparatus disclosed in claim 1; thus, claim 27 is rejected in view of Spittle et al. in the same manner as discussed in reference to claim 1 above.

Claim 30 recites an apparatus comprising: means for obtaining one or more media signals associated with a scene; means for identifying a spatial location in the scene for each source of the one or more media signals; means for identifying audio content for each media signal of the one or more media signals; means for determining one or more candidate spatial locations in the scene based on the identified spatial locations; and means for generating audio to playback as virtual sounds that originate from the one or more candidate spatial locations.
	The apparatus of claim 30 operates in substantially the same manner as the apparatus of claim 1 and is therefore rejected in view of Spittle et al. in the same manner as discussed in reference to claim 1 above.

Regarding claims 2, 24 and 28, Spittle et al. discloses wherein the audio comprises complementary audio content to the audio content (col. 4, lines 52-56, the notifications are complementary to the audio signals; for example, the sound signal may indicate an entry of the upstream audio signal into the conference scene).

Regarding claim 3, Spittle et al. discloses the apparatus of claim 2, wherein the processor is further configured to generate the complementary audio content based on the audio content (claim 18, wherein the sound signal relates to a system sound generated by the conference controller and comprises a notification, an announcement and/or music).

Regarding claim 4, Spittle et al. discloses the apparatus of claim 2, wherein the processor is further configured to select the complementary audio content based on the audio content (generating system sounds is also discussed in col. 29, lines 27-54; for example, a notification sound can be generated to surround a talker to emphasize an event or characteristic).

Regarding claim 5, Spittle et al. discloses the apparatus of claim 1, further comprising one or more microphones coupled to the processor, the one or more microphones configured to capture one or more audio signals included in the one or more media signals (col. 6, lines 25-29, the audio conferencing system comprises a talker terminal configured to generate an upstream audio signal associated with a conference participant (e.g. by recording the speech signal of the conference participant using a microphone)).

Regarding claim 6, Spittle et al. discloses the apparatus of claim 5, wherein each media signal of the one or more media signals consists of an audio signal (fig. 1a, each media signal 123 corresponds to an audio signal captured at 120).  

Regarding claim 8, Spittle et al. discloses the apparatus of claim 1, further comprising: a decoder configured to decode a media bitstream to generate a decoded media bitstream, wherein a representation of the one or more media signals is included in the media bitstream (col. 4, lines 6-15, process the sound signal in accordance to the sound type, wherein the tag enables the audio processing unit to select a decoder in accordance to the sound type).

Regarding claim 9, Spittle et al. discloses the apparatus of claim 8, further comprising an audio player coupled to the decoder and to the processor, the audio player configured to play the decoded media  

Regarding claim 11, Spittle et al. discloses the apparatus of claim 1, further comprising a display screen coupled to the processor, the display screen configured to display an arrangement in space of each source of the one or more media signals (FIG. 2 shows an example Graphical User Interface (GUI) for a scene manager of an audio conference system).

Regarding claims 16, 17, 25 and 29, Spittle et al. discloses wherein the audio content for a particular audio signal included in the one or more media signals indicates a melody associated with the particular audio signal, a type of instrument associated with the particular audio signal, a genre of music associated with the particular audio signal, or a combination thereof; wherein the complementary audio content includes musical content that accompanies the audio content (col. 4, lines 58-64, the conference controller is configured to determine an interruption (e.g. a mute status) of the upstream audio signal assigned to the first talker location, and the conference controller may be configured to assign the sound signal to the first talker location, thereby clearly indicating which upstream audio signal has been interrupted). Sound signals may include segments of music that are used while waiting to join a conference or when a listener's stream has been temporarily deactivated, disconnected or muted (col. 25, lines 50-52). The audio content of the audio signal, such as a mute signal, is associated with segments of music for indicating the interruption.

Regarding claims 18, 19 and 26, Spittle et al. discloses the audio content including metadata of a speaker (col. 2, lines 37-42, the conference controller is configured to generate metadata identifying the assigned talker location and enabling an audio processing unit (at a listener's terminal) to generate a spatialized audio signal based on the upstream audio signal).

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 9,654,644 (“Spittle et al.”) in view of U.S. Publication No. 2009/0080632 (“Zhang et al.”).

Regarding claim 12, Spittle et al. does not specify the apparatus of claim 1, further comprising one or more speakers coupled to the processor, the one or more speakers configured to playback the audio. 
	In a similar field of endeavor, Zhang et al. also discusses a system for spatializing audio in an audio conference between a local party and one or more remote parties. The conferencing device includes an audio output device comprising a plurality of loudspeakers (Fig. 2). A playback module processes each audio data stream received from the remote parties to generate a different audio signal for each loudspeaker, and renders each audio signal through its respective loudspeaker to produce a spatial audio sound-field which is audibly perceived by each conferencee in the local party ([0057]).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate speakers configured to playback the audio as disclosed by Zhang et al. in order to render audio to teleconference participants who are physically located at the conference venue in addition to the remote parties.

Regarding claim 20, Spittle et al. does not specify the apparatus of claim 1, wherein the processor is further configured to determine a direction-of-arrival for each media signal of the one or more media signals, the spatial location for each source based on the direction-of-arrival of a corresponding media signal.
	In a similar field of endeavor, Zhang et al. also discusses spatial audio conferencing. Fig. 6 illustrates a sound-field capture device 600 which includes at least three directional microphones symmetrically configured and placed at a conferencing venue. Each microphone includes a directional sound capture element ([0006]). Appropriate captured sound-source location methods are then used to process 222 the two or more different audio signals in order to calculate the particular location within the captured sound-field 228/234/236 that the audio (such as the conferencee in the party 210/214/216 that is currently talking) is principally emanating from at each point in time and generate the corresponding direction ID 306 ([0049]).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the sound field capture device to determine a direction-of-arrival for each media signal as disclosed by Zhang et al. in order to capture media signals for spatial processing corresponding to teleconference participants who are physically located at the conference venue in addition to the remote parties.

Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 9,654,644 (“Spittle et al.”) in view of GB 2562518 (“Eronen et al.”).

Regarding claims 21 and 22, Spittle et al. does not specify the apparatus of claim 1, wherein the processor is further configured to input the identified spatial locations into an adaptation block to 
	Eronen et al. also discloses spatial audio processing. An ambiance generator generates complementary audio signals based on input audio signals. Placement of the generated signals is based on assigning weights using one or more analysis steps (page 26). A classifier serves to classify the respective signals into one of predefined classes such as noise, speech, music, etc. The classifier is pre-trained using suitable training data that represents signals in the above-mentioned predefined classes. In this regard, a suitable classifier known in the art, such as a deep neural network, may be employed (page 28, lines 1-10).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply a neural network adaptation block in the invention of Spittle et al. in order to dynamically train the spatial audio processing method in selecting the ideal locations for each classified audio signal.

Claims 7 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 9,654,644 (“Spittle et al.”) in view of U.S. Publication No. 2017/0364752 (“Zhou et al.”).

Regarding claim 7, Spittle et al. does not specify one or more cameras coupled to the processor, the one or more cameras configured to capture one or more images included in the one or more media signals.  
	In a similar field of endeavor, Zhou et al. discloses sound and object tracking.  Zhou et al. discloses in para. [0034] media metadata may be (e.g., automatically, programmatically, in real time, in non-real-time, etc.) generated based on sound objects and/or video objects representing real-world people or objects. The media metadata may comprise audio metadata, image metadata, metadata related to both audio and video, etc., that includes positional and motion information related to the 
	Thus it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to capture audio and visual data for producing a media scene because this technique can be used in mapping sound elements for video conferencing calls in addition to teleconference calls.

Regarding claim 10, Spittle et al. does not specify the apparatus of claim 9, further comprising a video player coupled to the decoder and to the processor, the video player configured to play the decoded media bitstream to generate one or more reconstructed images.  
	In a similar field of endeavor, Zhou et al. discloses sound and object tracking. Zhou et al. discloses in para. [0034] that media metadata may be generated based on sound objects and/or video objects representing real-world people or objects. The media metadata may comprise audio metadata, image metadata, metadata related to both audio and video, etc. captured by cameras and microphones. In block 504 of Fig. 5, the media system (100) generates, based on the one or more sound objects and the one or more video objects, one or more candidate salient objects ([0128], also claim 6).
	Thus it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to capture audio and visual data for producing a media scene because this technique can be used in mapping sound elements for video conferencing calls in addition to teleconference calls.

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 9,654,644 (“Spittle et al.”) in view of U.S. Publication No. 2019/0166674 (“Mason et al.”).

Regarding claims 13 and 14, Spittle et al. does not specify the apparatus of claim 12, further comprising a supplementary device configured to activate in response to a particular speaker of the one or more speakers outputting sound, wherein the supplementary device comprises a light and the activation of the supplementary device comprises illumination of the light. 
	In a similar field of endeavor, Mason et al. discusses an ambiance control system. Mason et al. discloses a supplementary device proximate to the particular speaker or integrated within the particular speaker, wherein the supplementary device comprises a light, and wherein activation of the supplementary device comprises illumination of the light ([0032]). The entertainment system 20 is arranged to output audio and/or video media into said environment to be consumed by at least one of the one or more users. The ambience control system 10 may optionally comprise one or more sensors. The sensors may comprise one or more ambient light sensors 32 for detecting an ambience information indicative of the ambient lighting in said environment, and/or the sensors may comprise one or more positioning devices 34 for detecting the locations of each of the one or more luminaires 42, 44.
	Thus it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to configure a supplemental device such as ambient lighting with the loudspeakers in Spittle et al. in order to enhance the teleconference such as during a presentation. For example, as disclosed by Mason et al. a romantic movie may require illumination to be soft and the luminaires to be located behind the viewers, whereas an action movie may be best having the lights positioned on either side of the screen to maximize the visual effect of explosions appearing on the screen ([0030]). 

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 9,654,644 (“Spittle et al.”) in view of U.S. Publication No. 2019/0166674 (“Mason et al.”) and further in view of U.S. Patent No. 10,026,229 (“Yalniz et al.”).

Regarding claim 15, Spittle et al. in view of Mason et al. is silent regarding the apparatus of claim 13, wherein the supplementary device comprises a virtual assistant, and wherein activation of the supplementary device comprises generation of complementary sound.  
	In a similar field of endeavor, Yalniz et al. discloses an auxiliary device used as an augmented reality platform. Yalniz et al. discloses that a virtual assistant can also interact with the user by the application recognizing the user's voice and emotions using data acquired through the camera and/or microphone of the device. The virtual assistant can also provide entertainment, such as by telling jokes or dancing, and can also function as advertisements in some situations, such as where the virtual character is rendered as a movie or cartoon character (col. 9, lines 44-64). The image and audio data from the device trigger the virtual assistant on the supplementary device, which provides complementary sound in the form of an augmented reality platform.
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to pair the system of Spittle et al. with a virtual assistant as disclosed by Yalniz et al. because a 3D rendered animated virtual assistant in the teleconference would allow call participants to browse or select from among popular, advertised, or sponsored items through this interface (Yalniz).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIRAPON INTAVONG whose telephone number is (571)270-7491.  The examiner can normally be reached on Monday to Friday, 10:00AM-6:00PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar can be reached on 571-272-7488.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JIRAPON INTAVONG/Examiner, Art Unit 2652   



/AHMAD F. MATAR/Supervisory Patent Examiner, Art Unit 2652