DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER'S AMENDMENT
1.	An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Gary S. Williams on 8/11/2022.

The application has been amended as follows: 

29.	A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices, cause the electronic device to:
display, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant;
output, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, wherein outputting audio from the plurality of participants in the communication session includes:
	outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and
	outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface;
receive, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; and
in response to receiving the input selecting the first dynamic visual representation of the first participant:
	display the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and
	display the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface.
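
For illustration only, and not part of the amended claim language: the idea of keeping a participant's audio at a fixed simulated location in the frame of reference, regardless of how the wearable audio output devices are oriented, can be sketched as a subtraction of head yaw from the source's azimuth. The function names, the sign convention (positive azimuth to the listener's right), and the constant-power stereo pan used in place of a real spatializer are all assumptions of this sketch.

```python
import math


def relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Azimuth of the source relative to the listener's head, wrapped to [-180, 180).

    source_azimuth_deg is fixed in the frame of reference; head_yaw_deg is the
    current yaw of the wearable audio output devices in that same frame.
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def stereo_gains(rel_azimuth_deg: float) -> tuple[float, float]:
    """Constant-power (left, right) gains; a simplified stand-in for a spatializer.

    Angles outside the frontal arc are clamped to +/-90 degrees.
    """
    clamped = max(-90.0, min(90.0, rel_azimuth_deg))
    pan = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # 0 = hard left, pi/2 = hard right
    return math.cos(pan), math.sin(pan)


# Hypothetical scenario: the first participant sits 30 degrees to the right in the
# frame of reference. As the listener turns toward that direction, the relative
# azimuth shrinks to zero, so the voice stays anchored in the frame of reference.
for yaw in (0.0, 15.0, 30.0):
    rel = relative_azimuth(30.0, yaw)
    left, right = stereo_gains(rel)
    print(f"head yaw {yaw:5.1f} -> relative azimuth {rel:5.1f}, L={left:.3f}, R={right:.3f}")
```

Moving a representation to a third location in the user interface would, in this sketch, correspond to changing the `source_azimuth_deg` argument for that participant.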

90.	The electronic device of claim 28, wherein the fourth location is different from the second location in the user interface, and the fourth simulated spatial location is different from the second simulated spatial location.

91.	The electronic device of claim 90, wherein:
the third location in the user interface is in a respective direction relative to the first location in the user interface, and the fourth location in the user interface is in the respective direction relative to the second location in the user interface; and
the third simulated spatial location is substantially in the respective direction relative to the first simulated spatial location, and the fourth simulated spatial location is in substantially the respective direction relative to the second simulated spatial location.

92.	The electronic device of claim 28, wherein:
the first location in the user interface is outside a respective region of the user interface;
the second location in the user interface is within the respective region;
the third location in the user interface is within the respective region; and
the fourth location in the user interface is outside the respective region.

93.	The electronic device of claim 28, wherein the third simulated spatial location is closer to a spatial location of the set of one or more wearable audio output devices than the first simulated spatial location and the second simulated spatial location, and the one or more programs include instructions for, in response to receiving the input selecting the first dynamic visual representation of the first participant, emphasizing the first audio from the first participant relative to the second audio from the second participant.

94.	The electronic device of claim 93, wherein, in response to receiving the input selecting the first dynamic visual representation of the first participant, the first audio is output so as to be positioned at the third simulated spatial location for a first predefined amount of time, and the one or more programs include instructions for, after the first audio is output so as to be positioned at the third simulated spatial location for the first predefined amount of time, outputting the first audio at a simulated spatial location that is further from the spatial location of the set of one or more wearable audio output devices than the third simulated spatial location.

95.	The electronic device of claim 93, wherein the input selecting the first dynamic visual representation of the first participant includes selection of a representation of a user of the electronic device, displayed in the user interface, and a drag gesture to move the representation of the user in the user interface to within a predefined distance of the first dynamic visual representation of the first participant, and the one or more programs include instructions for, in response to receiving the input selecting the first dynamic visual representation of the first participant, displaying an indication that the first dynamic visual representation of the first participant is associated with the representation of the user.

96.	The electronic device of claim 93, wherein the one or more programs include instructions for:
in response to receiving the input selecting the first dynamic visual representation of the first participant, transmitting, to the first participant, a request to establish a partial communication session between the user and the first participant; and
in response to receiving an indication of the first participant accepting the request to establish the partial communication session, displaying an indication that the first dynamic visual representation of the first participant is associated with the representation of the user.  

97.	The electronic device of claim 28, wherein receiving the input selecting the first dynamic visual representation of the first participant includes detecting, via the one or more input devices, the input at an initial location corresponding to the first dynamic visual representation and continuing to detect the input at the initial location for at least a second predefined amount of time.

98.  	 The electronic device of claim 28, wherein receiving the input selecting the first dynamic visual representation of the first participant includes concurrently detecting, via the one or more input devices, a first input point concurrently with a second input point, and movement of the first input point towards or away from the second input point. 

99.	The electronic device of claim 28, wherein the input selecting the first dynamic visual representation of the first participant includes selection of the first dynamic visual representation while the first dynamic visual representation is displayed at the first location in the user interface and a drag gesture to move the first dynamic visual representation to the third location in the user interface.

100.	The electronic device of claim 28, wherein the one or more programs include instructions for scrolling the user interface in response to detecting movement of the electronic device relative to a physical environment in which the electronic device is located, wherein, while the first dynamic visual representation is displayed at the first location in the user interface, the first dynamic visual representation of the first participant is outside a focal region of the user interface, and receiving the input selecting the first dynamic visual representation of the first participant includes detecting movement of the electronic device to scroll the user interface such that the first dynamic visual representation is within the focal region of the user interface.

101.	The electronic device of claim 28, wherein the audio from the plurality of participants in the communication session is output at a plurality of simulated spatial locations relative to the frame of reference, and the one or more programs include instructions for:
receiving a request to output, in the communication session, respective audio from the electronic device; and
in response to receiving the request to output the respective audio, outputting, via the set of one or more wearable audio output devices, the respective audio, including adjusting the respective audio so as to maintain the respective audio at a first respective simulated spatial location, relative to the frame of reference, that is different from the plurality of simulated spatial locations.

102.	The electronic device of claim 101, wherein the respective audio is output via the communication session to the plurality of participants.

103.	The electronic device of claim 101, wherein the one or more programs include instructions for:
receiving a request to reposition the respective audio; and
in response to receiving the request to reposition the respective audio, outputting the respective audio, including adjusting the respective audio so as to maintain the respective audio at a second respective simulated spatial location, relative to the frame of reference, that is different from the plurality of simulated spatial locations and different from the first respective simulated spatial location.

104.	The electronic device of claim 101, wherein a distance between respective simulated spatial locations of any two participants of the plurality of participants in the communication session is less than a distance between a simulated spatial location of the respective audio and a simulated spatial location of any respective participant of the plurality of participants.

105.	The electronic device of claim 28, wherein the one or more programs include instructions for:
detecting movement of the set of one or more wearable audio output devices relative to the frame of reference without detecting movement of the electronic device and without detecting input directed to the user interface; and
in response to detecting the movement of the set of one or more wearable audio output devices relative to the frame of reference, outputting respective audio from the plurality of participants in the communication session at respective simulated spatial locations, including adjusting the respective audio from the plurality of participants as the set of one or more wearable audio output devices moves relative to the frame of reference so as to maintain the respective audio at the respective simulated spatial locations relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference.

106.	The electronic device of claim 28, wherein the electronic device is in communication with one or more cameras in a physical environment, and the one or more programs include instructions for displaying in the user interface, via the one or more display devices, a representation of a portion of the physical environment that is in a field of view of the one or more cameras, wherein the respective dynamic visual representations of the plurality of participants in the communication session are displayed at respective locations in the representation of the field of view of the one or more cameras corresponding to respective physical locations in the physical environment.

107.	The electronic device of claim 28, wherein the one or more programs include instructions for:
displaying the first dynamic visual representation of the first participant with a first size, wherein the first size is determined in accordance with a distance between a current simulated spatial location of the first audio from the first participant and a spatial location of the set of one or more wearable audio output devices; and
displaying the second dynamic visual representation of the second participant with a second size, wherein the second size is determined in accordance with a distance between a current simulated spatial location of the second audio from the second participant and the spatial location of the set of one or more wearable audio output devices.

108.	 The electronic device of claim 28, wherein the one or more programs include instructions for displaying in the user interface, in a roster region of the user interface, separate from locations at which the first dynamic visual representation of the first participant and the second dynamic visual representation of the second participant are displayed, dynamic visual representations of third and fourth participants in the communication session, and outputting, via the set of one or more audio output devices, audio from the third and fourth participants, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants.

109.	The electronic device of claim 108, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by outputting the audio from the third and fourth participants with a different ratio of direct to reverberated sound than a ratio of direct to reverberated sound for the first participant and second participant. 

110.	 The electronic device of claim 108, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by outputting the audio from the third and fourth participants at one or more simulated spatial locations that are further from a first spatial location, for the electronic device, than simulated spatial locations at which audio from the first participant and second participant are output. 

111.	 The electronic device of claim 108, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by changing volume of the audio output from the third and fourth participants relative to volume of the audio output from the first participant and second participant. 

112.	The electronic device of claim 108, wherein the one or more programs include instructions for identifying the third and fourth participants as roster participants, in accordance with predefined characteristics of the audio from the third and fourth participants meeting predefined roster criteria, and in accordance with the identification of the third and fourth participants as roster participants, displaying the dynamic visual representations of third and fourth participants in the roster region of the user interface and deemphasizing the audio output, via the set of one or more audio output devices, from the third and fourth participants, relative to audio output from the first and second participants.

113.	 The electronic device of claim 28, wherein the one or more programs include instructions for determining one or more metrics of overlapping audio from two or more participants in the communication session, and in accordance with a determination that the one or more metrics of overlapping audio meet predefined overlap criteria, positioning simulated spatial locations of audio from the plurality of participants at wider angles, relative to each other, than default angles at which the simulated spatial locations of audio from the plurality of participants are positioned in accordance with a determination that the one or more metrics of overlapping audio do not meet the predefined overlap criteria. 
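
For illustration only, and not part of the amended claim language: one way to read the overlap-dependent positioning in claim 113 is as a choice between two angular spans. The overlap metric (fraction of frames in which both participants are active), the threshold of 0.3, and the 60 and 120 degree spans below are hypothetical values chosen for this sketch; the claim does not specify them.

```python
def overlap_fraction(active_a: list[bool], active_b: list[bool]) -> float:
    """Fraction of frames in which both participants are speaking at once."""
    frames = min(len(active_a), len(active_b))
    if frames == 0:
        return 0.0
    both = sum(1 for a, b in zip(active_a, active_b) if a and b)
    return both / frames


def spread_azimuths(
    count: int,
    overlap: float,
    threshold: float = 0.3,      # hypothetical "predefined overlap criteria"
    default_span: float = 60.0,  # hypothetical default arc, in degrees
    wide_span: float = 120.0,    # hypothetical widened arc, in degrees
) -> list[float]:
    """Evenly space `count` simulated source azimuths across the chosen arc."""
    span = wide_span if overlap >= threshold else default_span
    if count <= 1:
        return [0.0] * count
    step = span / (count - 1)
    return [-span / 2.0 + i * step for i in range(count)]


# Little overlap keeps the default angles; heavy overlap widens them.
print(spread_azimuths(3, overlap=0.1))
print(spread_azimuths(3, overlap=0.5))
```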

114.	The electronic device of claim 28, wherein 
the first audio from the first participant is output at the first simulated spatial location and the second audio from the second participant is output at the second simulated spatial location while the one or more display devices of the electronic device has a first orientation relative to a physical environment in which the electronic device is located; and
the one or more programs include instructions for:
	detecting a change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, and
	in response to detecting the change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, changing at least one of a simulated spatial location at which the first audio from the first participant is output and a simulated spatial location at which the second audio from the second participant is output so as to change a distance between the simulated spatial locations at which the first audio from the first participant is output and the simulated spatial location at which the second audio from the second participant is output.

115.	The electronic device of claim 114, wherein the one or more programs include instructions for, in response to detecting the change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, changing at least one of a location of the first dynamic visual representation of the first participant in the user interface and a location of the second dynamic visual representation of the second participant in the user interface so as to change a distance between the locations of the first dynamic visual representation of the first participant and the second dynamic visual representation of the second participant in the user interface.

116.	The non-transitory computer readable storage medium of claim 29, wherein the fourth location is different from the second location in the user interface, and the fourth simulated spatial location is different from the second simulated spatial location.

117.	The non-transitory computer readable storage medium of claim 116, wherein:
the third location in the user interface is in a respective direction relative to the first location in the user interface, and the fourth location in the user interface is in the respective direction relative to the second location in the user interface; and
the third simulated spatial location is substantially in the respective direction relative to the first simulated spatial location, and the fourth simulated spatial location is in substantially the respective direction relative to the second simulated spatial location.

118.	The non-transitory computer readable storage medium of claim 29, wherein:
the first location in the user interface is outside a respective region of the user interface;
the second location in the user interface is within the respective region;
the third location in the user interface is within the respective region; and
the fourth location in the user interface is outside the respective region.

119.	The non-transitory computer readable storage medium of claim 29, wherein the third simulated spatial location is closer to a spatial location of the set of one or more wearable audio output devices than the first simulated spatial location and the second simulated spatial location, and the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to, in response to receiving the input selecting the first dynamic visual representation of the first participant, emphasize the first audio from the first participant relative to the second audio from the second participant.

120.	The non-transitory computer readable storage medium of claim 119, wherein, in response to receiving the input selecting the first dynamic visual representation of the first participant, the first audio is output so as to be positioned at the third simulated spatial location for a first predefined amount of time, and the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to, after the first audio is output so as to be positioned at the third simulated spatial location for the first predefined amount of time, output the first audio at a simulated spatial location that is further from the spatial location of the set of one or more wearable audio output devices than the third simulated spatial location.

121.	The non-transitory computer readable storage medium of claim 119, wherein the input selecting the first dynamic visual representation of the first participant includes selection of a representation of a user of the electronic device, displayed in the user interface, and a drag gesture to move the representation of the user in the user interface to within a predefined distance of the first dynamic visual representation of the first participant, and the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to, in response to receiving the input selecting the first dynamic visual representation of the first participant, display an indication that the first dynamic visual representation of the first participant is associated with the representation of the user.

122.	The non-transitory computer readable storage medium of claim 119, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to:
in response to receiving the input selecting the first dynamic visual representation of the first participant, transmit, to the first participant, a request to establish a partial communication session between the user and the first participant; and
in response to receiving an indication of the first participant accepting the request to establish the partial communication session, display an indication that the first dynamic visual representation of the first participant is associated with the representation of the user.  

123.	 The non-transitory computer readable storage medium of claim 29, wherein receiving the input selecting the first dynamic visual representation of the first participant includes detecting, via the one or more input devices, the input at an initial location corresponding to the first dynamic visual representation and continuing to detect the input at the initial location for at least a second predefined amount of time.

124.	 The non-transitory computer readable storage medium of claim 29, wherein receiving the input selecting the first dynamic visual representation of the first participant includes concurrently detecting, via the one or more input devices, a first input point concurrently with a second input point, and movement of the first input point towards or away from the second input point. 

125.	The non-transitory computer readable storage medium of claim 29, wherein the input selecting the first dynamic visual representation of the first participant includes selection of the first dynamic visual representation while the first dynamic visual representation is displayed at the first location in the user interface and a drag gesture to move the first dynamic visual representation to the third location in the user interface.

126.	The non-transitory computer readable storage medium of claim 29, wherein the one or more programs comprise instructions that, when executed by the electronic device, cause the electronic device to scroll the user interface in response to detecting movement of the electronic device relative to a physical environment in which the electronic device is located, wherein, while the first dynamic visual representation is displayed at the first location in the user interface, the first dynamic visual representation of the first participant is outside a focal region of the user interface, and receiving the input selecting the first dynamic visual representation of the first participant includes detecting movement of the electronic device to scroll the user interface such that the first dynamic visual representation is within the focal region of the user interface.

127.	The non-transitory computer readable storage medium of claim 29, wherein the audio from the plurality of participants in the communication session is output at a plurality of simulated spatial locations relative to the frame of reference, and the one or more programs comprise instructions that, when executed by the electronic device, cause the electronic device to:
receive a request to output, in the communication session, respective audio from the electronic device; and
in response to receiving the request to output the respective audio, output, via the set of one or more wearable audio output devices, the respective audio, including adjusting the respective audio so as to maintain the respective audio at a first respective simulated spatial location, relative to the frame of reference, that is different from the plurality of simulated spatial locations.

128.	The non-transitory computer readable storage medium of claim 127, wherein the respective audio is output via the communication session to the plurality of participants.

129.	The non-transitory computer readable storage medium of claim 127, wherein the one or more programs comprise instructions that, when executed by the electronic device, cause the electronic device to:
receive a request to reposition the respective audio; and
in response to receiving the request to reposition the respective audio, output the respective audio, including adjusting the respective audio so as to maintain the respective audio at a second respective simulated spatial location, relative to the frame of reference, that is different from the plurality of simulated spatial locations and different from the first respective simulated spatial location.

130.	The non-transitory computer readable storage medium of claim 127, wherein a distance between respective simulated spatial locations of any two participants of the plurality of participants in the communication session is less than a distance between a simulated spatial location of the respective audio and a simulated spatial location of any respective participant of the plurality of participants.

131.	The non-transitory computer readable storage medium of claim 29, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to:
detect movement of the set of one or more wearable audio output devices relative to the frame of reference without detecting movement of the electronic device and without detecting input directed to the user interface; and
in response to detecting the movement of the set of one or more wearable audio output devices relative to the frame of reference, output respective audio from the plurality of participants in the communication session at respective simulated spatial locations, including adjusting the respective audio from the plurality of participants as the set of one or more wearable audio output devices moves relative to the frame of reference so as to maintain the respective audio at the respective simulated spatial locations relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference.

132.	The non-transitory computer readable storage medium of claim 29, wherein the electronic device is in communication with one or more cameras in a physical environment, and the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to display in the user interface, via the one or more display devices, a representation of a portion of the physical environment that is in a field of view of the one or more cameras, wherein the respective dynamic visual representations of the plurality of participants in the communication session are displayed at respective locations in the representation of the field of view of the one or more cameras corresponding to respective physical locations in the physical environment.

133.	The non-transitory computer readable storage medium of claim 29, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to:
display the first dynamic visual representation of the first participant with a first size, wherein the first size is determined in accordance with a distance between a current simulated spatial location of the first audio from the first participant and a spatial location of the set of one or more wearable audio output devices; and
display the second dynamic visual representation of the second participant with a second size, wherein the second size is determined in accordance with a distance between a current simulated spatial location of the second audio from the second participant and the spatial location of the set of one or more wearable audio output devices.

134.	The non-transitory computer readable storage medium of claim 29, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to display in the user interface, in a roster region of the user interface, separate from locations at which the first dynamic visual representation of the first participant and the second dynamic visual representation of the second participant are displayed, dynamic visual representations of third and fourth participants in the communication session, and outputting, via the set of one or more audio output devices, audio from the third and fourth participants, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants.

135.	The non-transitory computer readable storage medium of claim 134, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by outputting the audio from the third and fourth participants with a different ratio of direct to reverberated sound than a ratio of direct to reverberated sound for the first participant and second participant. 

136.	The non-transitory computer readable storage medium of claim 134, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by outputting the audio from the third and fourth participants at one or more simulated spatial locations that are further from a first spatial location, for the electronic device, than simulated spatial locations at which audio from the first participant and second participant are output. 

137.	The non-transitory computer readable storage medium of claim 134, wherein the audio from the third and fourth participants is deemphasized relative to audio output from the first and second participants by changing volume of the audio output from the third and fourth participants relative to volume of the audio output from the first participant and second participant. 

138.	The non-transitory computer readable storage medium of claim 134, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to identify the third and fourth participants as roster participants, in accordance with predefined characteristics of the audio from the third and fourth participants meeting predefined roster criteria, and in accordance with the identification of the third and fourth participants as roster participants, display the dynamic visual representations of third and fourth participants in the roster region of the user interface and deemphasizing the audio output, via the set of one or more audio output devices, from the third and fourth participants, relative to audio output from the first and second participants.

139. The non-transitory computer readable storage medium of claim 29, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to determine one or more metrics of overlapping audio from two or more participants in the communication session, and in accordance with a determination that the one or more metrics of overlapping audio meet predefined overlap criteria, position simulated spatial locations of audio from the plurality of participants at wider angles, relative to each other, than default angles at which the simulated spatial locations of audio from the plurality of participants are positioned in accordance with a determination that the one or more metrics of overlapping audio do not meet the predefined overlap criteria. 

140.	The non-transitory computer readable storage medium of claim 29, wherein 
the first audio from the first participant is output at the first simulated spatial location and the second audio from the second participant is output at the second simulated spatial location while the one or more display devices of the electronic device has a first orientation relative to a physical environment in which the electronic device is located; and
the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to:
	detect a change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, and
	in response to detecting the change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, change at least one of a simulated spatial location at which the first audio from the first participant is output and a simulated spatial location at which the second audio from the second participant is output so as to change a distance between the simulated spatial locations at which the first audio from the first participant is output and the simulated spatial location at which the second audio from the second participant is output.

141.	The non-transitory computer readable storage medium of claim 140, wherein the one or more programs comprising instructions that, when executed by the electronic device, cause the electronic device to, in response to detecting the change in the orientation of the electronic device from the first orientation to a second orientation relative to the physical environment in which the electronic device is located, change at least one of a location of the first dynamic visual representation of the first participant in the user interface and a location of the second dynamic visual representation of the second participant in the user interface so as to change a distance between the locations of the first dynamic visual representation of the first participant and the second dynamic visual representation of the second participant in the user interface.




Allowable Subject Matter
2.	Claims 1-29 and 90-141 are allowed.

The following is an examiner’s statement of reasons for allowance: 

Regarding claim 1, in combination with other limitations of the claims, the prior art of record fails to disclose or specifically suggest a method, comprising: at an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices: displaying, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; outputting, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, including: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receiving, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; in response to receiving the input selecting the first dynamic visual representation of the first participant: displaying the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and displaying the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface, when taking the claim as a whole.
The following is an examiner’s statement of reason for allowance: 
The claims filed on 11/16/2021 provide specific functions and features that are not addressed by the combined references on record, Libin (US 11,228,625), Samadani (US 2010/0309284) and Diamant (US 2019/0341050). Those references describe, teach and suggest a video conference configuration that connects many dozens or even hundreds of individual and group participants. Individual participants may typically be located at their desks in business or home offices, at internet cafes or ordinary cafes, or at other individual conferencing-friendly spaces, before video cameras of their own or rented/borrowed notebook computers, tablets or other mobile devices. Group participants may join the meeting from one or several video-enabled conference rooms, dedicated open areas in offices and other group conferencing-friendly spaces. Video streams from all individual or group participants may be transferred to video conferencing server(s) located in the cloud or on premises. The server may process and integrate received video streams and distribute the integrated stream to viewing devices, such as individual notebooks, tablets and other personal devices, or to display monitors of video conferencing systems in participating conference rooms. In addition to video streams, the system may register and receive data from other sources, such as smart watches, bracelets and other wearable devices owned by meeting participants. Additionally, the system may keep individual and team profiles of meeting participants, including their emotional types. For individual meeting participants sitting right before video cameras of their notebooks or other devices, capturing and pre-processing video streams may be straightforward as there is only one participant within the stream. The situation may be different for meeting rooms where one or several video conferencing cameras may be installed.
In such a case, the system may identify individual participants within a sequence of scenes captured by a video camera and track the participants during the meeting, including periods of absence and re-appearance of the participants when they are leaving and re-entering a meeting room and re-positioning of the participants when they are moving within the room. However, Applicant’s claims have been distinguished from the combination of Libin, Samadani and Diamant. Those references do not describe, teach or suggest the concepts of, at an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices: displaying, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; outputting, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, including: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receiving, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; in response to receiving the input selecting the first dynamic visual representation of the first participant: displaying the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and displaying the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface. Thus, the prior art on record, in correlation with the amended claim language, does not disclose the claim as a whole. For the foregoing reason, the claims are allowable over the cited prior art.


Regarding claim 28, in combination with other limitations of the claims, the prior art of record fails to disclose or specifically suggest an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices, the electronic device comprising: one or more processors; and memory storing one or more programs, wherein the one or more programs are configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; outputting, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, including: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receiving, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; in response to receiving the input selecting the first dynamic visual representation of the first participant: displaying the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and displaying the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface, when taking the claim as a whole.
The following is an examiner’s statement of reason for allowance: 
The claims filed on 11/16/2021 provide specific functions and features that are not addressed by the combined references on record, Libin (US 11,228,625), Samadani (US 2010/0309284) and Diamant (US 2019/0341050). Those references describe, teach and suggest a video conference configuration that connects many dozens or even hundreds of individual and group participants. Individual participants may typically be located at their desks in business or home offices, at internet cafes or ordinary cafes, or at other individual conferencing-friendly spaces, before video cameras of their own or rented/borrowed notebook computers, tablets or other mobile devices. Group participants may join the meeting from one or several video-enabled conference rooms, dedicated open areas in offices and other group conferencing-friendly spaces. Video streams from all individual or group participants may be transferred to video conferencing server(s) located in the cloud or on premises. The server may process and integrate received video streams and distribute the integrated stream to viewing devices, such as individual notebooks, tablets and other personal devices, or to display monitors of video conferencing systems in participating conference rooms. In addition to video streams, the system may register and receive data from other sources, such as smart watches, bracelets and other wearable devices owned by meeting participants. Additionally, the system may keep individual and team profiles of meeting participants, including their emotional types. For individual meeting participants sitting right before video cameras of their notebooks or other devices, capturing and pre-processing video streams may be straightforward as there is only one participant within the stream. The situation may be different for meeting rooms where one or several video conferencing cameras may be installed.
In such a case, the system may identify individual participants within a sequence of scenes captured by a video camera and track the participants during the meeting, including periods of absence and re-appearance of the participants when they are leaving and re-entering a meeting room and re-positioning of the participants when they are moving within the room. However, Applicant’s claims have been distinguished from the combination of Libin, Samadani and Diamant. Those references do not describe, teach or suggest the concepts of an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices, the electronic device comprising: one or more processors; and memory storing one or more programs, wherein the one or more programs are configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; outputting, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, including: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receiving, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; in response to receiving the input selecting the first dynamic visual representation of the first participant: displaying the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and displaying the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface. Thus, the prior art on record, in correlation with the amended claim language, does not disclose the claim as a whole. For the foregoing reason, the claims are allowable over the cited prior art.

Regarding claim 29, in combination with other limitations of the claims, the prior art of record fails to disclose or specifically suggest a computer readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices, cause the electronic device to: display, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; output, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, wherein outputting audio from the plurality of participants in the audio communication session includes: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receive, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; and in response to receiving the input selecting the first dynamic visual representation of the first participant: display the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and display the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface, when taking the claim as a whole.
The following is an examiner’s statement of reason for allowance: 
The claims filed on 11/16/2021 provide specific functions and features that are not addressed by the combined references on record, Libin (US 11,228,625), Samadani (US 2010/0309284) and Diamant (US 2019/0341050). Those references describe, teach and suggest a video conference configuration that connects many dozens or even hundreds of individual and group participants. Individual participants may typically be located at their desks in business or home offices, at internet cafes or ordinary cafes, or at other individual conferencing-friendly spaces, before video cameras of their own or rented/borrowed notebook computers, tablets or other mobile devices. Group participants may join the meeting from one or several video-enabled conference rooms, dedicated open areas in offices and other group conferencing-friendly spaces. Video streams from all individual or group participants may be transferred to video conferencing server(s) located in the cloud or on premises. The server may process and integrate received video streams and distribute the integrated stream to viewing devices, such as individual notebooks, tablets and other personal devices, or to display monitors of video conferencing systems in participating conference rooms. In addition to video streams, the system may register and receive data from other sources, such as smart watches, bracelets and other wearable devices owned by meeting participants. Additionally, the system may keep individual and team profiles of meeting participants, including their emotional types. For individual meeting participants sitting right before video cameras of their notebooks or other devices, capturing and pre-processing video streams may be straightforward as there is only one participant within the stream. The situation may be different for meeting rooms where one or several video conferencing cameras may be installed.
In such a case, the system may identify individual participants within a sequence of scenes captured by a video camera and track the participants during the meeting, including periods of absence and re-appearance of the participants when they are leaving and re-entering a meeting room and re-positioning of the participants when they are moving within the room. However, Applicant’s claims have been distinguished from the combination of Libin, Samadani and Diamant. Those references do not describe, teach or suggest the concepts of a computer readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by an electronic device that is in communication with one or more display devices, one or more input devices, and a set of one or more wearable audio output devices, cause the electronic device to: display, via the one or more display devices, a user interface including respective dynamic visual representations of a plurality of participants in a communication session, including displaying, at a first location in the user interface, a first dynamic visual representation of a first participant and displaying, at a second location in the user interface, a second dynamic visual representation of a second participant different from the first participant; output, via the set of one or more wearable audio output devices, audio from the plurality of participants in the communication session, wherein outputting audio from the plurality of participants in the audio communication session includes: outputting first audio from the first participant, wherein the first audio is adjusted so as to maintain the first audio at a first simulated spatial location relative to a frame of reference of the communication session independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the first simulated spatial location corresponds to the first location of the first dynamic visual representation in the user interface; and outputting second audio from the second participant, wherein the second audio is adjusted, so as to maintain the second audio at a second simulated spatial location relative to the frame of reference independently of a position of the set of one or more wearable audio output devices relative to the frame of reference, wherein the second simulated spatial location corresponds to the second location of the second dynamic visual representation in the user interface; receive, via the one or more input devices, an input selecting the first dynamic visual representation of the first participant; and in response to receiving the input selecting the first dynamic visual representation of the first participant: display the first dynamic visual representation of the first participant at a third location, different from the first location, in the user interface, and outputting the first audio from the first participant so as to position the first audio at a third simulated spatial location, relative to the frame of reference, that corresponds to the third location of the first dynamic visual representation in the user interface, wherein the third simulated spatial location is different from the first simulated spatial location; and display the second dynamic visual representation of the second participant at a fourth location in the user interface, and outputting the second audio from the second participant so as to position the second audio at a fourth simulated spatial location, relative to the frame of reference, that corresponds to the fourth location of the second dynamic visual representation in the user interface. Thus, the prior art on record, in correlation with the amended claim language, does not disclose the claim as a whole. For the foregoing reason, the claims are allowable over the cited prior art.

 

Citation of Prior Art
3.	Libin (US 11,228,625) discloses a system that involves connecting a current active speaker and non-speaking participants of a video conference to a video conferencing server that sends and receives video streams between the participants and the speaker. A health index of the conference is determined based on reactions of the participants, interactions among the current speaker and the active participants, and on non-conventional roles assigned to the participants. A number of recommendations for modifying the conference are provided to a subset of participants based on the health index. The recommendations are used for asking a participant to provide an opinion on a subject matter. This connects participants of the video conference to a video conferencing server that sends and receives video streams between the participants, where non-verbal cues of the participants are detected, and thus enables providing meeting participants with a dynamic assessment of meeting efficiency and with recommended ways to improve the meeting (see fig. 6, col. 10, line 35-col. 11, line 26).

Samadani (US 2010/0309284) discloses a method of generating a dynamic visual representation that involves rendering, using a computing device, an audio-visual representation of one or more participants at each site taking part in a video conference, and then receiving a saliency signal identifying the degree of current and/or recent activity of the one or more participants at each site. Image processing based on the saliency signal is applied to elicit visual popout of active participants associated with each site while maintaining fixed scales and borders of the visual representation of the one or more participants at each site. The method provides popout changes in the display that immediately attract the viewer's attention without requiring scanning or searching, and generates saliency signals that avoid distracting, spurious salient visual effects (see fig. 8, ¶0034-0039).

Diamant (US 2019/0341050) discloses a method that involves operating a face identification machine to recognize a face of a first remote conference participant in a digital video. A speech recognition machine (130) is operated to translate a first computer-readable audio signal to a first text and a second computer-readable audio signal to a second text. An attribution machine is operated to attribute the first text to the first remote conference participant recognized by the face identification machine and to attribute the second text to a second remote conference participant. A transcription machine is operated to automatically create a transcript of the conference, where the transcript includes the first text attributed to the first remote conference participant and the second text attributed to the second remote conference participant. This increases signature quality and reduces the enrollment effort required for a conference participant to register a signature. The method also enables a computerized intelligent assistant to record additional face images so as to automatically improve the signature for a conference participant based on the face images (see fig. 4, ¶ 0124).

The references cited above, whether taken alone or in combination, do not disclose the functionality of the claimed limitations when the claimed invention is taken as a whole.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

	Conclusion

4.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASSAD MOHAMMED whose telephone number is (571)270-7253.  The examiner can normally be reached on 9:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASSAD MOHAMMED/Examiner, Art Unit 2651  

/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2651