DETAILED ACTION
This non-final action is in response to the application filed on 04/07/2021. In this application, claims 1-20 are pending, with claims 1, 9 and 16 being independent.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/008,275 filed on April 10, 2020.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/07/2021 and 11/05/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 9 and 16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019).
As per claim 1, Stokking discloses an apparatus (Stokking Fig. 11, orchestration node ON) for providing volumetric conversational services (Stokking Para. [0224], the orchestration node, which may be a conferencing/application server, may instruct the UE to use the edge node; Stokking Para. [0231], the Application Server may be aware of three user devices UE1, UE2 and UE3 wanting to have a VR conference session [volumetric conversational services], e.g., through a process through a website or the like. As discussed above, the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE), the apparatus (Stokking Fig. 11, orchestration node ON) comprising:
a communication interface (Stokking Fig. 15, Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node) configured to receive a signaling message from each of a plurality of user equipment (UEs) (Stokking Para. [0214], the orchestration node ON is shown to exchange session signaling information 312, 314 with a transmitter device UE1 and a receiver device UE4), the signaling messages indicating a capability of the plurality of UEs (Stokking Para. [0215], Such instructions may be part of a signaling between the orchestration node ON and the transmitter device UE1 via which the capabilities of the transmitter device UE1 may be determined, e.g., in terms of computation resources, battery level, etc.), respectively, to process participant volumetric content (Stokking Para. [0215], The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 process and encode the video using a tile-based video streaming codec); and
a processor operably coupled to the communication interface (Stokking Fig. 15, Processor Subsystem 440 coupled to Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node), wherein the processor is configured to:
identify a conference associated with the plurality of UEs for which volumetric processing is requested (Stokking Para. [0231], the Application Server may be aware of three user devices UE1, UE2 and UE3 wanting to have a VR conference session, e.g., through a process through a website or the like),
provision a plurality of media resource functions in edge application servers of edge data networks (Stokking Fig. 7, Edge nodes EN1-EN3 and Para. [0050], an edge node may be an edge node of a 5G or later generation telecommunication network, or any other type of edge computing system, e.g., located at an edge between the telecommunication network and the access network via which the transmitter device is connected to the telecommunication network) for processing the participant volumetric content from the plurality of UEs (Stokking Para. [0215], The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 [media resource function] process and encode the video using a tile-based video streaming codec),
assign one or more of the plurality of UEs to a respective media resource function of the plurality of media resource functions (Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers [media resource functions] to each UE; Stokking Para. [0207], an edge node EN4 is shown to implement the combine function 150. The edge node EN4 [respective media resource function] may for example be an edge node assigned to the receiver device UE4),
instruct the participant volumetric content from the plurality of UEs to be sent to the plurality of media resource functions (Stokking Fig. 11 and Para. [0215], instructions 90 may be sent to the transmitter device UE1 containing the network address (e.g., IP address, port number) of the edge node EN1 to which the transmitter device UE1 is to send its video after capture 100; Stokking Fig. 7 and Para. [0205], each transmitter device UE1-UE3 is shown to perform a capture 100, after which the captured video is sent directly to a respective edge node EN1-EN3), and
instruct conference volumetric content converted by the respective media resource functions to be sent to the one or more UEs for the conference (Stokking Para. [0203], As can be seen in FIG. 6, the captured video frame may include the participant wearing an HMD, which may then be removed, along with the background of the participants, by video processing 130 in respective edge nodes. The processed videos may then be tiled and encoded 140 by the edge nodes, and sent as separate tile-based video streams 50-53 to a combiner which combines 150 the tiles in the compressed domain to obtain a combined tile-based video stream 60 [conference volumetric content], which may then be transmitted by the combiner to a receiver device where it may be decoded 170 and split to obtain separate videos of the participants, which may finally be rendered 180, e.g., as video avatars in a computer-based environment; Stokking Para. [0207], an edge node EN4 is shown to implement the combine function 150. The edge node EN4 [respective media resource function] may for example be an edge node assigned to the receiver device UE4; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices).

As per claim 9, Stokking discloses a media resource function in an edge network (Stokking Fig. 7&11, Edge nodes EN1-EN4 and Para. [0050], an edge node may be an edge node of a 5G or later generation telecommunication network, or any other type of edge computing system, e.g., located at an edge between the telecommunication network and the access network via which the transmitter device is connected to the telecommunication network), the media resource function comprising:
a communication interface (Stokking Fig. 15, Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node) configured to receive a signaling message from an apparatus in a core network to provision the media resource function for processing volumetric content from one or more user equipments (UEs) (Stokking Para. [0216], Additionally or alternatively, the orchestration node ON [apparatus] may be configured to send instructions 91 to the edge node EN1 which may for example identify one or more of: the transmitter device UE1, which video stream to expect, how to process this video stream, how to tile the processed video and encode the processed video, and where to send the tile-based video stream afterwards, e.g., in the form of a network address (e.g., IP address, port number of edge node EN4) and streaming settings), the signaling message indicating a capability of the one or more UEs assigned to the media resource function, respectively, to process participant volumetric content (Stokking Paras. [0215]-[0216], Such instructions may be part of a signaling between the orchestration node ON and the transmitter device UE1 via which the capabilities of the transmitter device UE1 may be determined, e.g., in terms of computation resources, battery level, etc. The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 process and encode the video using a tile-based video streaming codec. Additionally or alternatively, the orchestration node ON may be configured to send instructions 91 to the edge node EN1 which may for example identify one or more of: the transmitter device UE1, which video stream to expect, how to process this video stream, how to tile the processed video and encode the processed video, and where to send the tile-based video stream afterwards, e.g., in the form of a network address (e.g., IP address, port number of edge node EN4) and streaming settings); and
a processor operably coupled to the communication interface (Stokking Fig. 15, Processor Subsystem 440 coupled to Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node), wherein the processor is configured to:
identify a conference associated with the one or more UEs for which volumetric processing is requested (Stokking Para. [0216], the orchestration node ON may be configured to send instructions 91 to the edge node EN1 which may for example identify one or more of: the transmitter device UE1, which video stream to expect, how to process this video stream, how to tile the processed video and encode the processed video, and where to send the tile-based video stream afterwards, e.g., in the form of a network address (e.g., IP address, port number of edge node EN4) and streaming settings; Stokking Para. [0231], the Application Server may be aware of three user devices UE1, UE2 and UE3 wanting to have a VR conference session, e.g., through a process through a website or the like),
receive the participant volumetric content from the one or more UEs assigned to the media resource function (Stokking Fig. 11 and Para. [0215], instructions 90 may be sent to the transmitter device UE1 containing the network address (e.g., IP address, port number) of the edge node EN1 to which the transmitter device UE1 is to send its video after capture 100; Stokking Fig. 7 and Para. [0205], each transmitter device UE1-UE3 is shown to perform a capture 100, after which the captured video is sent directly to a respective edge node EN1-EN3),
mix the participant volumetric content with other participant volumetric content into conference volumetric content (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices), and
transmit the conference volumetric content to the one or more UEs for the conference (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices).

As per claim 16, Stokking discloses a user equipment (UE) (Stokking Fig. 8, UE1, UE4) for receiving volumetric conversational services (Stokking Para. [0231], the Application Server may be aware of three user devices UE1, UE2 and UE3 wanting to have a VR conference session [volumetric conversational services], e.g., through a process through a website or the like. As discussed above, the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices), the UE comprising:
a communication interface (Stokking Fig. 15, Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node, a transmitter device, a receiver device or in general a UE); and
a processor operably coupled to the communication interface (Stokking Fig. 15, Processor Subsystem 440 coupled to Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node, a transmitter device, a receiver device or in general a UE), the processor configured to:
transmit, to a network configuration server, a signaling message indicating capability of the UE (Stokking Para. [0215], Such instructions may be part of a signaling between the orchestration node ON [network configuration server] and the transmitter device UE1 via which the capabilities of the transmitter device UE1 may be determined, e.g., in terms of computation resources, battery level, etc.) to process participant volumetric content for a conference with a plurality of UEs (Stokking Para. [0215], The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 process and encode the video using a tile-based video streaming codec);
receive, based on the signaling message, an assignment to a media resource function in an edge application server data network provisioned (Stokking Para. [0215], The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 process and encode the video using a tile-based video streaming codec; Stokking Fig. 7, Edge nodes EN1-EN3 and Para. [0050], an edge node may be an edge node of a 5G or later generation telecommunication network, or any other type of edge computing system, e.g., located at an edge between the telecommunication network and the access network via which the transmitter device is connected to the telecommunication network) for processing the participant volumetric content for the UE (Stokking Para. [0215], The orchestration node ON may, based on this information, decide between having the transmitter device UE1 process and encode the video, or having the edge node EN1 [media resource function] process and encode the video using a tile-based video streaming codec);
transmit the participant volumetric content to the media resource function (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices);
receive conference volumetric content converted from other participant volumetric content corresponding to one or more UEs for the conference by the media resource function (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices); and
render the converted conference volumetric content for the conference (Stokking Para. [0203], The processed videos may then be tiled and encoded 140 by the edge nodes, and sent as separate tile-based video streams 50-53 to a combiner which combines 150 the tiles in the compressed domain to obtain a combined tile-based video stream 60, which may then be transmitted by the combiner to a receiver device where it may be decoded 170 and split to obtain separate videos of the participants, which may finally be rendered 180, e.g., as video avatars in a computer-based environment).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 2-3, 7, 11-12, 17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019), in view of Barzuza et al. (US 2015/0215581, Pub. Date Jul. 30, 2015).
As per claim 2, Stokking discloses the apparatus according to claim 1, as set forth above. Stokking also discloses the processor (Stokking Fig. 15, Processor Subsystem 440 coupled to Network interface 420 and Para. [0254], FIG. 15 shows a processor system 400 embodying entities as described elsewhere in this specification, such as an edge node, a combiner and orchestration node).
Stokking does not explicitly disclose wherein:
the conference is initiated with an empty scene defined by an empty scene description, 
the processor is further configured to:
provide the empty scene description to participant media resource functions for the participant media resource functions to generate a three-dimensional object of a participant into the empty scene using the empty scene description.
Barzuza teaches:
the conference is initiated with an empty scene defined by an empty scene description (Barzuza Para. [0030], if the meeting location includes a conference table, then conferencing system 101 may determine the position of participant 122 to be an empty seat at the conference table. The presence of the empty seat may be determined based on video captured from HMD 103 (e.g. analyzing the video to determine the presence of a person at each position), based on information [empty scene description] manually entered by a participant or an administrator when deploying the system for a conference session),  
provide the empty scene description to system for the system (Barzuza Para. [0030], if the meeting location includes a conference table, then conferencing system 101 may determine the position of participant 122 to be an empty seat at the conference table. The presence of the empty seat may be determined based on video captured from HMD 103 (e.g. analyzing the video to determine the presence of a person at each position), based on information [empty scene description] manually entered by a participant or an administrator when deploying the system for a conference session) to generate a three-dimensional object of a participant into the empty scene using the empty scene description (Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304); Barzuza Para. [0056], AR video is then generated for each AR HMD 421-423 and 431-432 (step 602). Each AR HMD's video includes representations of the other participants not physically located in the same room. In some examples, each video may include all remote participants currently within view at their current positions such that each respective AR HMD simply displays the video; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Barzuza such that the conference is initiated with an empty scene defined by an empty scene description, and the processor is further configured to: provide the empty scene description to participant media resource functions for the participant media resource functions to generate a three-dimensional object of a participant into the empty scene using the empty scene description.
One of ordinary skill in the art would have been motivated to do so because the combination offers the advantage of providing a conferencing experience to conferencing participants at different locations as though the participants are at the same location (Barzuza Para. [0026]).

As per claim 3, Stokking-Barzuza discloses the apparatus according to claim 1, as set forth above. Stokking-Barzuza also discloses wherein the processor is further configured to direct the participant media resource functions to transmit data for the three-dimensional object to at least one group media resource function (Stokking Fig. 11 and Para. [0216], the orchestration node ON may be configured to send instructions 91 to the edge node EN1 which may for example identify one or more of: the transmitter device UE1, which video stream to expect, how to process this video stream, how to tile the processed video and encode the processed video, and where to send the tile-based video stream afterwards, e.g., in the form of a network address (e.g., IP address, port number of edge node EN4)) to fuse into the empty scene with three-dimensional objects of other participants (Barzuza Para. [0030], if the meeting location includes a conference table, then conferencing system 101 may determine the position of participant 122 to be an empty seat at the conference table. The presence of the empty seat may be determined based on video captured from HMD 103 (e.g. analyzing the video to determine the presence of a person at each position), based on information [empty scene description] manually entered by a participant or an administrator when deploying the system for a conference session; Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304)).
Similar rationale in claim 2 is applied.

As per claim 7, Stokking-Barzuza discloses the apparatus according to claim 2, as set forth above, Stokking-Barzuza also discloses wherein to provision the plurality of media resource functions, the processor is further configured to:
provision participant media resource functions (Stokking Fig. 7&8, Edge nodes EN1-EN3; Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices) where each is configured to process participant volumetric content for an assigned UE into the empty scene (Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304); Barzuza Para. [0056], AR video is then generated for each AR HMD 421-423 and 431-432 (step 602). Each AR HMD's video includes representations of the other participants not physically located in the same room. In some examples, each video may include all remote participants currently within view at their current positions such that each respective AR HMD simply displays the video; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position; Stokking Fig. 7&11, Edge nodes EN1-EN3), and
provision at least one group media resource function (Stokking Fig. 7&8, EN4; Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices) configured to mix each participant volumetric content into the conference volumetric content for the conference (Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304); Barzuza Para. [0056], AR video is then generated for each AR HMD 421-423 and 431-432 (step 602). Each AR HMD's video includes representations of the other participants not physically located in the same room. In some examples, each video may include all remote participants currently within view at their current positions such that each respective AR HMD simply displays the video; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position).
Similar rationale in claim 2 is applied.

Claims 11-12 recite subject matter similar to that recited in apparatus claims 2-3, respectively, and are rejected under a similar rationale.

As per claim 17, Stokking discloses the UE according to claim 16, as set forth above. Stokking does not explicitly disclose wherein the conference volumetric content includes the participant volumetric content and the other participant volumetric content for the one or more UEs mixed with an empty scene defined by an empty scene description.
Barzuza discloses:
the conference volumetric content includes the participant volumetric content and the other participant volumetric content (Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304); Barzuza Para. [0056], AR video is then generated for each AR HMD 421-423 and 431-432 (step 602). Each AR HMD's video includes representations of the other participants not physically located in the same room. In some examples, each video may include all remote participants currently within view at their current positions such that each respective AR HMD simply displays the video; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position) for the one or more UEs mixed with an empty scene defined by an empty scene description (Barzuza Para. [0030], if the meeting location includes a conference table, then conferencing system 101 may determine the position of participant 122 to be an empty seat at the conference table. The presence of the empty seat may be determined based on video captured from HMD 103 (e.g. analyzing the video to determine the presence of a person at each position), based on information [empty scene description] manually entered by a participant or an administrator when deploying the system for a conference session; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Barzuza such that the conference volumetric content includes the participant volumetric content and the other participant volumetric content for the one or more UEs mixed with an empty scene defined by an empty scene description.
One of ordinary skill in the art would have been motivated to do so because the combination offers the advantage of providing a conferencing experience to conferencing participants at different locations as though the participants are at the same location (Barzuza Para. [0026]).

As per claim 19, Stokking-Barzuza discloses the UE according to claim 17, as set forth above. Stokking-Barzuza also discloses wherein:
the assignment to a media resource function includes an assignment to a participant media resource function (Stokking Fig. 7&8, Edge nodes EN1-EN3; Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices) and an assignment to a group media resource function (Stokking Fig. 7&8, EN4; Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices), 
to transmit the participant volumetric content, the processor is further configured to transmit the participant volumetric content to an assigned participant media resource function configured (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices) to process the participant volumetric content for the UE into the empty scene (Barzuza Para. [0030], conferencing system 101 generates AR video that makes participant 122 appear to participant 123 at the first position when viewed through AR HMD 103 (step 304); Barzuza Para. [0056], AR video is then generated for each AR HMD 421-423 and 431-432 (step 602). Each AR HMD's video includes representations of the other participants not physically located in the same room. In some examples, each video may include all remote participants currently within view at their current positions such that each respective AR HMD simply displays the video; Barzuza Para. [0067], Even though the position of participant 815 is physically empty, the AR video presented to participant 811 makes it seem as though participant 815 is at that position), and
to receive the conference volumetric content, the processor is further configured to receive the conference volumetric content mixed, by the group media resource function, with the processed participant volumetric content for the UE with other participant volumetric content for other UEs for the conference (Stokking Fig. 9 and Para. [0212], The inputs of various other users A, B, C may be first combined into a single tile-based video stream, while later the self-view D may be added. This may be the case when a network node, e.g. an edge node, is generating a self-view tile-based video stream from a captured self-view video which is transmitted by the transmitter device to the edge node and then transmitted back from the edge node to the transmitter device; Stokking Para. [0189], each transmitter device may also be a receiver device and vice versa, in that each device may receive the videos of the other devices and transmit its own video to the other devices; Stokking Fig. 7&8, EN4; Stokking Para. [0231], the Application Server may be aware of the point of attachment of the various UEs, and thus can assign the appropriate edge servers to each UE).
A rationale similar to that set forth for claim 17 is applied.

As per claim 20, Stokking-Barzuza discloses the method according to claim 19, as set forth above. Stokking does not explicitly disclose
wherein the conference volumetric content received by the UE indicates a location of the UE for the conference in relation to the other participant volumetric content and excludes the participant volumetric content related to the UE.
Barzuza teaches:
the conference volumetric content received by the UE indicates a location of the UE for the conference in relation to the other participant volumetric content (Barzuza Para. [0068], As views 901 and 902 change (e.g. as participants 811 and 815 tilt or pan their heads), conferencing system 401 tracks those view changes and adjusts the AR video for each participant accordingly to ensure the remote participants continue to be presented in their correct positions) and excludes the participant volumetric content related to the UE (Barzuza Para. [0067], participants 811-813 at room 402 are able to see video of participants 814-815 at positions in room 402 as though participants 814-815 are located in room 402. Likewise, participants 814-815 at room 403 are able to see video of participants 811-813 at positions in room 403 as though participants 811-813 are located in room 403. In a particular example, view 901 is what participant 811 sees through their AR HMD 421 [participant 811, in view 901, can see video of participants 814-815 through his/her AR HMD 421, not including video of himself/herself]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Barzuza such that the conference volumetric content received by the UE indicates a location of the UE for the conference in relation to the other participant volumetric content and excludes the participant volumetric content related to the UE.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of providing a conferencing experience to conferencing participants at different locations as though the participants are at the same location (Barzuza Para. [0026]).

Claims 4, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019), in view of Barzuza et al. (US 2015/0215581, Pub. Date Jul. 30, 2015), and in view of Stokking et al. (US 2021/0209855, Filed May 23, 2019; hereinafter Stokking II).
As per claim 4, Stokking-Barzuza discloses the apparatus according to claim 2, as set forth above. Stokking does not explicitly disclose wherein the empty scene description includes total participants or conference size, conference room dimensions, scene background information, participant locations, and participant orientations.
Stokking II teaches:
scene description (Stokking II Para. [0223], the processor system 530 may (semi)automatically generate the metadata using an image analysis and/or computer vision technique) includes total participants or conference size (Stokking II Para. [0036], the object is a room; Stokking II Para. [0034], the metadata may provide a more explicit indication of the geometry of the object; Stokking II Para. [0035], the term 'geometry' may at least refer to an approximate shape of the object and may in some embodiments also include an approximate size of the object), conference room dimensions (Stokking II Para. [0173-0177], A room model may be described as metadata, e.g. as defined in [1]. In a specific example, the metadata may define the following parameters Dimensions width in meters, e.g., width=4; height in meters, e.g., height=2.5; depth in meters, e.g., depth=5), scene background information (Stokking II Para. [0178-0181], Materials left wall material as a string, e.g., left=brick-painted; right wall material as a string, e.g., right=curtain-heavy; front wall material as a string, e.g., front=brick-bare), participant locations, and participant orientations (Stokking II Para. [0185-0186], The metadata may also specify the camera position within the room, thereby effectively indicating how the room is positioned in the image. For that purpose, the following parameters as defined by [1] may for example be used: listenerPosition (x, y, z) where x-axis=left/right (width), y-axis=forward/backward (depth) and z-axis=up/down (height) [participant orientations] with respect to the room's center. A specific example is listenerPosition [participant locations] =(0, 1, 0.5)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Stokking II such that the empty scene description includes total participants or conference size, conference room dimensions, scene background information, participant locations, and participant orientations.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of allowing the system to reconstruct a room according to parameters (see Stokking II Para. [0185-0186]).

Claims 13 and 18 recite subject matter similar to that recited in apparatus claim 4 and are rejected under a similar rationale.

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019), in view of Barzuza et al. (US 2015/0215581, Pub. Date Jul. 30, 2015), and in view of Brimhall et al. (US 10,665,037, Filed Oct. 29, 2019).
As per claim 5, Stokking-Barzuza discloses the apparatus according to claim 3, as set forth above. Stokking does not explicitly disclose that the processor is further configured to transmit a scene template to the participant media resource functions for standardizing construction of the three-dimensional object.
Brimhall teaches:
transmit a scene template to participant media resource function for standardizing construction of the three-dimensional object (Brimhall col. 6 lines 61-65, At the developer system 106, a developer can use the user interface 104 to select and upload a 3D model to the server 102 [participant media resource function], such in the present example a 3D image file 108 [scene template]; Brimhall col. 7 lines 5-7, At the server 102, the image file 108 is processed by an encoder 110. The encoder 110 is configured to create a number of different augmented reality model or scene files. In particular, the encoder 110 processes the 3D image file 108 to identify various surfaces, textures, shading, geometry, or other characteristics of an object represented by the 3D image file 108).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Brimhall such that the processor is further configured to transmit a scene template to the participant media resource functions for standardizing construction of the three-dimensional object.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of creating augmented reality model or scene files for any augmented reality platforms (Brimhall col. 7 lines 18-19).

As per claim 14, Stokking-Barzuza discloses the apparatus according to claim 11, as set forth above. Stokking does not explicitly disclose wherein the processor is further configured to receive, from the apparatus in the core network, a scene template for standardizing construction of the three-dimensional object.
Brimhall teaches:
receive, from the apparatus in the core network, a scene template for standardizing construction of the three-dimensional object (Brimhall col. 6 lines 61-65, At the developer system 106, a developer can use the user interface 104 to select and upload a 3D model to the server 102, such in the present example a 3D image file 108 [scene template]; Brimhall col. 7 lines 5-7, At the server 102, the image file 108 is processed by an encoder 110. The encoder 110 is configured to create a number of different augmented reality model or scene files. In particular, the encoder 110 processes the 3D image file 108 to identify various surfaces, textures, shading, geometry, or other characteristics of an object represented by the 3D image file 108).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Brimhall such that the processor is further configured to receive, from the apparatus in the core network, a scene template for standardizing construction of the three-dimensional object.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of creating augmented reality model or scene files for any augmented reality platforms (Brimhall col. 7 lines 18-19).

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019), in view of Barzuza et al. (US 2015/0215581, Pub. Date Jul. 30, 2015), in view of Brimhall et al. (US 10,665,037, Filed Oct. 29, 2019), and further in view of Stokking et al. (US 2019/0313160, Pub. Date Oct. 10, 2019; hereinafter Stokking III).
As per claim 6, Stokking-Barzuza-Brimhall discloses the apparatus according to claim 5, as set forth above. Stokking does not explicitly disclose wherein the scene template includes:
a stream description providing a description of a volumetric content stream from a participant, 
a partial scene description providing a description of multiple volumetric contents that a media resource function is responsible for reconstructing, and 
a receiver list including information about endpoints to send processed participant volumetric content.
Brimhall teaches:
the scene template (Brimhall col. 6 lines 61-65, At the developer system 106, a developer can use the user interface 104 to select and upload a 3D model to the server 102, such in the present example a 3D image file 108 [scene template]) includes: a partial scene description providing a description of multiple volumetric contents that a media resource function is responsible for reconstructing (Brimhall col. 7 lines 5-19, At the server 102, the image file 108 is processed by an encoder 110. The encoder 110 is configured to create a number of different augmented reality model or scene files. In particular, the encoder 110 processes the 3D image file 108 to identify various surfaces, textures, shading, geometry, or other characteristics of an object [description] represented by the 3D image file 108 … The encoder 110 is then configured to use this identified information to create one or more augmented reality model or scene files. Typically, the encoder 110 is configured to create augmented reality model or scene files for any augmented reality platforms about which the server 102 is aware).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Brimhall to include a partial scene description providing a description of multiple volumetric contents that a media resource function is responsible for reconstructing.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of creating augmented reality model or scene files for any augmented reality platforms (Brimhall col. 7 lines 18-19).
Stokking-Brimhall does not explicitly disclose:
a stream description providing a description of a volumetric content stream from a participant, 
a receiver list including information about endpoints to send processed participant volumetric content.
Stokking III teaches:
a stream description providing a description of a volumetric content stream from a participant (Stokking III Para. [0036], the forwarding instructions may be configured to instruct the one or more forwarding nodes to selectively forward one or more of the plurality of streams to a Virtual Reality [VR] rendering device; Stokking III Para. [0045], a signaling interface configured to receive forwarding instructions to change one or more forwarding rules of the forwarding node so as to selectively forward one or more of the plurality of streams to a VR rendering device), 
a receiver list including information about endpoints to send processed participant volumetric content (Stokking III Fig. 7 and Para. [0179], in the destination metadata, the network address may be listed to which the stream is currently being sent, e.g., an address of an edge node; Stokking III Para. [0042], the destination metadata may be indicative of a network destination of a streaming of each of the plurality of streams within a network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Stokking III such that the scene template includes: a stream description providing a description of a volumetric content stream from a participant, and a receiver list including information about endpoints to send processed participant volumetric content.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of instructing forwarding nodes to selectively forward one or more of the plurality of streams to a Virtual Reality [VR] rendering device (see Stokking III Para. [0036]).

Claim 15 recites subject matter similar to that recited in apparatus claim 6 and is rejected under a similar rationale.

Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Stokking et al. (US 2022/0279254, Priority Date Jul. 17, 2019), in view of Barzuza et al. (US 2015/0215581, Pub. Date Jul. 30, 2015), and further in view of Yang et al. (US 2018/0295180, Pub. Date Oct. 11, 2018).
As per claim 8, Stokking-Barzuza discloses the apparatus according to claim 7, as set forth above. Stokking does not explicitly disclose wherein, to provision the at least one group media resource function, the processor is further configured to instruct the at least one group media resource function to separate volumetric data for a specific UE when outputting the conference volumetric content to the specific UE.
Yang teaches:
instruct the at least one group media resource function (Yang Fig. 10 and Para. [0106], the method 1000 proceeds to operation 1006, where the service orchestrator 120 assigns the closest service controller and media server for the video conference; Yang Para. [0052], the media servers 118 can be combined with the SDN controllers 114 in other embodiments) to separate data for a specific UE when outputting content to the specific UE (Yang Para. [0097-0098], The media server 118 provides two different media streaming modes that can be selected by each participant: (1) a single media streaming mode in which a selected one of a plurality of individual participant streams is selected and provided to the user device 108 via the single media stream 902; (2) a combined media streaming mode in which all available participant streams are combined in the combined media stream 904. The media 524 can be presented to the user in accordance with the media streaming mode selected by a user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Stokking in view of Yang such that, to provision the at least one group media resource function, the processor is further configured to instruct the at least one group media resource function to separate volumetric data for a specific UE when outputting the conference volumetric content to the specific UE.
One of ordinary skill in the art would have been motivated to do so because it offers the advantage of allowing a user to view a particular participant individually (see Yang Para. [0097]).

Claim 10 recites subject matter similar to that recited in apparatus claim 8 and is rejected under a similar rationale.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bergmann et al. (US 2019/0098255) Collaborative Virtual Reality Online Meeting Platform; 
Lebaredian et al. (US 2021/0049827) Cloud-Centric Platform For Collaboration And Connectivity On 3d Virtual Environments;
Pounds et al. (US 2021/0203727) Augmented Reality Objects Registry.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINH NGUYEN whose telephone number is (571)272-4487 and email address is vinh.nguyen1@uspto.gov. The examiner can normally be reached Monday-Friday: 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAMAL B DIVECHA, can be reached at (571)272-5863. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VINH NGUYEN/Examiner, Art Unit 2453

/Hitesh Patel/Primary Examiner, Art Unit 2419
11/8/22