DETAILED ACTION
This Office Action is in response to the amendment filed on 01/25/2022.
Claims 1-20 are pending claims; Claims 1, 10 and 19 are independent claims. This action is made non-final. 

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/25/2022 has been entered.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Valli et al. (US 2020/0099891 A1; hereinafter as Valli) in view of Miller, IV et al. (US 2019/0188895 A1; hereinafter as Miller) and further in view of Lopes et al. (US 2010/0194863 A1; hereinafter as Lopes).


Regarding independent claims 1, 10 and 19, Valli teaches:

(Claim 1) A system enabling spatial video-based virtual presence, comprising: one or more server computers, at least one sending client device, and at least one receiving client device wherein the one or more server computers comprise at least one processor and memory storing data and instructions implementing a virtual environment platform comprising at least one virtual environment; wherein the at least one sending client device comprise at least one processor and memory storing data and instructions configured to, when executed by the at least one processor, cause the at least one sending client device to: | (Claim 10) A method enabling spatial video-based presence, comprising: | (Claim 19) A non-transitory computer readable medium having stored thereon instructions configured to cause at least one sending client device comprising a processor and memory to perform steps of claim 10. (Valli, FIGS. 6-8, 21; [0064] telepresence systems … mobility to include users' ability to move around and to move and orient renderings of their locally captured spaces with respect to other participants. [0073]-[0076] cloud servers, display tools 604, 606, 608, 610, system diagram 700, processor 730; [0080]; [0175] memory 2230/2232 … include a processor operative to perform instructions stored in a non-transitory computer-readable medium);

generate/generating a user real-time 3D virtual cutout by obtaining a 2D or 3D live video data feed from at least one camera and performing a background removal on the live video data feed thereof (Valli, FIGS. 6-8, 21; [0083] spatially-faithful conferencing is enabled via a unified coordinate system between participating sites and their occupants; [0076] a reconstruction and perspective processor 730 combines received calibrated sets of depth and texture into a 3D reconstruction of the local space in real world scale, specifies an origin for the local space either by a rule or by user interaction, specifies an orientation for the local space either by a rule (e.g., compass North) or by user interaction, and sets the local coordinate system using the derived/given origin and orientation, and the real-world scale. To support visualizing a local user 764 in other spaces (enabling users' virtual visitations in other spaces), virtual perspective videos are produced from the eye-point of each user in the virtual geometry towards the position of the local user 764 in the unified geometry, so that the video is cropped to contain only the local user. The background around the local user may be removed and made transparent; [0147] In some embodiments of systems and methods disclosed herein, one or more users occupy their local spaces (private or office rooms) in various geographical locations. The spaces are captured in real time by multiple 3D sensors along the room walls or inside the spaces, so that a high-quality reconstruction may be formed from each of the user spaces by the system. In addition, user positions may be specifically assigned and tracked by electronic and/or image processing means; [0152] A virtual background for panoramas may be formed 1914 for delivery to other participants; [0155] user positions are set and tracked in a unified space and coordinates formed by combining separate camera and/or 3D sensor captured spaces, where views to and from users are captured based on their tracked positions, and where users stay within a capture space); and

transmit/transmitting the user real-time 3D virtual cutout for presentation by at least one receiving client device (Valli, FIGS. 6-9, 21; [0014] For some embodiments, a method may further include: receiving information from a remote user selected from one of the plurality of users for changing a location of a local environment geometry within the shared virtual geometry; selecting a new background for a background associated with the location indicated by the information received from the remote user; and replacing the background of the combined 2D perspective video with the new background; [0062]-[0063] combining those multiple captured spaces and their occupants into a unified 3D geometry and to provide spatial orientation over a network. Systems and methods disclosed herein supports spatial faithfulness in both group conferencing and social networking with a large number of spaces and users; [0076], [0089]; [0092]-[0095] To simplify FIG. 9, room layouts, orientations, and user positions are shown to be equal for all spaces. For some embodiments, a two-dimensional perspective video may be created that combines a background image with a perspective video of one or more remote users from the perspective of a local user. Such a combined video may be transmitted to another user or a server for some embodiments; [0152] A virtual background for panoramas may be formed 1914 for delivery to other participants; [0198]-[0204]; examiner notes that, in the 3D reconstruction, if a virtual background is formed, whatever is in the foreground corresponds to the virtual cutout, and sending that foreground with a virtual background corresponds to transmitting the real-time virtual cutout with the virtual background); and

at least one receiving client device comprising at least one processor and memory storing data and instructions configured to, when executed by the at least one processor, cause the at least one receiving client device to (Valli, FIGS. 6-8, 21; [0064] telepresence systems … mobility to include users' ability to move around and to move and orient renderings of their locally captured spaces with respect to other participants. [0073]-[0076] cloud servers, display tools 604, 606, 608, 610, system diagram 700, processor 730; [0080]; [0175] memory 2230/2232 … include a processor operative to perform instructions stored in a non-transitory computer-readable medium);

receive the user real-time 3D virtual cutout (see [0076]; video is cropped to contain only the local user, the background around the local user may be removed and made transparent.  [0202]; transmitting to each of the plurality of users the position of at least one of a plurality of other users within the shared virtual geometry); and 

insert and combine the user real-time 3D virtual cutout with the virtual environment by presenting the live video data feed having the background removed wherein the virtual environment is shared by the at least one sending client device and the at least one receiving client device and corresponding user real-time 3D virtual cutouts, enabling multi-user shared experiences with spatial video-based virtual presence within the virtual environment (Valli, FIGS. 6-9, 21; [0014] For some embodiments, a method may further include: receiving information from a remote user selected from one of the plurality of users for changing a location of a local environment geometry within the shared virtual geometry; selecting a new background for a background associated with the location indicated by the information received from the remote user; and replacing the background of the combined 2D perspective video with the new background; [0062]-[0063] combining those multiple captured spaces and their occupants into a unified 3D geometry and to provide spatial orientation over a network. Systems and methods disclosed herein supports spatial faithfulness in both group conferencing and social networking with a large number of spaces and users; [0070]-[0076], [0089]; [0092] spatially faithful (or correct) perspective views of remote participants, separated from their real backgrounds, may be formed and transmitted to each local participant, and positioned according to a unified geometry; [0093]-[0095] To simplify FIG. 9, room layouts, orientations, and user positions are shown to be equal for all spaces. For some embodiments, a two-dimensional perspective video may be created that combines a background image with a perspective video of one or more remote users from the perspective of a local user. Such a combined video may be transmitted to another user or a server for some embodiments; [0152]; [0198]-[0204]); and

wherein an orientation of the user real-time 3D virtual cutout is updated automatically by tracking and analyzing one or more of user eye-and-head-tilting data and head-rotation data using the 2D or 3D live video data feed from the at least one camera (Valli, [0155] user positions are set and tracked in a unified space and coordinates formed by combining separate camera and/or 3D sensor captured spaces, where views to and from users are captured based on their tracked positions, and where users stay within a capture space. [0075]-[0076] performs real-time wide base 3D capture of the local space, each sensor produces a depth and texture map of a sub-view [note: captured spaces are captured by separate camera and/or 3D sensor].  Where background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote users eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. The mobility and geometry manager 708 may align sites participating in a conferencing session into a virtual meeting setup (such as, for example, by overlaying sub-space origins and orienting (or rotating) spaces by a rule or according to user selection). [0152] A unified geometry may be formed 1922 by aligning user spaces in a co-centric way with other user spaces in the session (by rotation). A user's position and sub-space origin may be derived 1924 in the unified geometry. User perspectives may be received 1926 from terminals (videos and directional/spatial audio). Connector B 1930 connects FIG. 18A's flowchart 1900 and FIG. 18B's flowchart 1950. 
Compiled panoramas may be formed 1952 for visiting users (or users replacing their local view) that show a remote user space from an angle corresponding to a local user's position and viewpoint; examiner notes that determining a user’s viewpoint corresponds to tracking and analyzing head rotation of the user).
The prior art includes each element claimed as evidenced above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. One of ordinary skill in the art, before the effective filing date of the claimed invention, could have combined the elements as claimed by known methods (e.g., programming), and in combination, each element merely performs the same function as it does separately. In addition, one of ordinary skill in the art would have combined the claimed elements because Valli suggests that each feature or element can be used alone or in any combination with the other features and elements (Valli [0211]).
Alternatively, if Valli is not interpreted to teach the limitations “wherein an orientation of the user real-time 3D virtual cutout is updated automatically by tracking and analyzing one or more of user eye-and-head-tilting data and head-rotation data using the 2D or 3D live video data feed from the at least one camera”, Miller is relied upon for teaching these limitations.
Specifically, Miller teaches at least one sending client device configured to:
wherein an orientation of the user real-time 3D virtual cutout is updated automatically by tracking and analyzing one or more of user eye-and-head-tilting data and head-rotation data using the 2D or 3D live video data feed from the at least one camera (Miller: see [0052]; a virtual avatar is a virtual representation of a real person in an AR/VR/MR environment. During a telepresence session, a viewer can perceive an avatar of another user in the viewer’s environment and thereby create a tangible sense of the other user’s presence in the viewer’s environment. [0080]; image acquired by cameras 316 can be processed to identify a pose of a user or another person in the user’s environment; [0081]; calculate real or near-real time user head pose from wide field of view image information output from the capture devices 316 {~cameras}. [0137]; generate and update a 3D model of a user and animate the avatar by changing the avatar’s pose, moving the avatar around in a user’s environment or by animating the avatar’s facial expressions. [0383], [0395]; track and update the position and orientation of the viewer’s social triangle as the viewer moves around in the environment).
	Both references are directed to the same field of endeavor as the claimed invention (i.e., virtual environments). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the virtual environment system of Valli to include the feature of automatically updating a 3D virtual cutout based on a live video data feed from at least one camera, as suggested by Miller, to achieve the claimed invention. One would have been motivated to make such a combination to provide the user with new environments where physical and virtual objects co-exist and interact in real time, facilitating a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements (Miller: see [0003]).
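For illustration only, and not as a characterization of any implementation disclosed in Valli or Miller, the limitation of automatically updating the orientation of a cutout from tracked head-rotation data can be pictured with the following minimal Python sketch. The function name, the smoothing parameter, and the assumption that a per-frame head-yaw angle has already been estimated from the camera feed are all hypothetical.

```python
def update_cutout_yaw(current_yaw_deg, measured_head_yaw_deg, smoothing=0.5):
    """Move the cutout's yaw toward the measured head yaw.

    Hypothetical helper: assumes a head-yaw estimate (in degrees) is
    already available for each video frame from some tracking step.
    """
    # Signed shortest angular difference, in the range [-180, 180).
    delta = (measured_head_yaw_deg - current_yaw_deg + 180.0) % 360.0 - 180.0
    # Apply a fraction of the difference and wrap the result into [0, 360).
    return (current_yaw_deg + smoothing * delta) % 360.0
```

Calling the function once per frame with the latest estimate keeps the cutout's orientation following the user's head without any manual input, which is the sense of "updated automatically" assumed in this sketch.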
	Valli does not expressly teach generating a polygonal structure to be used as a virtual frame to support the live video data feed having the background removed; and
insert and combine the user real-time 3D virtual cutout with the virtual environment by presenting the live video data feed having the background removed with the polygonal structure.
	Lopes is relied upon for teaching these limitations. Specifically, Lopes teaches a technique for generating a user real-time 3D virtual cutout (see ¶ 0095; virtual 3D objects may include a virtual 3D representation of a participant 130 (e.g., a live person) captured by a camera of a different system; Fig. 8 and ¶ 0115 illustrate a method and a system for generating a virtual 3D representation of a participant from a 2D camera image) by: obtaining a 2D or 3D live video data feed from at least one camera (Fig. 8 and ¶ 0115 illustrate a method and a system for generating a virtual 3D representation of a participant from a 2D camera image); performing a background removal on the live video data feed (see ¶¶ 0075-0076; participant contour 201 is determined using background subtraction {~removal}); and generating a polygonal structure to be used as a virtual frame to support the live video data feed having the background removed (see ¶¶ 0075-0077; participant contour 201 as shown in Fig. 2 may be a continuous curve or polygons representing the outline of a participant 130; participant contour 201 may be used as a basis for developing a virtual 3D representation of the participant (i.e., virtual 3D participant 155));
	insert and combine the user real-time 3D virtual cutout with the virtual environment by presenting the live video data feed having the background removed with the polygonal structure (see Fig. 1 and ¶ 0079; computer system 120 combines the participant contour 201 with location and/or dimension data regarding participant 130 to generate a virtual 3D representation of participant 130 (i.e., virtual 3D participant 155) in virtual 3D scene 190; ¶¶ 0008-0010; an interaction between the first virtual 3D representation of the participant and a second virtual 3D representation of a second object is displayed).
Before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the teaching of Valli and the teaching of Lopes together to provide a feature for generating a user real-time 3D virtual cutout as claimed.  One of ordinary skill in the art would have been motivated to make such a combination because of the overlapping subject matter that both are directed to generating a 3D virtual cutout, and the advantages described in Lopes that 3D video productions and holographic displays capture a degree of realism or invoke greater perceived familiarity that a two-dimensional production or display simply cannot match (Lopes: see ¶ 0002).  
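For illustration only, the two operations discussed above (background removal on a frame, followed by generation of a polygonal structure that frames the remaining foreground) can be pictured with the following minimal Python sketch. It is not the method of Valli or Lopes: the function names and the threshold are hypothetical, frames are assumed to be small grayscale grids, and a four-vertex bounding polygon is used as a simplification of a contour that follows the participant's outline.

```python
def remove_background(frame, background, threshold=30):
    """Return a mask that is True where a pixel differs from the background.

    Hypothetical background subtraction on grayscale grids (lists of rows).
    """
    return [
        [abs(pixel - bg_pixel) > threshold for pixel, bg_pixel in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def bounding_polygon(mask):
    """Return a 4-vertex polygon (x, y) enclosing the foreground, or None.

    Simplification: an axis-aligned quadrilateral stands in for a contour
    that traces the participant's outline.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, fg in enumerate(row) if fg]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]
```

In this sketch the polygon would serve as the frame onto which the background-removed video is mapped before being placed in a shared scene.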

Regarding dependent claims 2 and 11, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach:

wherein the data and instructions further cause the at least one sending client device to establish and automatically update a viewing perspective of the virtual environment provided to the at least one sending client device by using one or more virtual cameras placed virtually and aligned with respect to the user real-time 3D virtual cutout. (Valli, FIGS. 2, 5A-9, 21; [0049]-[0051] cameras; [0071] FIG. 5A shows view lines of cameras 504, 506, 508, 510, 512, 514, 516, 518 for capturing a user 502. Individual viewpoints are formed by capturing each user … a user may have spatially faithful viewpoints only to his or her closest neighbors, captured inside and together with their backgrounds; [0076] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote user's eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system ... The mobility and geometry manager 708 may align sites participating in a conferencing session into a virtual meeting setup (such as, for example, by overlaying sub-space origins and orienting (or rotating) spaces by a rule or according to user selection). As a result, spatially-faithful conferencing is enabled via a unified coordinate system between participating sites and their occupants; [0092]-[0093]).

Regarding dependent claims 3 and 12, the rejection of claim 2 is incorporated.  Valli/Miller and Lopes further teach:

wherein the alignment of the one or more virtual cameras is in front of the user real-time 3D virtual cutout and tracks movement of the user real-time 3D virtual cutout (Valli, FIGS. 2, 5A-9, 21; [0049]-[0051] cameras; [0067] use location tracking to enable construction from variable viewpoints for various numbers of users and locations. [0071] FIG. 5A shows view lines of cameras 504, 506, 508, 510, 512, 514, 516, 518 for capturing a user 502. Individual viewpoints are formed by capturing each user … a user may have spatially faithful viewpoints only to his or her closest neighbors, captured inside and together with their backgrounds; [0075]-[0076] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote user’s eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. The mobility and geometry manager 708 may align sites participating in a conferencing session into a virtual meeting setup (such as, for example, by overlaying sub-space origins and orienting (or rotating) spaces by a rule or according to user selection). As a result, spatially-faithful conferencing is enabled via a unified coordinate system between participating sites and their occupants.).

Regarding dependent claims 4 and 13, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach:

wherein the tracking and analyzing one or more of user eye-and-head-tilting data and head-rotation data includes using computer vision methods (Valli, [0155] user positions are set and tracked in a unified space and coordinates formed by combining separate camera and/or 3D sensor captured spaces {~vision method}, where views to and from users are captured based on their tracked positions, and where users stay within a capture space. [0075]-[0076] performs real-time wide base 3D capture of the local space, each sensor produces a depth and texture map of a sub-view {~vision method} [note: captured spaces are captured by separate camera and/or 3D sensor]. Miller: [0080]; image acquired by cameras 316 can be processed to identify a pose of a user or another person in the user’s environment; [0081]; calculate real or near-real time user head pose from wide field of view image information output from the capture devices 316 {~cameras}. [0137]; generate and update a 3D model of a user and animate the avatar by changing the avatar’s pose, moving the avatar around in a user’s environment or by animating the avatar’s facial expressions. [0141]; computer vision techniques).
	Both references are directed to the same field of endeavor as the claimed invention (i.e., virtual environments). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the virtual environment system of Valli to include the computer vision techniques suggested by Miller to recognize objects in the environment. One would have been motivated to make such a combination to provide the user with new environments where physical and virtual objects co-exist and interact in real time, facilitating a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements (Miller: see [0003]).

Regarding dependent claims 5 and 14, the rejection of claim 3 is incorporated.  Valli/Miller and Lopes further teach:

wherein the one or more virtual cameras point outward from eye level, or wherein the one or more virtual cameras include two virtual cameras, one per eye, that point outward from the two-eye level, or wherein the one or more virtual cameras point outward from the center of the head-position of the user real-time 3D virtual cutout, or wherein the one or more virtual cameras point outward from the center of the user real-time 3D virtual cutout. (Valli, FIGS. 2, 5A-9, 21; [0045] Augmented reality (AR) and virtual reality (VR) glasses-based systems may support spatially faithful perception. Views from remote places may be brought to a users eye-point, and immersion may be supported by enabling users to view a whole 360° panorama (although a sub-view may be relatively narrow at a time due to the restricted field-of-view of AR glasses); [0049]-[0052] Users may wear AR/VR glasses, which are able to bring proxy camera views directly to each receivers eye-point. AR glasses may provide up to 360° panoramas around the user, even in stereo (S3D).; [0067] use location tracking to enable construction from variable viewpoints for various numbers of users and locations. [0071] FIG. 5A shows view lines of cameras 504, 506, 508, 510, 512, 514, 516, 518 for capturing a user 502. Individual viewpoints are formed by capturing each user … a user may have spatially faithful viewpoints; [0075]-[0077] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote users eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. 
[0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system; [0092] Spatially faithful (or correct) perspective views of remote participants 910, 912, 914, 916, 918, 920, separated from their real backgrounds, may be formed and transmitted to each local participant, and positioned according to a unified geometry 908. Correct user-centric views may be displayed by AR/VR glasses at the eye-point of each user; examiner notes that a user’s AR/VR glasses enable spatially faithful perspective views of remote participants).

Regarding dependent claims 6 and 15, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach:

wherein the one or more server computers include a spatially analyzed media server configured to manage, analyze and process incoming data of a plurality of sending client devices and in such analysis manages or optimizes transmission of data streams to a plurality of receiving client devices. (Valli, FIGS. 2, 5A-9, 21; [0049]-[0051] cameras; [0055] The more meeting spaces and participants are brought together, the more challenging it becomes to support participants by an unrestricted natural experience in moving and viewing around. If unifying separately captured spaces, their positions and orientations may be optimized for maximum visibility between participants and to avoid virtual collisions with other participants and furniture. [0067] use location tracking to enable construction from variable viewpoints for various numbers of users and locations. [0071] FIG. 5A shows view lines of cameras 504, 506, 508, 510, 512, 514, 516, 518 for capturing a user 502. Individual viewpoints are formed by capturing each user … a user may have spatially faithful viewpoints only to his or her closest neighbors, captured inside and together with their backgrounds; [0075]-[0076] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote user's eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. [0081] may use proximity/distance of other users if forming connections and dispatching data (e.g., favoring interactions with virtually nearby users to reduce bitrate and computations). [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system ...
The mobility and geometry manager 708 may align sites participating in a conferencing session into a virtual meeting setup (such as, for example, by overlaying sub-space origins and orienting (or rotating) spaces by a rule or according to user selection). [0152] A unified geometry may be formed 1922 by aligning user spaces in a co-centric way with other user spaces in the session (by rotation). A user's position and sub-space origin may be derived 1924 in the unified geometry. User perspectives may be received 1926 from terminals (videos and directional/spatial audio). Connector B 1930 connects FIG. 18A's flowchart 1900 and FIG. 18B's flowchart 1950. Compiled panoramas may be formed 1952 for visiting users (or users replacing their local view) that show a remote user space from an angle corresponding to a local user's position and viewpoint; examiner notes that the claim does not specify how the data is managed, analyzed, or processed, what is optimized, or how transmission is managed; examiner further notes that Valli's explicit teaching of favoring interactions with nearby users to reduce bitrate and computations corresponds to managing and optimizing the transmission of data based on proximity).
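For illustration only, the proximity-based favoring of nearby users noted above can be pictured with the following minimal Python sketch, in which a server picks a stream tier for each sender and receiver pair from their distance in a shared coordinate system. This is not Valli's implementation: the function names, the two tier labels, and the threshold value are hypothetical.

```python
import math


def select_stream_tier(sender_pos, receiver_pos, near_threshold=5.0):
    """Return "high" for virtually nearby users and "low" otherwise.

    Hypothetical policy: positions are (x, y, z) tuples in a shared geometry.
    """
    distance = math.dist(sender_pos, receiver_pos)
    return "high" if distance < near_threshold else "low"


def plan_streams(positions, near_threshold=5.0):
    """Map each receiver to the tier chosen for every other user's stream."""
    return {
        receiver: {
            sender: select_stream_tier(sender_pos, receiver_pos, near_threshold)
            for sender, sender_pos in positions.items()
            if sender != receiver
        }
        for receiver, receiver_pos in positions.items()
    }
```

Under this assumed policy, distant participants receive lower-tier streams of one another, which reduces the bitrate the server must dispatch.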


Regarding dependent claims 7 and 16, the rejection of claim 6 is incorporated.  Valli/Miller and Lopes further teach:

wherein the spatially analyzed media server is further configured to: in response to detecting that a distance between two or more user real-time 3D virtual cutouts has changed: calculate a distance difference between the two or more user real-time 3D virtual cutouts (Valli, FIGS. 2, 5A-9, 21; [0007] determining a distance in the shared virtual geometry between a first user selected from the plurality of users and a second user selected from the plurality of users; and responsive to determining that the distance between the first and second users is less than a threshold: selecting a resolution for a representation of the 2D perspective video; and creating the 2D perspective video based on the resolution selected. [0014], [0049]-[0051] cameras; [0054] embodiments of such methods and systems may be used for social interaction and spatial exploration inside a unified virtual landscape with a dynamic unified geometry compiled from separate user spaces, which may enable proximity-based interactions (triggered by distances and/or directions between users or spaces); [0062]-[0063] [0077] An interaction and application controller 724 also connects with a terminal display 722 and may contain user application logic and software, e.g., functions which may be triggered by a user's proximity. Proximity may also be a vector value, sensitive to orientations/directions in the geometry. [0081] may use proximity/distance of other users if forming connections and dispatching data (e.g., favoring interactions with virtually nearby users to reduce bitrate and computations); [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. [0092]-[0095] To simplify FIG. 9, room layouts, orientations, and user positions are shown to be equal for all spaces. 
For some embodiments, a two-dimensional perspective video may be created that combines a background image with a perspective video of one or more remote users from the perspective of a local user; examiner notes that, in a spatially faithful system that explicitly teaches replacement of background images (e.g., corresponding to 3D virtual cutouts), the system tracks distances to maintain a unified coordinate system/unified virtual geometry 744);

update at least one of a position and an orientation of each user real-time 3D virtual cutout within the virtual environment; update a corresponding viewing perspective respective to each other for each user real-time 3D virtual cutout (Valli, FIGS. 2, 5A-9, 21; [0007] determining a distance in the shared virtual geometry between a first user selected from the plurality of users and a second user selected from the plurality of users; and responsive to determining that the distance between the first and second users is less than a threshold: selecting a resolution for a representation of the 2D perspective video; and creating the 2D perspective video based on the resolution selected. [0014], [0049]-[0051] cameras; [0054] embodiments of such methods and systems may be used for social interaction and spatial exploration inside a unified virtual landscape with a dynamic unified geometry compiled from separate user spaces, which may enable proximity-based interactions (triggered by distances and/or directions between users or spaces); [0049]-[0052] Users may wear AR/VR glasses, which are able to bring proxy camera views directly to each receivers eye-point. AR glasses may provide up to 360° panoramas around the user, even in stereo (S3D); [0062]-[0063] [0077] An interaction and application controller 724 also connects with a terminal display 722 and may contain user application logic and software, e.g., functions which may be triggered by a user's proximity. Proximity may also be a vector value, sensitive to orientations/directions in the geometry. 
[0081] may use proximity/distance of other users if forming connections and dispatching data (e.g., favoring interactions with virtually nearby users to reduce bitrate and computations); [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. [0092]-[0095] To simplify FIG. 9, room layouts, orientations, and user positions are shown to be equal for all spaces. For some embodiments, a two-dimensional perspective video may be created that combines a background image with a perspective video of one or more remote users from the perspective of a local user; examiner notes in a spatially faithful system that explicitly teaches replacement of background images (e.g., corresponding to 3d virtual cutouts), the system is tracking distances to maintain a unified coordinate system/unified virtual geometry 744 and updating perspectives); and

send the updated position, orientation and corresponding viewing perspective of each user real-time 3D virtual cutout to corresponding client devices. (Valli, FIGS. 2, 5A-9, 21; [0007] determining a distance in the shared virtual geometry between a first user selected from the plurality of users and a second user selected from the plurality of users; and responsive to determining that the distance between the first and second users is less than a threshold: selecting a resolution for a representation of the 2D perspective video; and creating the 2D perspective video based on the resolution selected. [0014], [0049]-[0051] cameras; [0054] embodiments of such methods and systems may be used for social interaction and spatial exploration inside a unified virtual landscape with a dynamic unified geometry compiled from separate user spaces, which may enable proximity-based interactions (triggered by distances and/or directions between users or spaces); [0062]-[0063] [0077] An interaction and application controller 724 also connects with a terminal display 722 and may contain user application logic and software, e.g., functions which may be triggered by a user's proximity. Proximity may also be a vector value, sensitive to orientations/directions in the geometry. [0081] may use proximity/distance of other users if forming connections and dispatching data (e.g., favoring interactions with virtually nearby users to reduce bitrate and computations); [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. [0092]-[0095] To simplify FIG. 9, room layouts, orientations, and user positions are shown to be equal for all spaces. 
For some embodiments, a two-dimensional perspective video may be created that combines a background image with a perspective video of one or more remote users from the perspective of a local user; examiner notes in a spatially faithful system that explicitly teaches replacement of background images (e.g., corresponding to 3d virtual cutouts), the system is tracking distances to maintain a unified coordinate system/unified virtual geometry 744).
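
For illustration only, the three limitations mapped above (calculating a distance difference when the distance between cutouts changes, updating position and viewing perspective, and sending the update to the corresponding client devices) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Valli, Miller, Lopes, or the application; the names `CutoutPose`, `handle_move`, and the 2D coordinate model are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CutoutPose:
    """Hypothetical 2D pose of one user cutout in the shared virtual geometry."""
    x: float
    y: float
    yaw_deg: float


def distance(a: CutoutPose, b: CutoutPose) -> float:
    """Euclidean distance between two cutouts."""
    return math.hypot(a.x - b.x, a.y - b.y)


def facing_angle_deg(src: CutoutPose, dst: CutoutPose) -> float:
    """Direction (degrees) from src toward dst, used here as the viewing perspective."""
    return math.degrees(math.atan2(dst.y - src.y, dst.x - src.x))


def handle_move(poses, moved_id, new_pose, tolerance=1e-9):
    """Apply a move and return {client_id: update} for every other user
    whose distance to the moved cutout changed (assumed server-side logic)."""
    old_pose = poses[moved_id]
    updates = {}
    for uid, pose in poses.items():
        if uid == moved_id:
            continue
        # Distance difference between the two cutouts before and after the move.
        diff = distance(new_pose, pose) - distance(old_pose, pose)
        if abs(diff) > tolerance:
            updates[uid] = {
                "moved": moved_id,
                "distance_difference": diff,
                "position": (new_pose.x, new_pose.y),
                "orientation_deg": new_pose.yaw_deg,
                # Perspective of the receiving user toward the moved cutout.
                "view_yaw_deg": facing_angle_deg(pose, new_pose),
            }
    poses[moved_id] = new_pose
    return updates


poses = {"a": CutoutPose(0.0, 0.0, 0.0), "b": CutoutPose(3.0, 4.0, 0.0)}
updates = handle_move(poses, "a", CutoutPose(0.0, 4.0, 90.0))
```

In this sketch the returned dictionary stands in for the messages a media server would dispatch to each receiving client device; an actual system would also have to handle 3D coordinates and transport.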

Regarding dependent claims 8 and 17, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach:

wherein the at least one sending client device is further configured to perform further processing or improvements on the user real-time 3D virtual cutout. (Valli, FIGS. 2, 5A-9, 21; [0049]-[0051] cameras; [0067] use location tracking to enable construction from variable viewpoints for various numbers of users and locations. [0071] FIG. 5A shows view lines of cameras 504, 506, 508, 510, 512, 514, 516, 518 for capturing a user 502. Individual viewpoints are formed by capturing each user … a user may have spatially faithful viewpoints only to his or her closest neighbors, captured inside and together with their backgrounds; [0075]-[0076] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote users eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background. [0083]-[0084] A mobility and geometry manager 708 may form and maintain a unified coordinate system (or unified virtual geometry 744) for a varying number of parallel conferences according to data from a user and session manager 704 and a panorama and visitation manager 710. The mobility and geometry manager 708 may align sites participating in a conferencing session into a virtual meeting setup (such as, for example, by overlaying sub-space origins and orienting (or rotating) spaces by a rule or according to user selection). [0152] A unified geometry may be formed 1922 by aligning user spaces in a co-centric way with other user spaces in the session (by rotation). A user's position and sub-space origin may be derived 1924 in the unified geometry. User perspectives may be received 1926 from terminals (videos and directional/spatial audio). Connector B 1930 connects FIG. 18A's flowchart 1900 and FIG. 18B's flowchart 1950. 
Compiled panoramas may be formed 1952 for visiting users (or users replacing their local view) that show a remote user space from an angle corresponding to a local users position and viewpoint; examiner notes a user’s viewpoint corresponds to tracking and analyzing head rotation of user to determine viewpoint and such viewpoint tracking may correspond to processing improvements … features of processing improvements).

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Valli/Miller and Lopes as rejected in claim 1 above, further in view of Baruch (US 20170337693 A1; hereinafter Baruch).

Regarding dependent claims 9 and 18, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach:

wherein the background removal is performed by applying image segmentation through one or more of instance segmentation or semantic segmentation, and usage of deep neural networks by the sending client device or one or more server computers (Valli, [0075]-[0076] background around the local user may be removed and made transparent. Also, video may be produced for 360° (full panorama) around each remote users eye-point in the local space. For some embodiments, a local background of a perspective video of a user may be replaced with another background; [0124] Another method for avoiding continuity issues may combine segmented users into captured and tracked positions in a continuous virtual world or 3D landscape, instead of a compilation of physical views that may have discontinuities. Miller: [0151] selectively cropping a user’s image, modifying the user’s background; [0142] object recognition can be performed by a variety of machine learning algorithms, including instance-based algorithms; [0141] object recognition can be performed using a variety of computer vision techniques, including deep neural networks).

Alternatively, Baruch is relied upon for explicitly teaching algorithms for segmentation employing deep learning methods, including wherein the background removal is performed by applying image segmentation through one or more of instance segmentation or semantic segmentation, and usage of deep neural networks by the sending client device or one or more server computers (Baruch, [0001], [0019] FIG. 1, object tracking from frame to frame may be used for many applications that operate in real-time such as security and surveillance, video communication including tele-conferencing, augmented reality ... As one example, a ten second video sequence 100 is modified with color pop to emphasize a foreground object 110 with a changed color (to red) on a background 112 taken at 30 frames per second. [0043]-[0049] Whether used as a rough segmentation for the start frame, or as the only segmentation algorithm to be applied on the start frame, the color (and/or intensity) based methods may include a graph-cut method (such as Grabcut), Deep-Learning methods (based on convolutional neural networks for example), or class specific methods, such as an algorithm which is segmenting at the rough boundary of an object such as a person rather than using a rectangle. The algorithms for finding the exact boundaries of the person are slower than the ones which only detect a bounding box of the person. A conventional depth data analysis background-foreground segmentation may use a weighted combination of the color and depth data of the pixels to determine whether the pixels are part of the background or the foreground).

Valli/Miller and Lopes pertain to systems and methods for managing user positions in a shared virtual geometry and providing a spatially faithful system (Valli, Abstract) and Valli teaches different features, applied in the mapping above, in relation to different exemplary embodiments.  It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, with the teachings of the various exemplary embodiments before them to modify the combination of features to tailor to the needs and goals at hand (Valli, [0211]).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Valli/Miller and Lopes as rejected in claim 1 above, further in view of Macq et al. (EP 3564905 A1; hereinafter Macq).

Regarding dependent claim 20, the rejection of claim 1 is incorporated.  Valli/Miller and Lopes further teach: generate a user 3D virtual cutout by: obtaining a photo of a user (Miller: see ¶ 0157; real-world samples such as photogrammetric scans of real humans performing body movement, articulations, facial contortions, expression); and generating a 3D mesh or point cloud for the user 3D virtual cutout based on the photo (see ¶ 0158, 0218-0212; mesh, point cloud).
	Valli/Miller and Lopes do not expressly disclose selectively transmitting the user 3D virtual cutout of the user real-time 3D virtual cutout for presentation by the at least one receiving client device, but Macq is relied upon for teaching the limitations (Macq: see ¶ 0047; Three-dimensional volumetric representations, which may comprise a mesh or point cloud, may comprise all data needed for an object to be rendered from any viewpoint, even initially occluded portions. Macq: See ¶ 0004; transforming the parts to a selected one of a plurality of representational models, the representational models having different respective properties, the selection being based on one or more predetermined rules; and encoding the transformed parts into a combined video set or stream for transmission to the user device.  ¶ 0017; Another example embodiment provides a method comprising: receiving video data representing a three-dimensional scene; receiving from a user device data representing a field of view within the scene; identifying, based on the field of view, a plurality of parts of the scene; transforming the parts to a selected one of a plurality of representational models, the representational models having different respective properties, the selection being based on one or more predetermined rules; and encoding the transformed parts into a combined video set or stream for transmission to the user device).
Before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the teaching of Valli/Miller/Lopes with the teaching of Macq to provide a feature for generating a user 3D virtual cutout as claimed.  One of ordinary skill in the art would have been motivated to make such a combination because the references are directed to overlapping subject matter, namely generating a 3D virtual cutout, and because of the advantages described in Macq of providing a six degrees-of-freedom virtual reality system in which the user is able to move freely in the virtual space, thus enabling the provision and consumption of volumetric reality content (Macq: see ¶ 0003).


Response to Arguments
Applicant's arguments filed 01/25/22 have been fully considered but they are moot in view of the new grounds of rejection.
	 
Conclusion
The prior art made of record on form PTO-892 and not relied upon is considered pertinent to applicant's disclosure.  Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action. For example:
Barzura et al. (US 2015/0213650 A1) – a method, systems and media for presenting a meeting between remote participants (see abstract).  Meeting participants are represented by an avatar or other graphical representation in place of real-time video of participants (see Fig. 7 and [0063]).
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUYETLIEN T TRAN whose telephone number is (571)270-1033.  The examiner can normally be reached on M-F: 8:00 AM - 8:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached on 571-270-1104.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TUYETLIEN T TRAN/Primary Examiner, Art Unit 2179