DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendment
This Office Action is made in response to the amendment filed 11/24/2021. Claims 1, 4, 14, and 18 are amended.

Response to Arguments
Applicant’s arguments, see “Remarks” made in the Amendment filed 11/24/2021, have been considered.
With respect to Claim Rejections - 35 USC § 102 & 35 USC § 103, independent claim 1 has been amended to include in part “receiving, by the at least one processor, a first volumetric media stream from a second device, the first volumetric media stream comprising second volumetric visual data comprising a first volumetric time slice representing a field of view of a second device, the second volumetric visual data generated at a second time after the first time, wherein the first volumetric visual data comprises a pre-generated image unassociated with a respective volumetric time slice; … generating, by the at least one processor, based on the first volumetric visual data and the first portion of the first volumetric time slice representing the first object, a second volumetric time slice comprising the first volumetric visual data and the second portion of the first volumetric time slice, wherein generating the second volumetric time slice comprises removing the first portion from the first volumetric time slice and inserting the pre-generated image into the first volumetric time slice; ....” (Emphasis added). The Applicant submits that Kamal does not teach the amended portions; specifically, the Applicant argues that Kamal captures depth data maps rather than images and thus does not insert a “pre-generated image into the first volumetric time slice” as amended. In response, the Examiner notes that Kamal generates depth maps by receiving visual data in real time and therefore captures image data rather than depth maps as argued (see FIG.5 of Kamal – plurality of cameras). The Examiner agrees that Kamal does not teach “insert a pre-generated image”; however, the Examiner finds VELEVSKI (US 2020/0043214 A1) to teach this amended feature (see rejections below). The Applicant further submits that Kamal does not teach "based on the first volumetric visual data and the 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-7, 9-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kamal et al., Pub No US 2018/0124371 A1 (hereafter Kamal) in view of VELEVSKI et al., Pub No US 2020/0043214 A1 (hereafter VELEVSKI).

Regarding Claim 1, Kamal discloses a method of providing a volumetric media stream [For purposes of this review, the Examiner equates volumetric with visual as disclosed in paragraph 18 of Applicant’s specification, and paragraph 0041 of Applicant’s specification discloses that volumetric visual data equates to depth data; and col.5, lines 4-7: Discloses that a data stream representative of a dynamic volumetric model of the surfaces of the objects included in the real-world scene may be generated.], the method comprising:
receiving, by at least one processor of a first device [para.0044: Discloses depth capture subsystem contains processor; and FIG.3 and para.0050: Discloses node (element 302-1 a first device) is a depth capture subsystem.], first volumetric visual data generated at a first time [para.0044: Discloses the capture subsystem includes cameras for capturing visual data; thus, node element 302-1 (a first device) receives first volumetric visual data from camera; and para.0056: Discloses the depth data is timestamped, thus first depth data is at a first time.];
receiving, by the at least one processor, a first volumetric media stream from a second device, the first volumetric media stream comprising second volumetric visual data comprising a first volumetric time slice representing a field of view of a second device [para.0063: Discloses , the second volumetric visual data generated at a second time after the first time [para.0109: Discloses the depth map capture subsystem performing time-of-flight depth map capture technique may capture a depth map by generating the depth map based on the different times within the plurality of different times subsequent to the particular time at which the emitted light pulse is detected.];  
determining, by the at least one processor, that the first volumetric time slice comprises a first portion and a second portion, the first portion representing a first object and comprising an amount of the second volumetric visual data, and wherein the first object is absent from the second portion [FIG.3 & para.0053: Discloses because of the different fixed node positions of nodes 302 of implementation 300, each node 302 may be associated with a unique perspective of object 202 (the first object) such that the surfaces of object 202 may be detected from various perspectives surrounding object 202 and each node 302 may detect characteristics of the surfaces of object 202 that would be difficult or impossible to detect (absent from the second portion) from the fixed node positions of other nodes 302. For example, all of the respective areas of nodes 302 may be overlapping (a first portion and a second portion) with the respective areas of all the other nodes 302 in an area (e.g., a circular area) designated as real-world scene 304; and FIG.5: Discloses an illustration of a basketball game with field of view from a plurality of nodes 504 independently capturing one or more depth maps. An object such as the basketball will be captured by some of the nodes and not others. For example, the first portion node 504 (one of the plurality of nodes) may have a line of sight view of the basketball at a particular point in time, where a second portion node 504 (another one of the plurality of nodes) at the same particular time may not have a visible line of sight view to the basketball, possibly due to a player being in the way (blocking the view, thus object is absent from the second portion). The first portion represents a first object and comprises an amount of the second volumetric visual data since all nodes 504 are capturing depth data from different viewpoints.];
determining, by the at least one processor, that the first volumetric visual data represents the first object, wherein the first object is represented by an amount of the first volumetric visual data that is different than the amount of the second volumetric visual data [FIG.8A: Discloses the first volumetric visual data captured at 802-1 and the second volumetric visual data captured at 802-1, both representing the first object 202; FIG.8B & 8C: Discloses that the two views differ in amount from each other (the first volumetric visual data and the second volumetric visual data). For example, observing point 212 on the object of the field of view for the first volumetric visual data provides for a full visual, whereas, point 212 on the object of the field of view for the second volumetric visual data provides for a smaller amount visible.];
generating, by the at least one processor, based on the first volumetric visual data and the first portion of the first volumetric time slice representing the first object, a second volumetric time slice comprising the first volumetric visual data and the second portion of the first volumetric time slice, wherein generating the second volumetric time slice comprises removing the first portion from the first volumetric time slice [FIG.3 & para.0050: Discloses generating depth data by converging independently-captured depth maps that includes a plurality of nodes 302 (i.e., nodes 302-1 through 302-8) disposed at fixed node positions with respect to (e.g., in this case, surrounding) a real-world scene 304 that includes object 202 (the first object). Each of nodes 302 may include or implement one or more depth map capture subsystems. Thus, node 302-1 generates a first volumetric data and node 302-2 generates a second volumetric data (and so on with 302-3 through 302-8); para.0053: Discloses that all of the respective areas of nodes 302 may be overlapping with the respective areas of all the other nodes 302 in an area designated as real-world scene 304; para.0054: Discloses each node may be associated with an area that includes a portion (e.g., a horizontal portion, a vertical portion, etc. – a slice see paragraph 0078) of the real-world scene that is smaller than the entire real-world scene. As such, various portions of the real-world scene associated with each node may overlap with other portions of the real-world scene (e.g., portions of the real-world scene associated with neighboring nodes) but may not necessarily overlap with every other portion of the real-world scene associated with every other node. Thus, the second volumetric visual data (depth data generated by node 302-2) may comprise a first volumetric time slice associated with a first volumetric media stream (depth data generated from node ;
generating, by the at least one processor, a second volumetric media stream, the second volumetric media stream comprising the second volumetric time slice [para.0063: Discloses generating a data stream (e.g., a real-time data stream) representative of the dynamic volumetric model of the surfaces of object 202 included in real-world scene 304. For example, data capture processing unit 310 may generate the data stream in real time such that users not physically located within or around real-world scene 304 may be able to experience real-world scene 304 live, in real time or near-real time, via virtual reality media content representative of real-world scene 304; and para.0066: Discloses dynamically selectable viewpoint selected by user 316 while user 316 is experiencing real-world scene 304 using media player; thus, the user may select any viewpoint desired such as 302-2 (the second volumetric time slice) and a second volumetric media stream will be generated and provided to the user’s media player.]; and
sending, by the at least one processor, the second volumetric media stream for presentation at a third device [para.0064: Discloses providing the media stream to the user’s media player (a third device).]. 
Kamal does not explicitly disclose wherein generating the second volumetric time slice comprises removing the first portion from the first volumetric time slice and inserting the pre-generated image into the first volumetric time slice; and wherein the first volumetric visual data comprises a pre-generated image unassociated with a respective volumetric time slice (emphasis added to distinguish the elements not taught by Kamal). However, in analogous art, VELEVSKI discloses inserting images into combined 

Regarding Claim 2, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses wherein the first volumetric visual data is different than the first portion of the second volumetric visual data [FIG.3 & para.0053: Discloses because of the different fixed node positions of nodes 302 of implementation 300, each node 302 may be associated with a unique perspective of object 202 such that the surfaces of object 202 may be detected from various perspectives surrounding object 202 and each node 302 may detect characteristics of the surfaces of object 202 that would be difficult or impossible to detect from the fixed node positions of other nodes 302. Thus, the first portion of the captured second depth data (second volumetric visual data) may have an area (first portion) not overlapping with the captured first depth data; therefore, the first volumetric visual data is different than the first portion of the second volumetric visual data.].

Regarding Claim 3, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses the method further comprising:
identifying telemetry data associated with the field of view and the second volumetric visual data [para.0083-0084: Discloses the field of view continuously changes as the user naturally looks around immersive virtual reality world based on input such as head movements, keyboard input, etc. (identifying telemetry data); and para.0086: Discloses the system may include one or more (first, second, …) depth map capture subsystems that provide volumetric visual data; and para.0092: Discloses using a structured light sensor (telemetry sensor) disposed at a second fixed position with respect to the real-world scene and configured to detect the structured light pattern (telemetry data) ; and
determining, based on the telemetry data, a second field of view associated with the second device and the second volumetric visual data, the field of view different than the second field of view [para.0084: Discloses the media player device may detect (determining) user input (e.g., moving or turning the display screen upon which, the field of view is presented – telemetry data). In response, the field of view may display different objects and/or objects seen from a different viewpoint (e.g., a viewpoint corresponding to the position of the display screen – volumetric visual data) in place of the objects seen from the previous viewpoint (the field of view different than the second field of view).],
wherein the first volumetric visual data represents the first object using the second field of view [FIG.9: Discloses a Converged Depth Map (first volumetric visual data) representing object points “A”, “B”, and “C”, where the data point for “B”  is using the second field of view (second portion of the first volumetric visual data).].

Regarding Claim 5, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses wherein determining that the first volumetric visual data represents the first object is based on one or more image analysis techniques, and wherein determining the first volumetric visual data is based on the determination that the first volumetric visual data represents the first object [para.0089: Discloses the system uses confidence values that are assigned and analyzed to ensure that the most effectively-captured depth data is relied on to the greatest extent; and FIG.9 & para.0131: Discloses determining Converged Depth Map (first volumetric visual data) is based on object point confidence values.].

Regarding Claim 6, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses the method further comprising:
determining a location associated with the second device and the second time [para.0017: Discloses "depth data" includes any spatial location data, positional coordinate data, or other data representative of a position of one or more surfaces (e.g., or, more particularly, one or more physical ; and
determining, based on the location, the first object [para.0018: Discloses depth map may be representative of at least one surface of an object (e.g., an object included within a real-world scene) by including or implementing depth data (e.g., depth data points each representative of a particular physical point on a surface of an object) that describes the spatial location, positional coordinates, etc., for the surface of the object.],
wherein determining that the first volumetric visual data represents the first object is based on the location of the second device [para.0030: Discloses accessing a depth map by receiving the depth map from another device (the second device); and FIG.5: Discloses a plurality of nodes (devices) each capturing depth data respective to their location with respect to the field of view; in this illustration a basketball could be the first object; and para.0053: Discloses each of nodes may be positioned so as to capture all or substantially all of the circular area designated as real-world scene from the perspective (i.e., angle, distance, etc.) afforded by the respective fixed node position of the node.].

Regarding Claim 7, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses the method further comprising determining that the first object is to be presented at the second time, wherein determining that the first volumetric visual data represents the first object is based on the determination that the first object is to be presented at the second time [FIG.10 & para.0139: Discloses accessing a second, independently-captured (a second time after the first time) depth map of the surfaces of the objects included in the real-world scene. The second depth map accessed in operation 1004 may have been captured independently from the first depth map accessed in operation 1002; and para.0140: Discloses the depth data generation system may converge the first and second depth maps into a converged depth map of the surfaces of the objects included in the real-world scene. Thus, if the confidence value of a certain point on the surface of object is of a higher confidence value, then the first object is to be presented at the second time.].

Regarding Claim 9, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses wherein the first volumetric visual data comprises a representation of a person or a structure, the method further comprising determining that the person or the structure is associated with the first object [para.0042: Discloses objects could represent a person or another living thing, a non-transparent solid, liquid, or gas, a less discrete object such as a wall, a ceiling, a floor, or any other type of object.].

Regarding Claim 10, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses the method further comprising:
determining that the first volumetric time slice comprises a third portion representing a second object [FIG(s).3,5 & para.0078-0079: Discloses nodes 504 (plurality of nodes - 504-i), located at fixed node positions, independently capturing one or more depth maps (second, third, fourth … volumetric visual data) representative of surfaces of objects 506 (a field of view) included in real-world scene 502, such as a basketball (first object) and a player (a second object), and illustrates an example where each node 504 may be associated with only a particular slice (i.e., a horizontal slice) of a real-world scene; and para.0053: Discloses the respective areas of nodes (plurality) may be overlapping (a first, second, third, … portions) with the respective areas of all the other nodes in an area (e.g., a circular area) designated as real-world scene.]; and
determining that the second object is not represented by the first volumetric visual data or by third volumetric visual data generated before the second time [FIG.5: Discloses an illustration of a basketball game with field of view from a plurality of nodes 504 independently capturing one or more depth maps. Objects such as the basketball or a player can be captured from some of the nodes and not others. For example, a player (a second object) is not represented by the first volumetric visual data or by third volumetric visual data generated before the second time (depth map data captured by a second node at a particular time) due to the player not being in view in the first volumetric visual data or the third volumetric visual data before the capture of second map data from the second node.],
wherein the second volumetric time slice further comprises the third portion [para.0053 & FIG.3: Discloses each node 302 may be associated with a unique perspective of object 202 such that the surfaces of object 202 may be detected from various perspectives (multiple portion) surrounding object 202. Each of nodes 302 may be positioned so as to capture all or substantially all of the circular area designated as real-world scene 304 from the perspective (i.e., angle, distance, etc.) afforded by the respective fixed node position of the node. For example, all of the respective areas of nodes 302 may be overlapping with the respective areas of all the other nodes 302 in an area (e.g., a circular area) designated as real-world scene 304.].

Regarding Claim 11, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses the method further comprising:
determining that the first volumetric time slice comprises a third portion representing a second object [FIG(s).3,5 & para.0078-0079: Discloses nodes 504 (plurality of nodes - 504-i), located at fixed node positions, independently capturing one or more depth maps (second, third, fourth … volumetric visual data) representative of surfaces of objects 506 (a field of view) included in real-world scene 502, such as a basketball (first object) and a player (a second object), and illustrates an example where each node 504 may be associated with only a particular slice (i.e., a horizontal slice) of a real-world scene; and para.0053: Discloses the respective areas of nodes (plurality) may be overlapping (a first, second, third, … portions) with the respective areas of all the other nodes in an area (e.g., a circular area) designated as real-world scene.];
determining third volumetric visual data generated before the second time [para.0056: Discloses the data detected by different nodes 302 may be timestamped with a universal time shared by all of nodes 302 in system 100. Thus, determining third volumetric visual data generated before the second time based on the timestamps.]; and
determining that the second object is represented by the third volumetric visual data [FIG.5: Discloses an illustration of a basketball game with field of view from a plurality of nodes 504 independently capturing one or more depth maps. Objects such as the basketball or a player (second object) can be captured (represented) from some of the nodes (first, second, third … volumetric visual ,
wherein the second volumetric time slice further comprises the third volumetric visual data [FIG.9: Discloses Depth Map 1 (the second volumetric time slice) comprises same data (A, B, C …) in Depth Map 2 (third volumetric visual data).].

Regarding Claim 12, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses wherein the first device is more proximal to the third device than to the second device [para.0029: Discloses generating depth data by converging independently-captured depth maps from one or more independently-captured depth maps (captured from one or more devices); and para.0036: Discloses an actual position of the physical point in 3D space, an actual depth or distance of the physical point from the element of the depth map capture subsystem performing the detection, etc.), system 100 may assign that particular depth data point a relatively high confidence value. Thus, this allows for a scenario where a first device is more proximal to the third device than to the second device based on confidence values.].

Regarding Claim 14, Kamal discloses a system comprising memory coupled to at least one processor [For purposes of this review, the Examiner equates volumetric with visual as disclosed in paragraph 18 of Applicant’s specification, and paragraph 0041 of Applicant’s specification discloses that volumetric visual data equates to depth data.], the at least one processor configured to:
receive first volumetric visual data generated at a first time [para.0044 discloses the capture subsystem includes cameras for capturing visual data (receives first volumetric visual data from camera); and para.0056: Discloses the depth data is timestamped, thus first depth data is at a first time.];
receive a first volumetric media stream from a second device, the first volumetric media stream comprising second volumetric visual data comprising a first volumetric time slice representing a field of view of a first device [para.0063: Discloses users not physically located within or around real-world scene receiving volumetric media stream (a first volumetric media stream) to , the second volumetric visual data generated at a second time after the first time [para.0109: Discloses the depth map capture subsystem performing time-of-flight depth map capture technique may capture a depth map by generating the depth map based on the different times within the plurality of different times subsequent to the particular time at which the emitted light pulse is detected.];
determine that the first volumetric time slice comprises a first portion and a second portion, the first portion representing a first object and comprising an amount of the second volumetric visual data, and wherein the first object is absent from the second portion [FIG.3 & para.0053: Discloses because of the different fixed node positions of nodes 302 of implementation 300, each node 302 may be associated with a unique perspective of object 202 (the first object) such that the surfaces of object 202 may be detected from various perspectives surrounding object 202 and each node 302 may detect characteristics of the surfaces of object 202 that would be difficult or impossible to detect (absent from the second portion) from the fixed node positions of other nodes 302. For example, all of the respective areas of nodes 302 may be overlapping (a first portion and a second portion) with the respective areas of all the other nodes 302 in an area (e.g., a circular area) designated as real-world scene 304; and FIG.5: Discloses an illustration of a basketball game with field of view from a plurality of nodes 504 independently capturing one or more depth maps. An object such as the basketball will be captured by some of the nodes and not others. For example, the first portion node 504 (one of the plurality of nodes) may have a line of sight view of the basketball at a particular point in time, where a second portion node 504 (another one of the plurality of nodes) at the same particular time may not have a visible line of sight view to the basketball, possibly due to a player being in the way (blocking the view, thus object is absent from the second portion). The first portion represents a first object and comprises an amount of the second volumetric visual data since all nodes 504 are capturing depth data from different viewpoints.];
determine that the first volumetric visual data represents the first object, wherein the first object is represented by an amount of the first volumetric visual data that is different than the amount of the second volumetric visual data [FIG.8A: Discloses the first volumetric visual data captured at 802-1 and the second volumetric visual data captured at 802-1, both representing the first object 202; FIG.8B & 8C: Discloses that the two views differ in amount from each other (the first volumetric visual data and the second volumetric visual data). For example, observing point 212 on the object of the field of view for the first volumetric visual data provides for a full visual, whereas, point 212 on the object of the field of view for the second volumetric visual data provides for a smaller amount visible.];
generate a second volumetric time slice comprising the first volumetric visual data and the second portion of the first volumetric time slice [para.0066: Discloses dynamically selectable viewpoint selected by user 316 while user 316 is experiencing real-world scene 304 using media player. Thus, the user may select node 302-2 (the second volumetric time slice) that includes overlap of the second portion, i.e., the portion of the first volumetric time slice 302-1 that overlaps with 302-2.];
generate, based on the first volumetric visual data and the first portion of the first volumetric time slice representing the first object, a second volumetric media stream, the second volumetric media stream comprising the second volumetric time slice, wherein to generate the second volumetric time slice comprises to remove the first portion from the first volumetric time slice [para.0063: Discloses generating a data stream (e.g., a real-time data stream) representative of the dynamic volumetric model of the surfaces of object 202 included in real-world scene 304. For example, data capture processing unit 310 may generate the data stream in real time such that users not physically located within or around real-world scene 304 may be able to experience real-world scene 304 live, in real time or near-real time, via virtual reality media content representative of real-world scene 304; and para.0066: Discloses dynamically selectable viewpoint selected by user 316 (removing) while user 316 is experiencing real-world scene 304 using media player; thus, the user may select any viewpoint desired such as 302-2 (the second volumetric time slice) and a second volumetric media stream will be generated and provided to the user’s media player.]; and
send the second volumetric media stream for presentation at a second device [para.0064: Discloses providing the media stream to the user’s media player (a third device).].
 a pre-generated image unassociated with a respective volumetric time slice; and insert the pre-generated image into the first volumetric time slice (emphasis added to distinguish the elements not taught by Kamal). However, in analogous art, VELEVSKI discloses inserting images into a combined image [para.0053] and stitching the pre-generated image [para.0157]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Kamal in view of VELEVSKI to insert the pre-generated image into the first volumetric time slice. One would be motivated at the time of the invention to provide this capability because there is a need to provide a system for image transfer and review which provides a relatively quick and preferably low latency system for the review of large numbers of images, or image stacks [para.0012].

Regarding Claim 15, the combined teachings of Kamal and VELEVSKI disclose the system of claim 14, and Kamal further discloses wherein the system is more proximal to the second device than to the first device [para.0029: Discloses generating depth data by converging independently-captured depth maps from one or more independently-captured depth maps (captured from one or more devices); and para.0036: Discloses an actual position of the physical point in 3D space, an actual depth or distance of the physical point from the element of the depth map capture subsystem performing the detection, etc.), system 100 may assign that particular depth data point a relatively high confidence value. Thus, this allows for a scenario where a first device is more proximal to the third device than to the second device based on confidence values.].

Regarding Claim 16, the combined teachings of Kamal and VELEVSKI disclose the system of claim 14, and Kamal further discloses wherein the at least one processor is further configured to:
identify telemetry data received from a third device, the telemetry data associated with the field of view and the second volumetric visual data [para.0083-0084: Discloses the field of view continuously changes as the user naturally looks around immersive virtual reality world based on input such as head movements, keyboard input, etc. (identifying telemetry data); and para.0086: Discloses the system may include one or more (first, second, …) depth map capture subsystems that provides ; and
determine, based on the telemetry data, a second field of view associated with the second device and the second volumetric visual data, the field of view different than the second field of view [para.0084: Discloses the media player device may detect (determining) user input (e.g., moving or turning the display screen upon which, the field of view is presented – telemetry data). In response, the field of view may display different objects and/or objects seen from a different viewpoint (e.g., a viewpoint corresponding to the position of the display screen – volumetric visual data) in place of the objects seen from the previous viewpoint (the field of view different than the second field of view).],
wherein the first volumetric visual data represents the first object using the second field of view [FIG.9: Discloses a Converged Depth Map (first volumetric visual data) representing object points “A”, “B”, and “C”, where the data point for “B”  is using the second field of view (second portion of the first volumetric visual data).].

Regarding Claim 17, the combined teachings of Kamal and VELEVSKI disclose the system of claim 14, and Kamal further discloses wherein the first volumetric visual data is received from a third device in a different geographic location than the system and the first device [para.0055: Discloses that, in certain examples, the real-world scene may include several areas (e.g., geographical areas). As such, nodes may be distributed to cover several distinct areas.].

Regarding Claim 18, Kamal discloses a device comprising memory coupled to at least one processor [For purpose of the review the examiner equates volumetric with visual as disclosed in Applicant’s specification paragraph 18. And paragraph 0041 of the Applicant’s specification discloses that volumetric visual data equates to depth data.], the at least one processor configured to:
receive first volumetric visual data generated at a first time [para.0044 discloses the capture subsystem includes cameras for capturing visual data; and Thus, node element 302-1 (a first device) ; 
receive a first volumetric media stream from a second device, the first volumetric media stream comprising second volumetric visual data comprising a first volumetric time slice representing a field of view of a second device [para.0063: Discloses users not physically located within or around the real-world scene receiving a volumetric media stream (a first volumetric media stream) to experience the real-world scene; and para.0066: Discloses a user having selectable viewpoints (time slices). For example, the node elements of FIG.3 each have a different viewpoint (a first, second, … volumetric time slices) representing a field of view of, for example, node element 302-2 (a first volumetric time slice representing a field of view of a second device).], the second volumetric visual data generated at a second time after the first time [para.0109: Discloses the depth map capture subsystem performing a time-of-flight depth map capture technique may capture a depth map by generating the depth map based on the different times within the plurality of different times subsequent to the particular time at which the emitted light pulse is detected.];
determine that the first volumetric time slice comprises a first portion and a second portion, the first portion representing a first object and comprising an amount of the second volumetric visual data, and wherein the first object is absent from the second portion [FIG.3 & para.0053: Discloses that, because of the different fixed node positions of nodes 302 of implementation 300, each node 302 may be associated with a unique perspective of object 202 (the first object) such that the surfaces of object 202 may be detected from various perspectives surrounding object 202 and each node 302 may detect characteristics of the surfaces of object 202 that would be difficult or impossible to detect (absent from the second portion) from the fixed node positions of other nodes 302. For example, all of the respective areas of nodes 302 may be overlapping (a first portion and a second portion) with the respective areas of all the other nodes 302 in an area (e.g., a circular area) designated as real-world scene 304; and FIG.5: Discloses an illustration of a basketball game with field of view from a plurality of nodes 504 independently capturing one or more depth maps. An object such as the basketball will be captured by some of the nodes and not others. For example, the first portion node 504 (one of the plurality of nodes) may have a line of sight view of the basketball at a particular point in time, where a ;
determine that the first volumetric visual data represents the first object, wherein the first object is represented by an amount of the first volumetric visual data that is different than the amount of the second volumetric visual data [FIG.8A: Discloses the first volumetric visual data captured at 802-1 and the second volumetric visual data captured at 802-1, both representing the first object 202; FIG.8B & 8C: Discloses that the two views differ in amount from each other (the first volumetric visual data and the second volumetric visual data). For example, observing point 212 on the object of the field of view for the first volumetric visual data provides for a full visual, whereas point 212 on the object of the field of view for the second volumetric visual data provides for a smaller amount visible.];
generate based on the first volumetric visual data and the first portion of the first volumetric time slice representing the first object, a second volumetric time slice comprising the first volumetric visual data and the second portion of the first volumetric time slice, wherein to generate the second volumetric time slice comprises to remove the first portion from the first volumetric time slice [FIG.3 & para.0050: Discloses generating depth data by converging independently-captured depth maps that includes a plurality of nodes 302 (i.e., nodes 302-1 through 302-8) disposed at fixed node positions with respect to (e.g., in this case, surrounding) a real-world scene 304 that includes object 202 (the first object). Each of nodes 302 may include or implement one or more depth map capture subsystems. Thus, node 302-1 generates a first volumetric data and node 302-2 generates a second volumetric data (and so on with 302-3 through 302-8); para.0053: Discloses that all of the respective areas of nodes 302 may be overlapping with the respective areas of all the other nodes 302 in an area designated as real-world scene 304; para.0054: Discloses each node may be associated with an area that includes a portion (e.g., a horizontal portion, a vertical portion, etc. – a slice see paragraph 0078) of the real-world scene that is smaller than the entire real-world scene. As such, various portions ;
generating, by the at least one processor, a second volumetric media stream, the second volumetric media stream comprising the second volumetric time slice [para.0063: Discloses generating a data stream (e.g., a real-time data stream) representative of the dynamic volumetric model of the surfaces of object 202 included in real-world scene 304. For example, data capture processing unit 310 may generate the data stream in real time such that users not physically located within or around real-world scene 304 may be able to experience real-world scene 304 live, in real time or near-real time, via virtual reality media content representative of real-world scene 304; and para.0066: Discloses a dynamically selectable viewpoint selected by user 316 while user 316 is experiencing real-world scene 304 using the media player; thus, the user may select any viewpoint desired such as 302-2 (the second volumetric time slice) and a second volumetric media stream will be generated and provided to the user’s media player.]; and
sending, by the at least one processor, the second volumetric media stream for presentation at a third device [para.0064: Discloses providing the media stream to the user’s media player (a third device).]. 
Kamal does not explicitly disclose a pre-generated image unassociated with a respective volumetric time slice and insert the pre-generated image into the first volumetric time slice (emphasis added to distinguish the elements not taught by Kamal). However, in analogous art, VELEVSKI discloses inserting images into a combined image [para.0053] and stitching the pre-generated image [para.0157]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Kamal in view of VELEVSKI to insert the pre-generated image into the first volumetric time slice. One would be motivated at the time of the invention to provide this capability because there is a need to provide a system for image transfer and review which provides a relatively quick and preferably low latency system for the review of large numbers of images, or image stacks [para.0012].

Regarding Claim 19, the combined teachings of Kamal and VELEVSKI disclose the device of claim 18, and Kamal further discloses wherein the at least one processor is further configured to:
determine a location associated with the second device and the second time [para.0017: Discloses "depth data" includes any spatial location data, positional coordinate data, or other data representative of a position of one or more surfaces (e.g., or, more particularly, one or more physical points on the surfaces) of one or more objects in three dimensional ("3D") space; and para.0056: Discloses nodes 302 may send and receive timing signals to ensure that each node 302 detects corresponding data at the same time and that the data detected by different nodes 302 may be timestamped with a universal time shared by all of nodes 302 in system 100.]; and
determine, based on the location, the first object [para.0018: Discloses depth map may be representative of at least one surface of an object (e.g., an object included within a real-world scene) by including or implementing depth data (e.g., depth data points each representative of a particular physical point on a surface of an object) that describes the spatial location, positional coordinates, etc., for the surface of the object.],
wherein to determine that the first volumetric visual data represents the first object is based on the location of the second device [para.0030: Discloses depth map may access a depth map by receiving the depth map from another device (the second device); and FIG.5: Discloses a plurality of .

Regarding Claim 20, the combined teachings of Kamal and VELEVSKI disclose the device of claim 18, and Kamal further discloses wherein the at least one processor is further configured to determine that the first object is to be presented at the second time, wherein to determine that the first volumetric visual data represents the first object is based on the determination that the first object is to be presented at the second time [FIG.10 & para.0139: Discloses accessing a second, independently-captured (a second time after the first time) depth map of the surfaces of the objects included in the real-world scene. The second depth map accessed in operation 1004 may have been captured independently from the first depth map accessed in operation 1002; and para.0140: Discloses the depth data generation system may converge the first and second depth maps into a converged depth map of the surfaces of the objects included in the real-world scene. Thus, if a certain point on the surface of the object has a higher confidence value, then the first object is to be presented at the second time.].

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kamal et al., Pub No US 2018/0124371 A1 (hereafter Kamal) in view of VELEVSKI et al., Pub No US 2020/0043214 A1 (hereafter VELEVSKI) and further in view of Lim et al., Pub No US 2019/0318488 A1 (hereafter Lim).

Regarding Claim 4, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1 and further disclose the method further comprising:
stitching the pre-generated image [VELEVSKI – para.0157: Discloses stitching the pre-generated image.] and the second portion of the first volumetric time slice, wherein the inserting comprises the stitching [Kamal - FIG.9 and para.0131: Discloses converging (stitching) of two independently-captured depth maps 902 (i.e., depth maps 902-1 (first volumetric visual data) and 902-2 (second portion of the first volumetric visual data) into exemplary converged depth maps 904 based on confidence values.]; and
encoding the first volumetric visual data and the second portion of the first volumetric time slice [Kamal - para.0063: Discloses generating a data stream (e.g., a real-time data stream - encoding) representative of the dynamic volumetric model of the surfaces of object 202 included in real-world scene 304. For example, data capture processing unit 310 may generate the data stream in real time such that users not physically located within or around real-world scene 304 may be able to experience real-world scene 304 live, in real time or near-real time, via virtual reality media content representative of real-world scene 304; and claim 14: Discloses generating a data stream based on the converged depth map representative of a dynamic volumetric model of the surfaces of the objects included in the real-world scene.],
wherein generating the second volumetric time slice is based on the stitching and the encoding [Kamal - para.0063: Discloses generating a data stream (e.g., a real-time data stream –comprises encoding) representative of the dynamic volumetric model of the surfaces of object 202 included in real-world scene 304; and claim 14: Discloses generating a data stream based on the converged depth map representative of a dynamic volumetric model of the surfaces of the objects included in the real-world scene.].
Kamal discloses generating a data stream, but the combination does not explicitly disclose encoding the first volumetric visual data and the second portion of the first volumetric time slice (emphasis added to distinguish the elements not taught by Kamal). However, in analogous art, Lim discloses that point clouds are a set of 3D points that represent a model of a surface of an object or a scene [para.0004] and discloses an encoding device that encodes point clouds to generate a stream [para.0007]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Kamal and VELEVSKI in view of Lim to encode data to generate a stream for streaming. One would be motivated at the time of the invention to encode because, due to the large bitrate requirement, point clouds are often compressed prior to transmission [para.0004]. This claim is rejected on the same grounds as claim 1.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kamal et al., Pub No US 2018/0124371 A1 (hereafter Kamal) in view of VELEVSKI et al., Pub No US 2020/0043214 A1 (hereafter VELEVSKI) and further in view of Jason Thomas Faulkner, Pub No US 2020/0184653 A1 (hereafter Faulkner).

Regarding Claim 8, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, but do not explicitly disclose the method further comprising:
receiving audio data associated with the second device;
identifying one or more keywords included in the audio data; and
determining, based on the one or more keywords, that the audio data is associated with the first object, wherein determining that the first volumetric visual data represents the first object is based on the determination that the audio data is associated with the first object.
However, in analogous art, Faulkner discloses the following:
receiving audio data associated with the second device [para.0003: Discloses when a real-world object moves out of a viewing area of a first device, a second device can use data shared by the first device to identify the real-world object as the object comes into a viewing area of the second device; and para.0083: Discloses receiving an individual stream of live or recorded content can comprise media data associated with a video feed provided by a video camera (e.g., audio and visual data that capture the appearance and speech of a user participating in the communication session).];
identifying one or more keywords included in the audio data [para.0051: Discloses analyzing the audio data to determine (identifying) keywords.]; and
determining, based on the one or more keywords, that the audio data is associated with the first object, wherein determining that the first volumetric visual data represents the first object is based on the determination that the audio data is associated with the first object [para.0051: Discloses the identified keyword is associated with an object that was moving out of view of first device to in range of second device, thus the first volumetric visual data represents the first object is based on the determination that the audio data is associated with the first object.]. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Kamal and VELEVSKI in view of Faulkner to determine an object based on a keyword found in the audio data. One would be motivated at the time of the invention to provide this capability due to a need for enhanced techniques for tracking the movement of real-world objects for improved positioning of virtual objects shared within a collaborative environment [para.0003].

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Kamal et al., Pub No US 2018/0124371 A1 (hereafter Kamal) in view of VELEVSKI et al., Pub No US 2020/0043214 A1 (hereafter VELEVSKI) and further in view of Paek et al., Pub No US 2014/0347262 A1 (hereafter Paek).

Regarding Claim 13, the combined teachings of Kamal and VELEVSKI disclose the method of claim 1, and Kamal further discloses further comprising:
identifying a user input associated with the second device, the user input indicative of a selection of the first object [para.0069: Discloses media player device 314 may include or be implemented by any device (a second device) capable of presenting a field of view of an immersive virtual reality world (e.g., an immersive virtual reality world representative of real-world scene 304) and detecting user input from user 316 to dynamically update the immersive virtual reality world presented within the field of view. Inputs of the media player device 314 may include, connect to, or otherwise be associated with sensory feedback devices such as sensory feedback gloves, sensory feedback body suits, and the like, which may present the sensory data to provide users with a sensation of feeling, touching, smelling, or otherwise perceiving particular objects (indicative of a selection of the first object).]; and
Although Kamal discloses that audio, video, and/or other cues may be used by each node 302 to ensure that each node 302 detects corresponding data, the combined teachings do not explicitly disclose that the corresponding data could be for determining purchase information associated with the first object, wherein the second volumetric media stream comprises an indication of the purchase information for concurrent display with the second volumetric time slice (emphasis added to distinguish the elements not taught by the combination). However, in analogous art, Paek discloses a video stream with image data [para.0026] presented on a display, where the image data includes a representation of an object (e.g., a three-dimensional object) that is presented on the display screen for viewing by the viewer. Paragraph 0045 discloses that the display screen can be a touch-sensitive display, and the viewer 104 can select the object by touching the display screen at a location of the object. Detection of the selection of the object causes an option to purchase the object to be presented on the display screen. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Kamal and VELEVSKI in view of Paek to provide an indication of the purchase information for concurrent display. One would be motivated at the time of the invention to provide this capability due to the need for various technologies pertaining to displaying objects on a display screen with visual verisimilitude with the object from the perspective of a viewer of the display screen [para.0004].

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Dmitry Kozko, (US 9,576,394 B1) - Discloses one or more video feeds obtained from an event can be leveraged to enable a custom view from any location within environment. The location can permit a virtual camera to be placed at a user specified location. Feeds can be captured by one or more cameras and can be conveyed to a broadcast system [col.4, lines 36-42].
Peleg et al., (US 2020/0143838 A1) - Discloses obtaining an input video; obtaining at least one pre-specified object, being a visual or an acoustic object or a descriptor thereof; analyzing the input video, to detect a matched object, being an object having descriptors similar to the descriptors of the at least one pre-specified object; and generating a redacted video by removing or replacing the matched objects therefrom (Abstract).


If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi can be reached on 571-272-4195.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ADIL OCAK/Examiner, Art Unit 2426



/NASSER M GOODARZI/Supervisory Patent Examiner, Art Unit 2426