DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 12 January 2022 have been fully considered but are not persuasive. In particular, the applicant disagrees with the reliance on the prior art of Herman, arguing that Herman merely mentions a neural network and does not teach or suggest inferring additional instances of video of the environment from other perspectives not captured by the user devices, as claimed. However, the neural network of Herman is not mentioned for feature detection alone; it is recited as a technique for feature detection or semantic segmentation of current images and past data collections. The recitation of semantic segmentation in Herman indicates that a convolution neural network is applied to current and past images, which aids the system in removing undesired scenes from composite generation (¶21). While Herman does disclose the use of a convolution neural network in feature detection, which could be combined with the notion that it helps locate the camera in a space relative to the host vehicle, Herman also teaches using the convolution neural network to indicate relevant scenes, by using historical and present-day received images to generate the composite image. Thus the feature detection technique not only uses a convolution neural network in the way the applicant asserts, but is also relevant to using the convolution neural network to remove scenes from the historical and present-day received images used to generate the composite image.
Considering also that ¶13 discloses that the compositing technique includes the use of images requested through a vehicle-to-vehicle module, by which a host vehicle receives captured images from other vehicles, Herman teaches an idea similar to what is claimed, in that it also takes into consideration instances from other perspectives not captured by the host vehicle. When ¶20 and 13 are combined, the use of a convolution neural network to semantically segment the current images, past images, and the images collected from vehicles other than the host vehicle is similar to what is claimed. For this reason, the examiner maintains that Herman discloses the claimed limitation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4-6, 8, 10, 14, 15, and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1)
Regarding claim 1, Baldwin teaches, 
A non-transitory computer readable medium storing computer code executable (11:27-49 and Fig.7-704, “executing instructions that can be stored in at least one memory device or element 704”) by a processor (11:27-49 and Fig.7-702, “central processor 702 for executing instructions”) to perform a method comprising: 
receiving, at a system (11:4-26, “centralized data store” which receives the uploaded “images captured” of the “collaborative system”) from a plurality of user devices, (11:4-26, “captured image may be immediately uploaded automatically to a centralized data store” captured from users having a “device that has opted into the image capture moment”) a plurality of instances of video of an environment, (11:4-26 and 4:45-65, “a sufficient number of images have been uploaded” from one or more user devices such as “image capturing component 106, in full-frame video mode”) each instance of the video captured by a user device (11:4-26 and 10:14-51, “multiple images are captured at substantially the same time” obtained from “participating user devices” where the “centralized computing device(s) of the system may determine a coordinated time for image capture 506”) of the plurality of user devices from a perspective of the user device; (11:4-26, 3:10-13, 4:5-44, and Fig. 5-516, “user device that has opted into the image capture moment may capture an image 516 such that multiple images are captured” from “each user 102” shown as “dispersed in bleachers or seating” which captures “image or video of the action” at “multiple different angles” which is then received by “centralized computing device(s)” based on a “coordinated time”) 
generating, by the system, (11:2-26, “collaborative system”) a volumetric video (11:4-26 and 2:30-43, “composited 3-D transformation” which is a stitching to create a composite “video, such as a panorama or a 3-D model of the subject matter”) using the plurality of instances of video of the environment, (11:4-26 and 7:13-36, “multiple images may be rendered as a composited 3-D transformation of the aggregated image data” which is a process of “compositing several related, potentially overlapping images (or video frames) into a single image (or video frame)”), wherein the volumetric video presents the environment in 3-dimensions (3D) (2:30-43, 3:1-13 and Fig. 1-104, “multiple images can be stitched together to create a composite image and/or video, such as a panorama or a 3-D model of the subject matter of the images” such as of a “basketball court 104”) 
	But does not explicitly teach, 
using machine learning is used to process the plurality of instances of video to infer additional instances of video of the environment from other perspectives not captured by the user devices; 
generating a volumetric video using the plurality of instances of video of environment and the additional instances of video of the environment,
includes an interactive feature that allows a viewer of the volumetric video to change perspectives from which the environment is viewed; and 
distributing the volumetric video for consumption by one or more consumers. 
	However, Herman teaches additionally, 
using machine learning (¶21, “image compositor 110 uses feature detection” such as a “convolution neural network” to compare and contrast features in the images) is used to process the plurality of instances of video (¶21, “Using semantic segmentation of current images and past data collections”) to infer additional instances of video of the environment (¶21 and 13, “enable the image compositor 110 to use historical and present day received images to generate the composite image” which would also include “capture one or more images of the host vehicle” captured from other vehicles) from other perspectives not captured by the user devices; (¶21 and 13, “generate the composite image” when an occupant of the vehicle requests, using a “vehicle-to-vehicle (V2V) module”, that other vehicles and/or inter-vehicle communication enabled infrastructure modules “capture one or more images of the host vehicle and send the captured images to the host vehicle”)
generating a volumetric video (¶13 and 21, “host vehicle progressively generates a composite image using a process of three-dimensional scene stitching” by “positioning of the feature points and a three dimensional model of the host vehicle 100”) using the plurality of instances of video of environment and the additional instances of video of the environment, (¶13 and 21, generates a composite image using “one or more images of the host vehicle” from other vehicles and “use historical and present day received images”)
includes an interactive feature (¶24 and Fig. 2, “human machine interface (HMI) 200”) that allows a viewer of the volumetric video to change perspectives from which the environment is viewed; (¶24, “to display and manipulate the composite image 202” where the user “interacts with the HMI 200 via a touch screen interface”)
distributing the volumetric video for consumption by one or more consumers. (¶24 and Fig. 2, “HMI 200 is displayed on a console of the infotainment head unit 106. Additionally or alternatively, in some examples, the HMI 200 is displayed on a screen of a mobile device via an application executing on the mobile device”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman, which can composite an image of a host vehicle using a neural-network-processed set of received images from multiple sources together with past images. This technique allows the system to gather images from a diverse set of angles, distances, and/or heights.

Regarding claim 2, Baldwin with Herman teaches the limitation of claim 1,
	Baldwin teaches additionally,
user devices include mobile phones. (2:62-67 and 3:1-13, “user 102 is utilizing a computing device 100 that may include at least one image capturing component 106” which can include “cellular phones”)

Regarding claim 4, Baldwin with Herman teaches the limitation of claim 1,
	Baldwin teaches additionally, 
plurality of instances of video are of a same event occurring within the environment. (4:5-44 and 7:13-36, “centralized computing device(s) of the collaborative system or may be specified by a user creating a collaborative image capturing event” which then commands “coordinated time in which all participating user devices will simultaneously release their shutters to capture an image at substantially the same time” which is then used in “compositing several related, potentially overlapping images (or video frames) into a single image (or video frame)”)

Regarding claim 5, Baldwin with Herman teaches the limitation of claim 1,
Baldwin teaches additionally,
plurality of instances of video are of a same scene within the environment. (7:13-36, “Image registration can be thought of as matching the features in multiple images or otherwise aligning the images that contain overlapping scenes” which is used for “compositing several related, potentially overlapping images (or video frames) into a single image (or video frame)”)

Regarding claim 6, Baldwin with Herman teaches the limitation of claim 1,
Baldwin teaches additionally,
perspective of the user device includes a rotational orientation of the user device with respect to the environment. (7:13-36, “Image registration” can also compensate for “rotation” which can “offset differences in orientation between images,” such as when a user is holding “an image capturing device at an askew angle or when the user is standing on uneven ground” to minimize the differences between an ideal lens model and the lens that was used to capture the image)

Regarding claim 8, Baldwin with Herman teaches the limitation of claim 1,
	Baldwin teaches additionally, 
receiving, by the system in association with the plurality of instances of video, metadata from the plurality of user devices. (6:9-37, when “videos are uploaded” to the centralized data store(s), that is part of the collaborative image capturing system, “metadata associated with the respective images or videos can also be collected”)	 

Regarding claim 10, Baldwin with Herman teaches the limitation of claim 8,
	Baldwin teaches additionally,
metadata received from each user device of the plurality of user devices (6:9-37, “metadata for each image or video” captured from the computing device) indicates an orientation of the user device. (6:9-37, metadata may include “orientation and/or position data such as obtained by a gyroscope, accelerometer, digital compass, or other inertial sensing components of the computing device”)

Regarding claim 14, Baldwin with Herman teaches the limitation of claim 13,
	Herman teaches additionally,
available point of view from which the consumer can view the environment within the volumetric video corresponds to one of the perspectives of the user devices. (¶24 and 25, human machine interface (HMI) 200 to display user selected “orientation of a viewport of the virtual camera” to a “scene that is rendered in the composite image 202” and “image composite adjustment tool 208 may be used to specify a timeframe from which to use received images to generate a sequential set of composite images 202”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman, which can composite an image of a host vehicle using a neural-network-processed set of received images from multiple sources together with past images. This technique allows the system to gather images from a diverse set of angles, distances, and/or heights.

Regarding claim 15, Baldwin with Herman teaches the limitation of claim 14,
Baldwin teaches additionally,
perspectives of the user devices each include a rotational orientation of the user device with respect to the environment. (7:13-36, “Image registration” can also compensate for “rotation” which can “offset differences in orientation between images,” such as when a user is holding “an image capturing device at an askew angle or when the user is standing on uneven ground” to minimize the differences between an ideal lens model and the lens that was used to capture the image)

Regarding claim 19, it is the method claim of non-transitory computer readable medium claim 1. Refer to the rejection of claim 1 to teach the limitation of claim 19. 

Regarding claim 20, it is the system claim of non-transitory computer readable medium claim 1. 
Baldwin teaches additionally, 
A system (11:4-26, “collaborative system”) comprising:
a non-transitory memory storing instructions; (11:27-49 and Fig.7-704, “executing instructions that can be stored in at least one memory device or element 704”) and 
one or more processors in communication with the non-transitory memory that execute the instructions to perform a method (11:27-49 and Fig.7-704, “central processor 702 for executing instructions that can be stored in at least one memory device or element 704”)
Refer to the rejection of claim 1 for the remaining limitations of claim 20.

Regarding claim 21, Baldwin with Herman teaches the limitation of claim 1,
	Baldwin teaches additionally, 
At least two of the perspectives of the user devices from which the plurality of instances of video (7:13-36, “images” from the “cameras” of the “multiple users” utilizing devices with at least one “image capturing component 106”) are captured include different distances from the environment (7:13-36, “differences in object distance between images”) 
	Herman teaches additionally, 
different distances from the environment that allow the viewer to zoom in or out with respect to the environment. (¶24 and Fig. 2, “HMI 200 includes a zoom tool 204” which facilitates “changing a draw distance by changing the position of the virtual camera”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman, which can composite an image of a host vehicle using a neural-network-processed set of received images from multiple sources together with past images. This technique allows the system to gather images from a diverse set of angles, distances, and/or heights.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1), and further in view of GOSWAMI; NABARUN (US 20180054659 A1)
Regarding claim 3, Baldwin with Herman teaches the limitation of claim 1,
	But does not explicitly teach the additional limitation of claim 3,
	However, Goswami teaches,
user devices (¶26 and fig. 1, “plurality of image-capturing devices 108”) include drones. (¶26 and fig. 1, “plurality of image-capturing devices 108” may include “a drone-camera” configured to “capture video feed from a plurality of FOVs in the pre-defined area 104”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the multi-dimensional video generation of Goswami which uses video capture from drone cameras. The drone cameras would be capable of communicating wirelessly and still be human controlled. 

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1), and further in view of Adachi; Yoshikazu (US 20140015937 A1)
Regarding claim 7, Baldwin with Herman teaches the limitation of claim 1,
	But does not explicitly teach the additional limitation of claim 7,
	However, Adachi teaches,
perspective of the user device (¶46 and Fig. 4, “mobile phone 1”) includes a distance (¶46 and Fig. 4, “mobile phone 1 acquires distance information”) of the user device from the environment.  (¶46 and Fig. 4, “mobile phone 1 acquires distance information (distance from the mobile phone 1, depth information) of a subject”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the electronic device of Adachi which gives a camera phone the ability to acquire distance information. This allows for relating disparity information or determining three-dimensional information for a three-dimensional image. 

Regarding claim 16, Baldwin with Herman teaches the limitation of claim 14,
	But does not explicitly teach the additional limitation of claim 16,
	However, Adachi teaches,
perspectives of the user devices (¶46 and Fig. 4, “mobile phone 1”) each include a distance (¶46 and Fig. 4, “mobile phone 1 acquires distance information”) of the user device from the environment. (¶46 and Fig. 4, “mobile phone 1 acquires distance information (distance from the mobile phone 1, depth information) of a subject”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the electronic device of Adachi which gives a camera phone the ability to acquire distance information. This allows for relating disparity information or determining three-dimensional information for a three-dimensional image. 

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1), and further in view of Molina; Gabriel D. et al. (US 20200106959 A1)
Regarding claim 11, Baldwin with Herman teaches the limitation of claim 8,
	But does not explicitly teach the additional limitation of claim 11,
	However, Molina teaches additionally, 
metadata received from each user device (¶35, “camera position and orientation information from motion and position sensing technology of the device 200, may also be captured as metadata 206 for the frames 204”) of the plurality of user devices indicates movement of the user device while the user device is capturing the instance of the video (¶35, as the “device 200 that captures multiple images or video frames (frames 204) automatically as the user moves the device 200”), and wherein different portions of an instance of the video having metadata indicating movement of the user device are correlated with different perspectives at different locations and associated times. (¶35 and 36, “optical flow information, real-time depth estimation, motion detection information, etc., and may be included in metadata 206” for the frames 204 while the user interface guides “user as to where to place or move the device 200 to ensure sufficient data (e.g., a sufficient number of frames to cover the scene)” which is “visual-inertial camera tracking information regarding estimated positions of the frames 206 when captured”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the metadata of Molina which is generated from motion of the device. This information can be used in the process of rendering the image which was captured in motion. 

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1), and further in view of HUR; Hyejung et al. (US 20200107008 A1)
Regarding claim 9, Baldwin with Herman teaches the limitation of claim 8,
Baldwin teaches the use of image data and metadata to perform various tasks, followed by using image data for compositing into a panoramic image or video which still associates each segment of the collaborative image with a provided metadata, 6:38-67 and 7:1-12.
Baldwin with Herman teaches additionally,
metadata received from each user device of the plurality of user devices indicates a location of the user device. (¶15, inter-vehicle communication may be combined with technology that facilitates “the vehicles communicating their position, speed, heading, relative position to other objects and to exchange information with other vehicles”) 
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman, which can composite an image of a host vehicle using a neural-network-processed set of received images from multiple sources together with past images. This technique allows the system to gather images from a diverse set of angles, distances, and/or heights.
	But does not explicitly teach the additional limitation of claim 9,
	However, Hur teaches additionally, 
volumetric video is further generated using the metadata. (¶148, “stitcher may receive necessary metadata from the metadata processor and use the metadata for the stitching operation”) 
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the metadata of Hur which uses metadata in the stitching process. This can indicate the kind of stitching to do or whether stitching is necessary. 

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Baldwin; Leo Benedict (US 9264598 B1) in view of Herman; David Michael et al. (US 20190188901 A1), and further in view of Oh; Sejin et al. (US 20190379884 A1)
Regarding claim 17, Baldwin with Herman teaches the limitation of claim 14,
	But does not explicitly teach the additional limitation of claim 17,
	However, Oh teaches additionally,
volumetric video provides the consumer with a 360 degree view of the environment surrounding (¶57, “providing 360-degree content in order to provide virtual reality (VR) to users”) the selected point of view. (¶236, “generate information about the viewpoint of a user's ROI, a viewing position, and a viewing orientation” in a 360-degree video received by a reception unit to “select or extract an ROI”)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the collaborative capturing of Baldwin with the inter-vehicle cooperation of Herman with the 360-degree video of Oh which provides a region of interest from the 360 degree view. This allows for effective use of bandwidth to reconstruct the viewpoints when viewing a video represented in 3D space.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIMMY S LEE whose telephone number is (571)270-7322. The examiner can normally be reached Monday thru Friday 10AM-8PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Joseph G. Ustaris can be reached on (571) 272-7383. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOSEPH G USTARIS/Supervisory Patent Examiner, Art Unit 2483

/JIMMY S LEE/Examiner, Art Unit 2483