DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant Response to Official Action
The response filed on 6/17/2022 has been entered and made of record.
Acknowledgment 
Claims 2, 13, and 18, canceled on 6/17/2022, are acknowledged by the examiner. 
Claims 1, 3-6, 10-12, 14-17, and 19, amended on 6/17/2022, are acknowledged by the examiner.  
Claims 21-22, added on 6/17/2022, are acknowledged by the examiner.
Response to Arguments
Applicant’s arguments with respect to claims 1, 12, 17, and their dependent claims have been considered but are moot in view of the new grounds of rejection necessitated by Applicant’s amendments. The examiner addresses Applicant’s main arguments below.
Regarding the 35 U.S.C. 112(b) rejection, the amendment filed on 6/17/2022 addresses the issue.  As a result, the 35 U.S.C. 112(b) rejection is withdrawn.
  
Objections 
Claim 12 is objected to. The claim limitation “wherein the computer-executable instructions further include instructions for ---” should read “wherein the computer-executable instructions further include instructions for”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).

Claims 1, 4-5, 7, 9, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hong (US Patent 10,121,064 B2), (“Hong”), in view of Claret et al. (US Patent 10,832,429 B2), (“Claret”).

Regarding claim 1, Hong meets the claim limitations, as follows:
A method (i.e. Systems and methods) [Hong: col. 2, line 12] of determining a depth of a scene (i.e. The behavioral classification system 100 includes a imaging system 102 that is capable of capturing image data including depth information. Depth information typically refers to a measurement of a distance from a reference viewpoint to one point or many points in a scene) [Hong: col. 10, line 23-28], the method (i.e. methods) [Hong: col. 2, line 12] comprising: capturing depth data of the scene (i.e. capturing image data including depth information) [Hong: col. 10, line 25-26] with a depth sensor (i.e. depth sensor is used to acquire depth information) [Hong: col. 10, line 30-31]; capturing image data of the scene ((i.e. a camera is used to acquire images of one or more subjects) [Hong: col. 10, line 29-30]; (i.e. identifies (508) the subjects and their locations within the imaged scene) [Hong: col. 17, line 58-59]) with a plurality of cameras ((i.e. Many of the behavioral classification systems described above synchronize image data captured by one or more cameras and one or more depth sensors) [Hong: col. 30, line 61-63]; (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13]; (i.e. Other depth sensors could be utilized to obtain depth information such as, but not limited to, a plurality of cameras configured to capture information in color channels including the near-IR color channel in a multiview stereo configuration in combination with an illumination source configured to project texture onto the scene.) [Hong: col. 16, line 30-36]); generating a point cloud representative of the scene based on the depth data ((i.e. point clouds or meshes generated using image data) [Hong: col. 20, line 23-24]; (i.e. image data including depth information) [Hong: col. 10, line 25-26]); identifying a missing region of the point cloud ((i.e. identify a region or regions within the image data) [Hong: col. 
18, line 4-5]; (i.e. detects contours of objects) [Hong: col. 16, line 21-22]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]) in which the point cloud includes no data or sparse data ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]); generating depth data for the missing region (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] based on the image data ((i.e. During image data capture, data is acquired synchronously by all three devices to produce simultaneous depth information and top and side view grayscale videos) [Hong: col. 15, line 52-54; Figs. 3D-9]; (i.e. video recordings from the top view camera are projected into the viewpoint of the depth information captured by the depth sensor to create a common coordinate framework) [Hong: col. 17, line 25-28]; (i.e. Compute the average depth ZiR(t) within a square region) [Hong: col. 21, line 46]); and merging the depth data ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification) [Hong: col. 1, line 60-65]; (i.e. 
classification can be performed based upon raw image data, detected pose and raw 3D trajectory information, and/or any combination of raw data, pose data, trajectory data, and/or parameters appropriate to the requirements of a specific application) [Hong: col. 13, line 44-48]; (i.e. the average depth ZiR(t) within a square region) [Hong: col. 21, line 46]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) for the missing region (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] with the depth data from the depth sensor (i.e. a behavioral classification system in accordance with an embodiment of the invention was constructed that records behavior using synchronized conventional video cameras and a time-of-flight depth sensor) [Hong: col. 15, line 36-38; Figs. 3A-3D] to generate a merged point cloud representative of the scene ((i.e. Depth information can be obtained using any of a variety of depth sensors including (but not limited to) a time of flight depth sensor, a structured illumination depth sensor, a Light Detection and Ranging (LIDAR) sensor, a Sound Navigation and Ranging (SONAR) sensor, an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 44-54; Figs. 3A-9]; (i.e. 
In the illustrated experimental apparatus, the top view camera 304 and the depth sensor 306 are mounted as close together as possible (see FIG. 3D) to limit occlusions (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera). In the illustrated embodiment, the depth sensor is a time-of-flight depth sensor that includes an IR illumination source 320 and an IR camera 322 and detects contours of objects in the depth or z-plane by measuring the time-of flight of an infrared light signal generated by the IR illumination source 320 between the depth sensor and object surfaces for each point of the depth image generated by the time-of-flight depth sensor, in a manner analogous to SONAR) [Hong: col. 16, line 13-27; Figs. 3A-3D]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]).
Hong does not explicitly disclose the following claim limitations (Emphasis added).
A method of determining a depth of a scene, the method comprising: capturing depth data of the scene with a depth sensor; capturing image data of the scene with a plurality of cameras; generating a point cloud representative of the scene based on the depth data; identifying a missing region of the point cloud in which the point cloud includes no data or sparse data; generating depth data for the missing region based on the image data; and merging the depth data for the missing region with the depth data from the depth sensor to generate a merged point cloud representative of the scene.      
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. the sparse depth map obtained in the previous stage contains a lot of empty positions (dx,dy )) [Claret: col. 26, line 15-17]); generating depth data for the missing region based on the image data ((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. The method may comprise an additional stage to generate a sparse depth map considering the slope of the epipolar lines obtained in the previous stage. The sparse depth map is obtained by assigning depth values (dz) of objects in the real world to the edges calculated before (dx dy).) [Claret: col. 24, line 42-46]); and merging the depth data for the missing region ((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. The method may comprise an additional stage to generate a sparse depth map considering the slope of the epipolar lines obtained in the previous stage. The sparse depth map is obtained by assigning depth values (dz) of objects in the real world to the edges calculated before (dx dy).) [Claret: col. 24, line 42-46]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with those of Claret to use a sparse depth map representation in the application.
Therefore, the combination of Hong with Claret would enable the system to show objects at different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].
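
For illustration only, and not as a characterization of Hong, Claret, or the application, the sequence of steps recited in claim 1 can be sketched as follows. The function names, the use of None to mark a pixel with no sensor depth, and the pinhole intrinsics (fx, fy, cx, cy) are assumptions made for this sketch. The image-based depth map is assumed to have been computed elsewhere and registered to the depth sensor's viewpoint.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[Optional[float]]]
Point = Tuple[float, float, float]


def find_missing(depth: DepthMap) -> List[Tuple[int, int]]:
    """Return (row, col) of every pixel where the sensor reported no depth."""
    return [
        (r, c)
        for r, row in enumerate(depth)
        for c, z in enumerate(row)
        if z is None
    ]


def merge_depth(sensor: DepthMap, image_based: DepthMap) -> DepthMap:
    """Keep sensor depth where present; otherwise use the image-based value."""
    return [
        [s if s is not None else i for s, i in zip(s_row, i_row)]
        for s_row, i_row in zip(sensor, image_based)
    ]


def to_point_cloud(depth: DepthMap, fx: float, fy: float,
                   cx: float, cy: float) -> List[Point]:
    """Back-project valid pixels to 3D with an assumed pinhole camera model."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue  # no depth available, so no point is emitted
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


sensor = [[1.0, None], [2.0, 2.0]]
image_based = [[1.1, 1.5], [2.1, 2.2]]
merged = merge_depth(sensor, image_based)
merged_cloud = to_point_cloud(merged, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In this sketch the merged point cloud contains one more point than a cloud built from the sensor data alone, because the single missing pixel is filled from the image-based estimate.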

Regarding claim 4, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. methods) [Hong: col. 2, line 12] wherein identifying the missing region of the point cloud includes identifying (i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5] a hole in the point cloud (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] that is larger than a user-defined threshold.
Hong does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein identifying the missing region of the point cloud includes identifying a hole in the point cloud that is larger than a user-defined threshold.
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
wherein identifying the missing region of the point cloud includes identifying a hole in the point cloud that is larger than a user-defined threshold ((i.e. incomplete disparity maps (holes produced by occlusions where it is not possible to find the same object point in both images) or have depth discontinuity regions where disparities among neighbouring pixels have experienced gaps larger than one pixel (in stereo vision, when a depth map is estimated, inaccuracies accumulate over the calculation of disparities among corresponding points at subpixel level; at some point, these inaccuracies may be greater than a pixel, causing a gap between two consecutive points and leaving a point with no depth estimation)) [Claret: col. 2, line 61 – col. 3, line 4]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with those of Claret to use a sparse depth map representation in the application.
Therefore, the combination of Hong with Claret would enable the system to show objects at different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].
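
For illustration only, the claim 4 limitation of identifying a hole larger than a user-defined threshold can be sketched as a connected-component search over missing pixels. This is a minimal sketch under assumptions not found in either reference: the depth data is treated as a 2D grid, None marks a missing sample, hole size is counted in pixels, and the name find_holes and the parameter min_size are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

DepthMap = List[List[Optional[float]]]
Pixel = Tuple[int, int]


def find_holes(depth: DepthMap, min_size: int) -> List[List[Pixel]]:
    """Return 4-connected regions of missing pixels larger than min_size."""
    rows = len(depth)
    cols = len(depth[0]) if depth else 0
    seen = set()
    holes: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] is not None or (r, c) in seen:
                continue
            # Breadth-first flood fill over neighbouring missing pixels.
            queue = deque([(r, c)])
            seen.add((r, c))
            region: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and depth[ny][nx] is None
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            # Only regions strictly larger than the threshold count as holes.
            if len(region) > min_size:
                holes.append(region)
    return holes


grid = [
    [1.0, None, None],
    [1.0, None, 1.0],
    [None, 1.0, 1.0],
]
large_holes = find_holes(grid, min_size=2)
```

With the threshold set to 2, the three-pixel region is reported as a hole and the isolated missing pixel in the bottom-left corner is ignored.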

Regarding claim 5, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. methods) [Hong: col. 2, line 12] wherein generating the depth data for the missing region ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. occludes at least a portion of the secondary subject) [Hong: col. 31, line 54-55])  is further based on a portion of the depth data captured by the depth sensor  (i.e. depth sensor is used to acquire depth information) [Hong: col. 10, line 30-31] that surrounds the missing region. 
Hong does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein generating the depth data for the missing region is further based on a portion of the depth data captured by the depth sensor that surrounds the missing region.     
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
that surrounds the missing region ((i.e. detecting the slope of the lines of an epipolar image it is possible to generate a depth map of the scene) [Claret: col. 4, line 52-54]; (i.e. the pixels that form a valid epipolar line (510, 511) within an epipolar image, must necessarily be in neighbouring positions (i.e. the points that form a valid epipolar line must be connected)) [Claret: col. 19, line 55-58] ; (i.e. In a preferred embodiment only the neighbouring positions are considered when looking for edges in an epipolar image to form a valid epipolar line (starting from the central pixel detected as edge, the arrows in FIGS. 6A-6C represent the neighbouring positions which are considered for determining the connected edge pixels that form the epipolar line)) [Claret: col. 19, line 64 – col. 20, line 3]; (i.e. When a valid epipolar line is detected, the slope of this line is computed. This slope value may be then directly converted into a depth value, since there is a direct relation between slopes and distance values. Once the slopes of the analysed epipolar lines are calculated, according to an embodiment the output of the method is a sparse two-dimensional depth map containing the depth values (dz) of the edges of the objects of the scene captured by a plenoptic camera. The coordinates ( dx,dy) of the depth map indicate the lateral position of the corresponding object points (i.e. the two-dimensional coordinates of the object world), whereas the depth values (dz) represent the depth of the corresponding coordinates (dx,dy) in the object world.) [Claret: col. 24, line 25-37]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with those of Claret to use a sparse depth map representation in the application.
Therefore, the combination of Hong with Claret would enable the system to show objects at different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].
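
For illustration only, one way that depth data for a missing region could be "further based on" the sensor depth surrounding that region is to correct the image-based estimate by the average disagreement observed along the region's border. This offset correction is an assumption chosen for the sketch, not a technique attributed to Hong, Claret, or the application; the function name fill_with_boundary_offset is hypothetical.

```python
from typing import List, Optional

DepthMap = List[List[Optional[float]]]


def fill_with_boundary_offset(sensor: DepthMap,
                              image_based: List[List[float]]) -> List[List[float]]:
    """Fill missing sensor pixels with image-based depth plus a border offset."""
    rows = len(sensor)
    cols = len(sensor[0]) if sensor else 0
    diffs: List[float] = []
    for r in range(rows):
        for c in range(cols):
            if sensor[r][c] is None:
                continue
            # A valid pixel is on the border if any 4-neighbour is missing.
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            on_border = any(
                0 <= ny < rows and 0 <= nx < cols and sensor[ny][nx] is None
                for ny, nx in neighbours
            )
            if on_border:
                diffs.append(sensor[r][c] - image_based[r][c])
    # Average sensor-minus-image disagreement around the missing region(s).
    offset = sum(diffs) / len(diffs) if diffs else 0.0
    return [
        [s if s is not None else i + offset for s, i in zip(s_row, i_row)]
        for s_row, i_row in zip(sensor, image_based)
    ]


sensor = [[2.0, 2.0, 2.0], [2.0, None, 2.0], [2.0, 2.0, 2.0]]
image_based = [[1.5] * 3 for _ in range(3)]
filled = fill_with_boundary_offset(sensor, image_based)
```

Here the image-based estimate reads 0.5 lower than the sensor along the border, so the missing centre pixel is filled with 1.5 + 0.5 = 2.0, consistent with its surroundings.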

Regarding claim 7, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the method further comprises (i.e. Systems and methods) [Hong: col. 2, line 12] generating a three-dimensional mesh representative of the scene (i.e. point clouds or meshes generated using image data) [Hong: col. 20, line 23-24] based on the merged point cloud ((i.e. Depth information can be obtained using any of a variety of depth sensors including (but not limited to) a time of flight depth sensor, a structured illumination depth sensor, a Light Detection and Ranging (LIDAR) sensor, a Sound Navigation and Ranging (SONAR) sensor, an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 44-54; Figs. 3A-9]; (i.e. In the illustrated experimental apparatus, the top view camera 304 and the depth sensor 306 are mounted as close together as possible (see FIG. 3D) to limit occlusions (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera). In the illustrated embodiment, the depth sensor is a time-of-flight depth sensor that includes an IR illumination source 320 and an IR camera 322 and detects contours of objects in the depth or z-plane by measuring the time-of flight of an infrared light signal generated by the IR illumination source 320 between the depth sensor and object surfaces for each point of the depth image generated by the time-of-flight depth sensor, in a manner analogous to SONAR) [Hong: col. 16, line 13-27; Figs. 3A-3D]).
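
For illustration only, generating a mesh from depth data organized on a pixel grid can be sketched by splitting each fully populated 2x2 cell into two triangles. The grid organization, the use of None for missing samples, and the name grid_mesh are assumptions made for this sketch; neither reference is asserted to use this triangulation.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[Optional[float]]]
Vertex = Tuple[int, int]
Triangle = Tuple[Vertex, Vertex, Vertex]


def grid_mesh(depth: DepthMap) -> List[Triangle]:
    """Triangulate a depth grid; vertices are (row, col) pixel indices."""
    triangles: List[Triangle] = []
    for r in range(len(depth) - 1):
        for c in range(len(depth[r]) - 1):
            quad = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            # Skip any cell touching a missing sample so no face spans a hole.
            if all(depth[y][x] is not None for y, x in quad):
                triangles.append((quad[0], quad[1], quad[2]))
                triangles.append((quad[1], quad[3], quad[2]))
    return triangles


full = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
with_hole = [[1.0, 1.0], [1.0, None]]
```

A 3x3 grid with no missing samples has four cells and therefore yields eight triangles, while a 2x2 grid with one missing sample yields none.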

Regarding claim 9, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein
the plurality of cameras ((i.e. Many of the behavioral classification systems described above synchronize image data captured by one or more cameras and one or more depth sensors) [Hong: col. 30, line 61-63]; (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13]; (i.e. Other depth sensors could be utilized to obtain depth information such as, but not limited to, a plurality of cameras configured to capture information in color channels including the near-IR color channel in a multiview stereo configuration in combination with an illumination source configured to project texture onto the scene.) [Hong: col. 16, line 30-36]) each have a different position and orientation relative to the scene ((i.e. a plurality of cameras in a multiview stereo configuration) [Hong: col. 4, line 5; Figs. 3A-8F]; (i.e. an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 50-54; Figs. 3A-9]; (i.e. The behavioral classification system illustrated in FIGS. 3A-3D can be used to track animal trajectories and orientations in 3D in the context of an animal's home cage and detect specific social behaviors, including attack, mounting and close investigation in different orientations (head-to-head, head-to-tail, head-to-side, etc).) [Hong: col. 16, line 40-45]; (i.e. position and pose information is passed through a set of feature extractors to obtain a low-dimensional representation from which machine learning algorithms can be used to train classifiers to detect specific behaviors. In other embodiments, the raw position and pose information can be passed directly to the classifier) [Hong: col. 
13, line 8-13]), and wherein the image data  (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13] is light field image data.  
Hong does not explicitly disclose the following claim limitations (Emphasis added).  
The method of claim 1 wherein the plurality of cameras each have a different position and orientation relative to the scene, and wherein the image data is light field image data.
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the image data is light field image data ((i.e. Once the plenoptic camera has captured the light field and the conventional cameras the corresponding images 1412, the epipolar images of the plenoptic camera light field are analysed) [Claret: col. 42, line 12-15; Fig. 8]; (i.e. generating a plurality of epipolar images from a light field captured by a light field acquisition device) [Claret: col. 44, line 20-21]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with those of Claret to use light field technology in the system.
Therefore, the combination of Hong with Claret would enable the system to obtain depth information from a light field [Claret: col. 1, line 24-25; col. 44, line 18-21].

Regarding claim 12, Hong meets the claim limitations, as follows:
A system (i.e. Systems and methods) [Hong: col. 2, line 12] for imaging a scene (i.e. The behavioral classification system 100 includes a imaging system 102 that is capable of capturing image data including depth information. Depth information typically refers to a measurement of a distance from a reference viewpoint to one point or many points in a scene) [Hong: col. 10, line 23-28], comprising: multiple cameras ((i.e. Many of the behavioral classification systems described above synchronize image data captured by one or more cameras and one or more depth sensors) [Hong: col. 30, line 61-63]; (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13]; (i.e. a plurality of cameras configured to capture information in color channels including the near-IR color channel in a multiview stereo configuration in combination with an illumination source configured to project texture onto the scene.) [Hong: col. 16, line 32-36]) arranged at different positions and orientations relative to the scene ((i.e. a plurality of cameras in a multiview stereo configuration) [Hong: col. 4, line 5; Figs. 3A-8F]; (i.e. an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 50-54; Figs. 3A-9]; (i.e. The behavioral classification system illustrated in FIGS. 3A-3D can be used to track animal trajectories and orientations in 3D in the context of an animal's home cage and detect specific social behaviors, including attack, mounting and close investigation in different orientations (head-to-head, head-to-tail, head-to-side, etc).) [Hong: col. 16, line 40-45; Figs. 3A-8F]; (i.e. 
position and pose information is passed through a set of feature extractors to obtain a low-dimensional representation from which machine learning algorithms can be used to train classifiers to detect specific behaviors. In other embodiments, the raw position and pose information can be passed directly to the classifier) [Hong: col. 13, line 8-13]) and configured to capture image data of the scene ((i.e. a camera is used to acquire images of one or more subjects) [Hong: col. 10, line 29-30]; (i.e. identifies (508) the subjects and their locations within the imaged scene) [Hong: col. 17, line 58-59]); a depth sensor (i.e. depth sensor) [Hong: col. 10, line 30] configured to capture depth data about a depth of the scene ((i.e. depth sensor is used to acquire depth information) [Hong: col. 10, line 30-31]; (i.e. capturing image data including depth information) [Hong: col. 10, line 25-26]); and a computing device (i.e. a microprocessor) [Hong: col. 31, line 38] communicatively coupled to the cameras and the depth sensor ((i.e. a plurality of 3D imaging systems and a behavioral classification computer system including at least one memory and at least one microprocessor directed by at least a classification application stored in the at least one memory to: control the plurality of 3D imaging systems to each acquire a sequence of frames of image data including depth information; and store at least a portion of each of the sequences of frames of image data including depth information in the at least one memory.) [Hong: col. 5, line 4-14; Fig. 1]; (i.e. 
the 3D imaging system is selected from the group consisting of: a time of flight depth sensor and at least one camera; a structured light depth sensor and at least one camera; a LIDAR depth sensor and at least one camera; a SONAR depth sensor and at least one camera; a plurality of cameras in a multiview stereo configuration; and a plurality of cameras in multiview stereo configuration and an illumination source that projects texture) [Hong: col. 3, line 66 – col. 4, line 8; Fig. 1]), wherein the computing device (i.e. a microprocessor) [Hong: col. 31, line 38] has a memory containing computer-executable instructions and a processor for executing the computer-executable instructions contained in the memory (i.e. Machine readable instructions stored in memory 108 can be used to control the operations performed by the processor 106) [Hong: col. 11, line 63-65], and wherein the computer-executable instructions include instructions (i.e. Machine readable instructions) [Hong: col. 11, line 63] for receiving the image data from the cameras ((i.e. The computer system includes a processor 106 that receives image data including depth information from the imaging system) [Hong: col. 11, line 35-37]; (i.e. the classification application further directs the microprocessor to: control the 3D imaging system to acquire the sequence of frames of image data including depth information and video image data in at least one color channel; and store the sequence of frames of image data including depth information in memory) [Hong: col. 3, line 60-65]); receiving the depth data from the depth sensor ((i.e. receives image data including depth information from the imaging system) [Hong: col. 11, line 35-37]; (i.e. acquire the sequence of frames of image data including depth information and video image data in at least one color channel) [Hong: col. 3, line 52-64]); generating a point cloud representative of the scene based on the depth data ((i.e. 
point clouds or meshes generated using image data) [Hong: col. 20, line 23-24]; (i.e. image data including depth information) [Hong: col. 10, line 25-26]); identifying a missing region of the point cloud ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. detects contours of objects) [Hong: col. 16, line 21-22]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]) in which the point cloud includes no data or sparse data ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]); generating depth data for the missing region based on the image data ((i.e. During image data capture, data is acquired synchronously by all three devices to produce simultaneous depth information and top and side view grayscale videos) [Hong: col. 15, line 52-54; Figs. 3D-9]; (i.e. video recordings from the top view camera are projected into the viewpoint of the depth information captured by the depth sensor to create a common coordinate framework) [Hong: col. 17, line 25-28]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]; (i.e. Compute the average depth ZiR(t) within a square region) [Hong: col. 21, line 46]); and merging the depth data ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. 
Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]; (i.e. classification can be performed based upon raw image data, detected pose and raw 3D trajectory information, and/or any combination of raw data, pose data, trajectory data, and/or parameters appropriate to the requirements of a specific application) [Hong: col. 13, line 44-48]; (i.e. Compute the average depth ZiR(t) within a square region) [Hong: col. 21, line 46]) for the missing region ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]) with the depth data from the depth sensor (i.e. a behavioral classification system in accordance with an embodiment of the invention was constructed that records behavior using synchronized conventional video cameras and a time-of-flight depth sensor) [Hong: col. 15, line 36-38; Figs. 3A-3D] to generate a merged point cloud representative of the scene ((i.e. 
Depth information can be obtained using any of a variety of depth sensors including (but not limited to) a time of flight depth sensor, a structured illumination depth sensor, a Light Detection and Ranging (LIDAR) sensor, a Sound Navigation and Ranging (SONAR) sensor, an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 44-54; Figs. 3A-9]; (i.e. In the illustrated experimental apparatus, the top view camera 304 and the depth sensor 306 are mounted as close together as possible (see FIG. 3D) to limit occlusions (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera). In the illustrated embodiment, the depth sensor is a time-of-flight depth sensor that includes an IR illumination source 320 and an IR camera 322 and detects contours of objects in the depth or z-plane by measuring the time-of flight of an infrared light signal generated by the IR illumination source 320 between the depth sensor and object surfaces for each point of the depth image generated by the time-of-flight depth sensor, in a manner analogous to SONAR) [Hong: col. 16, line 13-27; Figs. 3A-3D]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]; (i.e.
classification can be performed based upon raw image data, detected pose and raw 3D trajectory information, and/or any combination of raw data, pose data, trajectory data, and/or parameters appropriate to the requirements of a specific application) [Hong: col. 13, line 44-48]).
Hong does not explicitly disclose the following claim limitations (Emphasis added).
A system for imaging a scene, comprising: multiple cameras arranged at different positions and orientations relative to the scene and configured to capture image data of the scene; a depth sensor configured to capture depth data about a depth of the scene; and a computing device communicatively coupled to the cameras and the depth sensor, wherein the computing device has a memory containing computer-executable instructions and a processor for executing the computer-executable instructions contained in the memory, and wherein the computer-executable instructions include instructions for receiving the image data from the cameras; receiving the depth data from the depth sensor; generating a point cloud representative of the scene based on the depth data; identifying a missing region of the point cloud in which the point cloud includes no data or sparse data; generating depth data for the missing region based on the image data; and merging the depth data for the missing region with the depth data from the depth sensor to generate a merged point cloud representative of the scene.        
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. the sparse depth map obtained in the previous stage contains a lot of empty positions (dx,dy )) [Claret: col. 26, line 15-17]); generating depth data for the missing region based on the image data ((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. The method may comprise an additional stage to generate a sparse depth map considering the slope of the epipolar lines obtained in the previous stage. The sparse depth map is obtained by assigning depth values (dz) of objects in the real world to the edges calculated before (dx dy).) [Claret: col. 24, line 42-46]); and merging the depth data for the missing region ((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. The method may comprise an additional stage to generate a sparse depth map considering the slope of the epipolar lines obtained in the previous stage. The sparse depth map is obtained by assigning depth values (dz) of objects in the real world to the edges calculated before (dx dy).) [Claret: col. 24, line 42-46]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Claret to use a sparse depth map representation in the application.
Therefore, the combination of Hong with Claret will enable the system to show objects at different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].

Regarding claim 14, Hong meets the claim limitations as set forth in claim 12. Hong further meets the claim limitations as follows.
The system of claim 12 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the missing region ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]) of the point cloud is user-selected ((i.e. data is transferred to cloud storage and may be processed either immediately upon uploading or at a later time by a cloud service that makes the results of the analysis available to a user that maintains an account with the cloud service) [Hong: col. 11, line 46-50]; (i.e. the machine learning application 116 utilizes supervised learning to train one or more of the classifiers and generates an interactive user interface (or offloads the recorded image data to a cloud service that generates in interactive user interface) to prompt a user to annotate one or more sequences of image data to continuously expand a training data set of ground truth data for the purposes of training the one or more classifiers.) [Hong: col. 12, line 33-41]).

Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Hong (US Patent 10,121,064 B2), (“Hong”), in view of Claret et al. (US Patent 10,832,429 B2), (“Claret”), in view of Stenger et al. (US Patent 10,097,813 B2), (“Stenger”).

Regarding claim 3, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein identifying the missing region of the point cloud includes determining (i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5] that the missing region of the point cloud ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. occludes at least a portion of the secondary subject) [Hong: col. 31, line 54-55]) has fewer than a predetermined threshold number of data points (i.e. points in the scene) [Hong: col. 10, line 28].  
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein identifying the missing region of the point cloud includes determining that the missing region of the point cloud has fewer than a predetermined threshold number of data points.     
However, in the same field of endeavor Stenger further discloses the claim limitations and the deficient claim limitations, as follows:
the missing region of the point cloud has fewer than a predetermined threshold number of data points ((i.e. the same image is generated using the hypothesized matrix Vp along with the already calculated normal map and lighting directions, n and L. A pixel supports the hypothesized matrix Vp if it's value in the two synthesized images is sufficiently similar, mathematically if |cf – (VpL)n| < T, where T is a threshold value and cf is the pixel value from If) [Stenger: col. 9, line 28-36]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong and Claret with Stenger to implement the photometric stereo method of Stenger for capturing 3D surface geometry.
Therefore, the combination of Hong and Claret with Stenger will enable the system to produce depth maps with more detail [Stenger: col. 2, line 61 – col. 3, line 6].

Regarding claim 6, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the depth data for the missing region ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. FIG. 7C shows raw monochrome image data acquired by a top view camera) [Hong: col. 7, line 11-12; Fig. 7C]) has a greater resolution than the depth data captured with the depth sensor (i.e. FIG. 7A shows raw depth image data acquired by a depth sensor) [Hong: col. 7, line 4-5; Fig. 7A].  
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein the depth data for the missing region has a greater resolution than the depth data captured with the depth sensor.    
However, in the same field of endeavor Stenger further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the depth data for the missing region has a greater resolution than the depth data captured with the depth sensor ((i.e. Typically, the depth sensor will produce a depth map with rather lower frequency resolution in 2D Fourier space than that produced from first video camera in combination the three light sources. The first video camera in combination with the three light sources operates together to produce a normal field using so-called photometric stereo methods. Such methods generally produce images with good high frequency resolution in 2D Fourier space. In other words, they produce a normal field which can be converted into a depth map with a lot of detail of the scene being imaged. In an embodiment, a depth sensor will be used which produces a depth map with lower frequency resolution in 2-D Fourier space than that produced by a photometric stereo method) [Stenger: col. 2, line 61 – col. 3, line 6]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong and Claret with Stenger to implement the photometric stereo method of Stenger for capturing 3D surface geometry.
Therefore, the combination of Hong and Claret with Stenger will enable the system to produce depth maps with more detail [Stenger: col. 2, line 61 – col. 3, line 6].

Claims 8, 10, 15, and 17-22 are rejected under 35 U.S.C. 103 as being unpatentable over Hong (US Patent 10,121,064 B2), (“Hong”), in view of Claret et al. (US Patent 10,832,429 B2), (“Claret”), in view of Srimohanarajah et al. (US Patent 11,045,257 B2), (“Srimohanarajah”).

Regarding claim 8, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the scene (i.e. identifies (508) the subjects and their locations within the imaged scene) [Hong: col. 17, line 58-59] is a surgical scene.
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein the scene is a surgical scene.
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the scene is a surgical scene (i.e. the operation room) [Srimohanarajah: col. 2, line 35; Fig. 2].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system.
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data of the patient [Srimohanarajah: col. 3, line 27-39; Fig. 2; col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Regarding claim 10, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the method further comprises: processing the image data (i.e. The captured image data that includes depth information is then analyzed via an image processing pipeline) [Hong: col. 2, line 4-6] and the merged point cloud ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image (i.e. to transfer data captured by one or more imaging systems) [Hong: col. 12, line 62-63] to a display for display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35] to a user (i.e. the machine learning application 116 utilizes supervised learning to train one or more of the classifiers and generates an interactive user interface (or offloads the recorded image data to a cloud service that generates in interactive user interface) to prompt a user to annotate one or more sequences of image data to continuously expand a training data set of ground truth data for the purposes of training the one or more classifiers.) [Hong: col. 12, line 33-41].
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1 wherein the method further comprises: processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image to a display for display to a user.
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective (i.e. Referring now to FIG. 5, a registration process, similar to that which may be used in block 456 of FIG. 4B, is shown for creating a common coordinate space composed of amalgamated virtual and actual coordinate spaces. The common coordinate space may be composed of both an actual coordinate space and a virtual coordinate space, where the actual coordinate space contains actual objects that exist in space and the virtual coordinate space contains virtual objects that are generated in a virtual space. The common coordinate space containing both the aforementioned actual and virtual objects may be produced) [Srimohanarajah: col. 9, line 21-31; Fig. 5].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system.
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Regarding claim 15, Hong meets the claim limitations as set forth in claim 12. Hong further meets the claim limitations as follows.
The system of claim 12 (i.e. Systems and methods) [Hong: col. 2, line 12] further comprising a display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35], wherein the computing device (i.e. a microprocessor) [Hong: col. 31, line 38] is communicatively coupled to the display ((i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35]; (i.e. Still another further embodiment again also includes an output device, where the classification application further directs the microprocessor to generate an alert via the output device) [Hong: col. 6, line 9-12]), and wherein the computer-executable instructions further include instructions (i.e. Machine readable instructions stored in memory 108 can be used to control the operations performed by the processor 106) [Hong: col. 11, line 63-65] for: processing the image data (i.e. The captured image data that includes depth information is then analyzed via an image processing pipeline) [Hong: col. 2, line 4-6] and the merged point cloud ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image (i.e. to transfer data captured by one or more imaging systems) [Hong: col. 12, line 62-63] to the display for display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35] to a user (i.e. the machine learning application 116 utilizes supervised learning to train one or more of the classifiers and generates an interactive user interface (or offloads the recorded image data to a cloud service that generates in interactive user interface) to prompt a user to annotate one or more sequences of image data to continuously expand a training data set of ground truth data for the purposes of training the one or more classifiers.) [Hong: col. 12, line 33-41].
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The system of claim 12, further comprising a display, wherein the computing device is communicatively coupled to the display, and wherein the computer-executable instructions further include instructions for: processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image to the display for display to a user.
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
 of the scene ((i.e. a digital camera and a depth sensor, synched to the projector, captures the scene with the light reflected by the object for at least the timeframe of one frame of the 3D scan.) [Srimohanarajah: col. 2, line 27-29]; (i.e. output point cloud collected from the 3D scanner) [Srimohanarajah: col. 11, line 19-20]) corresponding to a virtual camera perspective (i.e. Referring now to FIG. 5, a registration process, similar to that which may be used in block 456 of FIG. 4B, is shown for creating a common coordinate space composed of amalgamated virtual and actual coordinate spaces. The common coordinate space may be composed of both an actual coordinate space and a virtual coordinate space, where the actual coordinate space contains actual objects that exist in space and the virtual coordinate space contains virtual objects that are generated in a virtual space. The common coordinate space containing both the aforementioned actual and virtual objects may be produced) [Srimohanarajah: col. 9, line 21-31; Fig. 4A-5].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system.
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9].

Regarding claim 17, Hong meets the claim limitations as follows:
A method (i.e. Systems and methods) [Hong: col. 2, line 12] of determining the depth of a scene (i.e. The behavioral classification system 100 includes a imaging system 102 that is capable of capturing image data including depth information. Depth information typically refers to a measurement of a distance from a reference viewpoint to one point or many points in a scene) [Hong: col. 10, line 23-28], the method (i.e. Systems and methods) [Hong: col. 2, line 12] comprising: capturing depth data of the scene (i.e. capturing image data including depth information) [Hong: col. 10, line 25-26] with a depth sensor (i.e. depth sensor is used to acquire depth information) [Hong: col. 10, line 30-31]; generating a point cloud representative of the scene based on the depth data ((i.e. point clouds or meshes generated using image data) [Hong: col. 20, line 23-24]; (i.e. image data including depth information) [Hong: col. 10, line 25-26]); identifying a missing region of the point cloud ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. detects contours of objects) [Hong: col. 16, line 21-22]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]) in which the point cloud includes no data or sparse data ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. skeleton representations) [Hong: col. 18, line 67]); registering the point cloud (i.e. FIGS. 6A-6D show image data captured by a top view camera and a depth sensor during a registration process) [Hong: col. 6, line 66-67; Figs. 6A-6D] with three-dimensional (3D) medical scan data; and merging at least a portion of the 3D medical scan data ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) with the depth data from the depth sensor (i.e. a behavioral classification system in accordance with an embodiment of the invention was constructed that records behavior using synchronized conventional video cameras and a time-of-flight depth sensor) [Hong: col. 15, line 36-38; Figs. 3A-3D] to generate a merged point cloud representative of the scene ((i.e. Depth information can be obtained using any of a variety of depth sensors including (but not limited to) a time of flight depth sensor, a structured illumination depth sensor, a Light Detection and Ranging (LIDAR) sensor, a Sound Navigation and Ranging (SONAR) sensor, an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 44-54; Figs. 3A-9]; (i.e.
In the illustrated experimental apparatus, the top view camera 304 and the depth sensor 306 are mounted as close together as possible (see FIG. 3D) to limit occlusions (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera). In the illustrated embodiment, the depth sensor is a time-of-flight depth sensor that includes an IR illumination source 320 and an IR camera 322 and detects contours of objects in the depth or z-plane by measuring the time-of flight of an infrared light signal generated by the IR illumination source 320 between the depth sensor and object surfaces for each point of the depth image generated by the time-of-flight depth sensor, in a manner analogous to SONAR) [Hong: col. 16, line 13-27; Figs. 3A-3D]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]; (i.e. classification can be performed based upon raw image data, detected pose and raw 3D trajectory information, and/or any combination of raw data, pose data, trajectory data, and/or parameters appropriate to the requirements of a specific application) [Hong: col. 13, line 44-48]).
Hong does not explicitly disclose the following claim limitations (Emphasis added).
A method of determining the depth of a scene, the method comprising: capturing depth data of the scene with a depth sensor; generating a point cloud representative of the scene based on the depth data; identifying a missing region of the point cloud in which the point cloud includes no data or sparse data;  registering the point cloud with three-dimensional (3D) medical scan data; and merging at least a portion of the 3D medical scan data with the depth data from the depth sensor to generate a merged point cloud representative of the scene.   
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. the sparse depth map obtained in the previous stage contains a lot of empty positions (dx,dy )) [Claret: col. 26, line 15-17]); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Claret to use a sparse depth map representation in the application.
Therefore, the combination of Hong with Claret will enable the system to show objects at different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
A method of determining the depth of a scene, the method comprising: capturing depth data of the scene with a depth sensor; generating a point cloud representative of the scene based on the depth data; identifying a region of the point cloud in which the point cloud includes no data or sparse data; registering the point cloud with three-dimensional (3D) medical scan data; and merging at least a portion of the 3D medical scan data with the depth data from the depth sensor to generate a merged point cloud representative of the scene.
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
registering the point cloud with three-dimensional (3D) medical scan data ((i.e. registering the tracking system to create the single unified virtual coordinate space for the 3D scan data, the medical image data) [Srimohanarajah: col. 16, line 21-23]; (i.e. Using a dense point cloud provided by the 3D scanner 309, this point cloud may be mapped to the extracted surface of the MR/CT volumetric scan data ( e.g., the pre-op image data 354) to register the patient's physical position to the volumetric data. The tracking system 321 (e.g., part of the navigation system 200) has no reference to the point cloud data. Therefore a tool may be provided that is visible to both the tracking system 321 and the 3D scanner 309. A transformation between the tracking system's camera space and the 3D scanner space may be identified so that the point cloud provided by the 3D scanner 309 and the tracking system 321 can be registered to the patient space.) [Srimohanarajah: col. 10, line 53-65; Fig. 9]; (i.e. the method 900 generates and receives 3D scan data from the 3D scanner 309 that is representative of a 3D scan of at least a portion of the patient 202.) [Srimohanarajah: col. 13, line 24-26]); and ((i.e. a three dimensional (3D) scanner system of a medical navigation system and a camera of the medical navigation system) [Srimohanarajah: col. 3, line 11-13]; (i.e. 3D scan data from the 3D scanner representative of a 3D scan of at least a portion of the patient) [Srimohanarajah: col. 3, line 26-29]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system.
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Regarding claim 18, Hong meets the claim limitations as set forth in claim 17. Hong further meets the claim limitations as follows.
The method of claim 17 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the region of the point cloud (i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5] is a missing region of the point cloud ((i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]; (i.e. occludes at least a portion of the secondary subject) [Hong: col. 31, line 54-55]) in which the point cloud includes no data or sparse data (i.e. skeleton representations) [Hong: col. 18, line 67].
Hong and Srimohanarajah do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 17 wherein the region of the point cloud is a missing region of the point cloud in which the point cloud includes no data or sparse data.     
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
((i.e. a sparse depth map showing three objects at different depths) [Claret: col. 13, line 14-15; Fig. 8]; (i.e. The method may comprise an additional stage to generate a sparse depth map considering the slope of the epipolar lines obtained in the previous stage. The sparse depth map is obtained by assigning depth values (dz) of objects in the real world to the edges calculated before (dx dy).) [Claret: col. 24, line 42-46]). 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong and Srimohanarajah with Claret to use sparse depth map representation in the application. 
Therefore, the combination of Hong and Srimohanarajah with Claret will enable the system to show objects in different depths [Claret: col. 13, line 14-15; col. 24, line 42-46; Fig. 8].

Regarding claim 19, Hong meets the claim limitations as set forth in claim 17. Hong further meets the claim limitations as follows.
The method of claim 17 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the scene (i.e. The behavioral classification system 100 includes a imaging system 102 that is capable of capturing image data including depth information. Depth information typically refers to a measurement of a distance from a reference viewpoint to one point or many points in a scene) [Hong: col. 10, line 23-28] is a medical scene including a portion of a patient, wherein the missing region of the point cloud (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] corresponds to the portion of the patient, and wherein the portion of the 3D medical scan data portion corresponds to the same portion of the patient.   
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 17 wherein the scene is a medical scene including a portion of a patient, wherein the missing region of the point cloud corresponds to the portion of the patient, and wherein the portion of the 3D medical scan data portion corresponds to the same portion of the patient.  
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the scene is a medical scene (i.e. the operation room) [Srimohanarajah: col. 2, line 35; Fig. 2] including a portion of a patient ((i.e. the method 900 generates and receives 3D scan data from the 3D scanner 309 that is representative of a 3D scan of at least a portion of the patient 202.) [Srimohanarajah: col. 13, line 24-26]; (i.e. mapping navigation space to patient space in a medical procedure) [Srimohanarajah: col. 1, line 9-10; Fig. 2]), ((i.e. During a medical procedure, navigation systems require a registration to transform between the physical position of the patient in the operating room and the volumetric image set (e.g., MRI/CT)) [Srimohanarajah: col. 2, line 33-36; Fig. 2]; ((i.e. the method 900 generates and receives 3D scan data from the 3D scanner 309 that is representative of a 3D scan of at least a portion of the patient 202.) [Srimohanarajah: col. 13, line 24-26]; (i.e. The method comprises generating and receiving 3D scan data from the 3D scanner representative of a 3D scan of at least a portion of the patient, the 3D scan including distinct identifiable portions of the apparatus visible by the 3D scanner system; generating and receiving image data from the camera, the image data including reflective surface portions of the apparatus visible by the camera; loading saved medical image data, the saved medical data including preoperative image data saved during a previous scan of at least a portion of the patient; and performing a transformation mapping to create a single unified virtual coordinate space based on the 3D scan data, the image data, and the medical image data) [Srimohanarajah: col. 3, line 27-39; Fig. 2]), and wherein the portion of the 3D medical scan data portion (i.e. registering the tracking system to create the single unified virtual coordinate space for the 3D scan data, the medical image data) [Srimohanarajah: col. 16, line 21-23] corresponds to the same portion of the patient ((i.e. 
the method 900 generates and receives 3D scan data from the 3D scanner 309 that is representative of a 3D scan of at least a portion of the patient 202.) [Srimohanarajah: col. 13, line 24-26]; (i.e. The method comprises generating and receiving 3D scan data from the 3D scanner representative of a 3D scan of at least a portion of the patient, the 3D scan including distinct identifiable portions of the apparatus visible by the 3D scanner system; generating and receiving image data from the camera, the image data including reflective surface portions of the apparatus visible by the camera; loading saved medical image data, the saved medical data including preoperative image data saved during a previous scan of at least a portion of the patient; and performing a transformation mapping to create a single unified virtual coordinate space based on the 3D scan data, the image data, and the medical image data) [Srimohanarajah: col. 3, line 27-39; Figs. 2, 9]. 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong and Claret with Srimohanarajah to install the 3D scanner system of a medical navigation system and a camera of the medical navigation system into the imaging system. 
Therefore, the combination of Hong and Claret with Srimohanarajah will enable the system to generate 3D scanning medical data of the patient [Srimohanarajah: col. 3, line 27-39; Fig. 2; col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Regarding claim 20, Hong meets the claim limitations as set forth in claim 17. Hong further meets the claim limitations as follows.
The method of claim 17 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the 3D medical scan data is a computed tomography (CT) data.
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 17 wherein the 3D medical scan data is a computed tomography (CT) data.
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the 3D medical scan data (i.e. registering the tracking system to create the single unified virtual coordinate space for the 3D scan data, the medical image data) [Srimohanarajah: col. 16, line 21-23] is a computed tomography (CT) data (i.e. This modality is often used in conjunction with other modalities such as Ultrasound ("US"), Positron Emission Tomography ("PET") and Computed X-ray Tomography ("CT")) [Srimohanarajah: col. 1, line 33-36].
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system. 
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Regarding claim 21, Hong meets the claim limitations as set forth in claim 1. Hong further meets the claim limitations as follows.
The method of claim 1 (i.e. methods) [Hong: col. 2, line 12] wherein identifying the missing region of the point cloud ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. detects contours of objects) [Hong: col. 16, line 21-22]; (i.e. skeleton representations) [Hong: col. 18, line 67]) includes determining that the missing region of the point cloud has fewer than a predetermined threshold number of data points  ((i.e. A classification threshold was chosen that optimized the frame-wise precision and recall; the frame-wise precision, recall, fallout, and accuracy rates at the classification threshold are shown in FIG. 13K. The classifiers showed an overall prediction accuracy of 99% for attack, 99% for mounting, and 92% for close-investigation) [Hong: col. 24, line 26-31]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] – No data available is below any positive threshold), wherein the plurality of cameras ((i.e. Many of the behavioral classification systems described above synchronize image data captured by one or more cameras and one or more depth sensors) [Hong: col. 30, line 61-63]; (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13]; (i.e. Other depth sensors could be utilized to obtain depth information such as, but not limited to, a plurality of cameras configured to capture information in color channels including the near-IR color channel in a multiview stereo configuration in combination with an illumination source configured to project texture onto the scene.) [Hong: col. 16, line 30-36]) each have a different position and orientation relative to the scene ((i.e. a plurality of cameras in a multiview stereo configuration) [Hong: col. 4, line 5; Figs. 3A-8F]; (i.e. 
an array of two or more conventional cameras in a multiview stereo configuration, and/or an array of two or more conventional cameras in a multiview stereo configuration in combination with an illumination source that projects texture onto a scene to assist with parallax depth information recovery on otherwise textureless surfaces) [Hong: col. 10, line 50-54; Figs. 3A-9]; (i.e. The behavioral classification system illustrated in FIGS. 3A-3D can be used to track animal trajectories and orientations in 3D in the context of an animal's home cage and detect specific social behaviors, including attack, mounting and close investigation in different orientations (head-to-head, head-to-tail, head-to-side, etc).) [Hong: col. 16, line 40-45]; (i.e. position and pose information is passed through a set of feature extractors to obtain a low-dimensional representation from which machine learning algorithms can be used to train classifiers to detect specific behaviors. In other embodiments, the raw position and pose information can be passed directly to the classifier) [Hong: col. 13, line 8-13]), wherein the image data (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13] is light field image data, and wherein the method further comprises (i.e. methods) [Hong: col. 2, line 12]: processing the image data (i.e. The captured image data that includes depth information is then analyzed via an image processing pipeline) [Hong: col. 2, line 4-6] and the merged point cloud ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. 
As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image (i.e. to transfer data captured by one or more imaging systems) [Hong: col. 12, line 62-63]  to a display for display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35] to a user (i.e. the machine learning application 116 utilizes supervised learning to train one or more of the classifiers and generates an interactive user interface (or offloads the recorded image data to a cloud service that generates in interactive user interface) to prompt a user to annotate one or more sequences of image data to continuously expand a training data set of ground truth data for the purposes of training the one or more classifiers.) [Hong: col. 12, line 33-41].
Hong does not explicitly disclose the following claim limitations (Emphasis added).  
The method of claim 1 wherein identifying the missing region of the point cloud includes determining that the missing region of the point cloud has fewer than a predetermined threshold number of data points, wherein the plurality of cameras each have a different position and orientation relative to the scene, wherein the image data is light field image data, and wherein the method further comprises: processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image to a display for display to a user.
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the image data is light field image data ((i.e. Once the plenoptic camera has captured the light field and the conventional cameras the corresponding images 1412, the epipolar images of the plenoptic camera light field are analysed) [Claret: col. 42, line 12-15; Fig. 8]; (i.e. generating a plurality of epipolar images from a light field captured by a light field acquisition device) [Claret: col. 44, line 20-21]). 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Claret to use the light field technology in the system. 
Therefore, the combination of Hong with Claret will enable the system to obtain depth information from a light field [Claret: col. 1, line 24-25; col. 44, line 18-21].
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; 
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective (i.e. Referring now to FIG. 5, a registration process, similar to that which may be used in block 456 of FIG. 4B, is shown for creating a common coordinate space composed of amalgamated virtual and actual coordinate spaces. The common coordinate space may be composed of both an actual coordinate space and a virtual coordinate space, where the actual coordinate space contains actual objects that exist in space and the virtual coordinate space contains virtual objects that are generated in a virtual space. The common coordinate space containing both the aforementioned actual and virtual objects may be produced) [Srimohanarajah: col. 9, line 21-31; Fig. 5].
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system. 
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 
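For illustration only, and not as a characterization of Hong, Claret, or Srimohanarajah, the claim 21 limitation of determining that a region of the point cloud has fewer than a predetermined threshold number of data points can be sketched as a per-cell point count over a regular grid. The grid layout, the cell size, the threshold value, and the function name are hypothetical assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import product


def find_missing_cells(points, bounds, cell_size, min_points):
    """Return grid cells inside `bounds` holding fewer than `min_points` points.

    points: iterable of (x, y, z) tuples.
    bounds: ((xmin, ymin, zmin), (xmax, ymax, zmax)) region of interest.
    cell_size: edge length of each cubic cell (hypothetical parameter).
    min_points: the predetermined threshold number of data points.
    """
    lo, hi = bounds
    # Number of cells along each axis (at least one per axis).
    dims = [max(1, math.ceil((h - l) / cell_size)) for l, h in zip(lo, hi)]
    counts = Counter()
    for p in points:
        idx = tuple(int((c - l) // cell_size) for c, l in zip(p, lo))
        # Ignore points that fall outside the region of interest.
        if all(0 <= i < d for i, d in zip(idx, dims)):
            counts[idx] += 1
    # A cell with no points has a count of zero, which is below any
    # positive threshold.
    return [cell for cell in product(*(range(d) for d in dims))
            if counts[cell] < min_points]


# Assumed example: a 2 x 1 x 1 grid of unit cells and a threshold of 2.
cloud = [(0.1, 0.1, 0.1), (0.5, 0.5, 0.5), (0.9, 0.2, 0.3), (1.5, 0.5, 0.5)]
region = ((0, 0, 0), (2, 1, 1))
sparse_cells = find_missing_cells(cloud, region, cell_size=1, min_points=2)
```

In this assumed example the cell at grid index (1, 0, 0) contains a single point and is therefore reported, while the cell at (0, 0, 0) contains three points and is not.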

Regarding claim 22, Hong meets the claim limitations as set forth in claim 12. Hong further meets the claim limitations as follows.
The system of claim 12 (i.e. systems) [Hong: col. 2, line 12] wherein identifying the missing region of the point cloud ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. detects contours of objects) [Hong: col. 16, line 21-22]; (i.e. skeleton representations) [Hong: col. 18, line 67]) includes determining that the missing region of the point cloud has fewer than a predetermined threshold number of data points  ((i.e. A classification threshold was chosen that optimized the frame-wise precision and recall; the frame-wise precision, recall, fallout, and accuracy rates at the classification threshold are shown in FIG. 13K. The classifiers showed an overall prediction accuracy of 99% for attack, 99% for mounting, and 92% for close-investigation) [Hong: col. 24, line 26-31]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19] – No data available is below any positive threshold), wherein the image data (i.e. image data captured by one or more conventional video cameras, as well as depth sensors) [Hong: col. 10, line 11-13] is light field image data, wherein the computing device (i.e. a microprocessor) [Hong: col. 31, line 38] is communicatively coupled to the display ((i.e. to transfer data captured by one or more imaging systems) [Hong: col. 12, line 62-63]; (i.e. Still another further embodiment again also includes an output device, where the classification application further directs the microprocessor to generate an alert via the output device) [Hong: col. 6, line 9-12]) (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35]), and wherein the computer-executable instructions further include instructions for (i.e. 
Machine readable instructions stored in memory 108 can be used to control the operations performed by the processor 106) [Hong: col. 11, line 63-65]: processing the image data (i.e. The captured image data that includes depth information is then analyzed via an image processing pipeline) [Hong: col. 2, line 4-6] and the merged point cloud ((i.e. combines information from the video and depth camera recordings) [Hong: col. 16, line 66-67]; (i.e. Systems and methods in accordance with various embodiments of the invention utilize integrated hardware and software systems that combine video tracking, depth sensing, machine vision and machine learning, for automatic detection and quantification of social behaviors.) [Hong: col. 1, line 60-65]; (i.e. As is discussed further below, the use of depth information as an additional modality in combination with conventional video data can significantly enhance the accuracy and robustness of automated behavioral classification processes) [Hong: col. 11, line 13-16]) to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image (i.e. to transfer data captured by one or more imaging systems) [Hong: col. 12, line 62-63]  to a display for display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35] to a user (i.e. the machine learning application 116 utilizes supervised learning to train one or more of the classifiers and generates an interactive user interface (or offloads the recorded image data to a cloud service that generates in interactive user interface) to prompt a user to annotate one or more sequences of image data to continuously expand a training data set of ground truth data for the purposes of training the one or more classifiers.) [Hong: col. 12, line 33-41].
Hong does not explicitly disclose the following claim limitations (Emphasis added).  
The system of claim 12 wherein identifying the missing region of the point cloud includes determining that the missing region of the point cloud has fewer than a predetermined threshold number of data points, wherein the image data is light field image data, and further comprising a display, wherein the computing device is communicatively coupled to the display, and wherein the computer-executable instructions further include instructions for processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; and transmitting the output image to the display for display to a user.
However, in the same field of endeavor Claret further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the image data is light field image data ((i.e. Once the plenoptic camera has captured the light field and the conventional cameras the corresponding images 1412, the epipolar images of the plenoptic camera light field are analysed) [Claret: col. 42, line 12-15; Fig. 8]; (i.e. generating a plurality of epipolar images from a light field captured by a light field acquisition device) [Claret: col. 44, line 20-21]). 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Claret to use the light field technology in the system. 
Therefore, the combination of Hong with Claret will enable the system to obtain depth information from a light field [Claret: col. 1, line 24-25; col. 44, line 18-21].
Hong and Claret do not explicitly disclose the following claim limitations (Emphasis added).
processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective; 
However, in the same field of endeavor Srimohanarajah further discloses the claim limitations and the deficient claim limitations, as follows:
processing the image data and the merged point cloud to synthesize an output image of the scene corresponding to a virtual camera perspective (i.e. Referring now to FIG. 5, a registration process, similar to that which may be used in block 456 of FIG. 4B, is shown for creating a common coordinate space composed of amalgamated virtual and actual coordinate spaces. The common coordinate space may be composed of both an actual coordinate space and a virtual coordinate space, where the actual coordinate space contains actual objects that exist in space and the virtual coordinate space contains virtual objects that are generated in a virtual space. The common coordinate space containing both the aforementioned actual and virtual objects may be produced) [Srimohanarajah: col. 9, line 21-31; Fig. 5].
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong with Srimohanarajah to install the 3D scanner system and a camera of the medical navigation system into the imaging system. 
Therefore, the combination of Hong with Srimohanarajah will enable the system to generate 3D scanning medical data [Srimohanarajah: col. 10, line 53-65; col. 16, line 21-23; Fig. 9]. 

Claims 11 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Hong (US Patent 10,121,064 B2), (“Hong”), in view of Claret et al. (US Patent 10,832,429 B2), (“Claret”), in view of Srimohanarajah et al. (US Patent 11,045,257 B2), (“Srimohanarajah”), in view of Bronder et al. (US Patent 10,395,418 B2), (“Bronder”).

Regarding claim 11, Hong meets the claim limitations as set forth in claim 10. Hong further meets the claim limitations as follows.
The method of claim 10 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein the display is a head-mounted display (i.e. a variety of output devices including (but not limited to) a heads up display) [Hong: col. 30, line 34-35] worn by the user, and wherein identifying the missing region of the scene ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. identifies (508) the subjects and their locations within the imaged scene) [Hong: col. 17, line 58-59]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]) is based on at least one of a position and an orientation (i.e. position and pose information is passed through a set of feature extractors to obtain a low-dimensional representation from which machine learning algorithms can be used to train classifiers to detect specific behaviors. In other embodiments, the raw position and pose information can be passed directly to the classifier) [Hong: col. 13, line 8-13] of the head-mounted display.
Hong, Claret and Srimohanarajah do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 10 wherein the display is a head-mounted display worn by the user, and wherein identifying the missing region of the scene is based on at least one of a position and an orientation of the head-mounted display.      
However, in the same field of endeavor Bronder further discloses the claim limitations and the deficient claim limitations, as follows:
((i.e. a head-mounted display (e.g., in VR)) [Bronder: col. 4, line 8]; (i.e. the VR device mounted on the user's head) [Bronder: col. 1, line 24]), and wherein identifying the missing region of the scene is based on at least one of a position and an orientation of the head-mounted display  ((i.e. given one or more parameters regarding a position of a head-mounted display (e.g., in VR), possible alterations in a rendering parameter by the time the display is updated can be predicted. For example, the rendering parameter may correspond to whether to render a portion (e.g., pixel) of an image. In this example, some portions of the image that are outside of the target shape of the VR device can be predicted as more probable to be within a displayed image than other portions. The probability field can be generated (or modified) based on predicted change in scene orientation (e.g., based on predicted change in position or orientation of the head-mounted display).) [Bronder: col. 4, line 7-18]; (i.e. an image to be produced on a display device can be oriented or modified based on user input (e.g., movement of a gamepad button or stick to cause movement of the orientation of the scene, introduction of items into the scene, etc.). Similarly, in VR devices, the image to be produced on a display device can be oriented or modified based on user input, where the input may include detecting movement of the user's head (e.g., detected movement of the VR device mounted on the user's head).  In any case, the device may detect a desired change of orientation at any given time (e.g., by detecting change in head position)) [Bronder: col. 1, line 16-27]. 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong, Claret and Srimohanarajah with Bronder to include virtual reality (VR) devices, which use a graphics processing unit (GPU) to render graphics from a computing device to a display device. 
Therefore, the combination of Hong, Claret and Srimohanarajah with Bronder will enable the system to adjust the scene based on the position or the orientation of the HMD [Bronder: col. 1, line 16-27].

Regarding claim 16, Hong meets the claim limitations as set forth in claim 15. Hong further meets the claim limitations as follows.
The system of claim 15 (i.e. Systems and methods) [Hong: col. 2, line 12] wherein identifying the missing region of the scene ((i.e. identify a region or regions within the image data) [Hong: col. 18, line 4-5]; (i.e. identifies (508) the subjects and their locations within the imaged scene) [Hong: col. 17, line 58-59]; (i.e. pixels in the images captured by the top view camera for which depth information is not available due to the occlusion of that pixel location in the field of view of the depth camera) [Hong: col. 16, line 16-19]) is based on at least one of a position and an orientation (i.e. position and pose information is passed through a set of feature extractors to obtain a low-dimensional representation from which machine learning algorithms can be used to train classifiers to detect specific behaviors. In other embodiments, the raw position and pose information can be passed directly to the classifier) [Hong: col. 13, line 8-13] of the display.
Hong, Claret and Srimohanarajah do not explicitly disclose the following claim limitations (Emphasis added).
The system of claim 15 wherein identifying the missing region of the scene is based on at least one of a position and an orientation of the display.    
However, in the same field of endeavor Bronder further discloses the claim limitations and the deficient claim limitations, as follows:
wherein identifying the missing region of the scene is based on at least one of a position and an orientation of the head-mounted display  ((i.e. given one or more parameters regarding a position of a head-mounted display (e.g., in VR), possible alterations in a rendering parameter by the time the display is updated can be predicted. For example, the rendering parameter may correspond to whether to render a portion (e.g., pixel) of an image. In this example, some portions of the image that are outside of the target shape of the VR device can be predicted as more probable to be within a displayed image than other portions. The probability field can be generated (or modified) based on predicted change in scene orientation (e.g., based on predicted change in position or orientation of the head-mounted display).) [Bronder: col. 4, line 7-18]; (i.e. an image to be produced on a display device can be oriented or modified based on user input (e.g., movement of a gamepad button or stick to cause movement of the orientation of the scene, introduction of items into the scene, etc.). Similarly, in VR devices, the image to be produced on a display device can be oriented or modified based on user input, where the input may include detecting movement of the user's head (e.g., detected movement of the VR device mounted on the user's head).  In any case, the device may detect a desired change of orientation at any given time (e.g., by detecting change in head position)) [Bronder: col. 1, line 16-27]. 
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hong, Claret and Srimohanarajah with Bronder to include virtual reality (VR) devices, which use a graphics processing unit (GPU) to render graphics from a computing device to a display device. 
Therefore, the combination of Hong, Claret and Srimohanarajah with Bronder will enable the system to adjust the scene based on the position or the orientation of the HMD [Bronder: col. 1, line 16-27].
                                                                               
Reference Notice 
The additional prior art made of record in the Notice of References Cited and not relied upon is considered pertinent to applicant's disclosure.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang whose telephone number is (408) 918-7529.  The examiner can normally be reached on Monday-Thursday between 8:30 am - 5:00 pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath Perungavoor, can be reached at 571-272-7455.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Philip P. Dang/
Primary Examiner, Art Unit 2488