DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Responsive to the communication dated 06/15/2022.
Claims 1, 7, 8, 9, 10, 11, 17 are amended.
Claims 1-20 are presented for examination.

Final Action
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Response to Arguments
Claim Rejections - 35 USC § 112
The Applicant has amended the claims in order to overcome the rejections under 35 USC 112. The rejections are withdrawn.


Claim Rejections – 35 USC § 103

1. The Applicant states that the independent claims have been amended to recite “modeling a surface characteristic of the object” and “using the modeled surface characteristic in the generation of synthetic data” and that “this is a feature not taught or suggested by the cited art of record.”

In response the argument is not persuasive.

The independent claims have been amended to recite: “... modeling environmental illumination of an environment containing the pattern based structured light sensor; modeling a surface characteristic of the object; and generating... based on... the environmental illumination model and object surface characteristic model.”

While the Applicant asserts that the art of record does not make these limitations obvious, the Examiner disagrees. Schlette_2014 illustrates this in Figure 3 on page 748. In Figure 3 the input data clearly includes “light position”, “lighting model”, and “lighting”, which are combined with the Scene Geometry data in the “light shader”. On page 748 it states: “... the input data consists of the geometric description of the scene and lighting conditions (direction, color and lighting model) which are combined in an appropriate lighting shader to ensure real-time visualization...”. The Examiner notes that this “real-time visualization” is provided to the “shader stack” and that the shader stack visualization is illustrated in Figure 4. Figure 4 includes shadows resulting from the environmental illumination of the environment. This makes it obvious that the lighting model, position, and lighting include environmental illumination.

Also, the input data includes scene geometry data, and Fig. 4 clearly shows that the scene includes the geometric shape of the surfaces of various objects. It is also noted that the surface shapes of the objects in Fig. 4 have shadows. Therefore the environment illumination model includes modeling a surface characteristic of the object, because a surface characteristic includes opacity and opaque objects create shadows. Additionally, Schlette_2014 teaches to include texture of objects. Section 3.2: “... scene with an added simulated texture...”. Fig. 9 illustrates objects with texture. Figure 2: “texture”, and page 746 teaches “texture an object to improve the stereo processing, since object structure improves correspondence findings.” Therefore, it would have been obvious to perform “modeling a surface characteristic of the object” where the surface characteristic may be opacity and/or texture.

Finally, the Examiner also notes that Schlette_2014 clearly teaches that the modeled surface characteristics of the objects and the lighting model of the environment are used to generate the synthetic depth data, because in Fig. 3 these models precede and flow into the generation of the depth data. Therefore, the depth data output uses (i.e., is based upon) lighting models and geometric models (including texture/opacity) as input.

Therefore, the Examiner finds that the argument is not persuasive. 

2. The Applicant argues that the dependent claims are allowable due to their dependency from an amended independent claim.

In response the argument is not persuasive due to the reason outlined above for item 1. 

End Response to Arguments


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


(1)  Claims 1, 7, 8 are rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 (Understanding Real World Indoor Scenes With Synthetic Data, June 2016) in view of Schlette_2014 (A New Benchmark for Pose Estimation with Ground Truth from Virtual Reality, Prod. Eng. Res. Devel. (2014)) in view of Geng_2011 (Structured-light 3D Surface Imaging: A Tutorial, IEEE Intelligent Transport System Society, 2011) in view of Houlton_2011 (2011/0221752).


Claim 1. Handa_2016 makes obvious “A method for synthetic depth data generation, the method comprising: receiving (1401), three-dimensional computer-aided design (CAD) data of an object; [virtualizing a camera]; and generating (1405) synthetic depth data using the [virtual camera], the synthetic depth data based on three-dimensional CAD data” (abstract: “… in the work… [we] show the potential of computer graphics to generate… labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models…”; page 2 par 1: “… synthetic data is already used for many computer vision problems in the context of robotics. We believe the role of synthetic data… will continue to grow in providing training data with further advances in machine learning… our main contribution is this work is to show the potential of synthesized ground truth depth data generated from annotated 3D scenes…”; page 2 section 3 par 1 – 2: “… large scale repositories of CAD models… containing a significant number of manually labelled 3D models… generate synthetic data from random poses… SceneNet contains 3D models… all the 3D models are metrically accurate… We use OpenGL to place virtual cameras in the synthetic scenes to generate ground truth data…”; page 3: “… we generate new physically realistic scenes from object models downloaded from various online object repositories…” Table 1: “Archive3D” NOTE: the above quotations teach to download (i.e., receiving) 3D CAD models from large repositories of synthetic 3D scenes and then to use OpenGL to place a virtual camera into the scene and to generate synthetic data from the camera's point of view, which is then used as training data for machine learning systems in computer vision for autonomously navigating robots equipped with cameras (see introduction par 1).)

While Handa_2016 teaches to generate synthetic depth data, Handa_2016 does not teach that this is done in “real-time.”

While Handa_2016 teaches downloading 3D CAD models from a 3D model repository and while this makes obvious receiving the downloaded 3D CAD models, Handa_2016 does not explicitly teach the 3D model is received at an interface. Therefore Handa_2016 does not explicitly teach “at an interface.”

While Handa_2016 teaches a virtual camera, and while one of ordinary skill in the art might properly infer that a virtual camera is one which is modeled, Handa_2016 does not teach “a multi-shot pattern based structured light sensor” as the camera. Therefore Handa_2016 does not explicitly teach “modeling (1403) a multi-shot pattern based structured light sensor” nor “multi-shot pattern based structured light sensor model” nor “... modeling environmental illumination of an environment containing the pattern based structured light sensor; modeling a surface characteristic of the object” nor “the environmental illumination model and object surface characteristic model.”


Schlette_2014, however, makes obvious “real-time” (page 748 par 1: “… to achieve real-time simulation… GPUs for hardware accelerated real-time rendering… the camera simulation the provides a real-time simulation… it allows for simulating various optical and electronic effects in real-time… to ensure real-time visualization…”) and a virtual “pattern based structured light sensor” (page 745: “… the benchmark platform is equipped with a multi-sensor setup consisting of stereo camera and depth scanning devices… following the eRobotics methodology, a simulated 3D representation of the platform was modelled in virtual reality. Based on a detailed camera and sensor simulation, we generated a set of benchmark images and point cloud with controlled levels of noise as well as ground truth data such as object positions…”; page 746: “… sensors for deriving depth information. Since depth information is crucial for evaluating 3D object poses as well as spatial relations, sensors based on scanning mechanisms to directly generate depth information are widely used…” page 746 section 1.1 par 1: “the benchmark platform is equipped with… four RGB stereo cameras, three “Microsoft Kinect” RGB-D devices and two projectors for shedding structured light on the scene…”; page 746 – 747 section 1.2: “Our benchmark provides RGB and RGB-D data sets which have been generated using simulatable 3D representations of this benchmark platform in a VR system… Following the eRobotics methodology… in particular the detailed camera and sensor simulation…”; page 747 – 748 section 2.3: “Camera Simulation The VR system features a camera and sensor simulation… the camera simulation the provides a real-time simulation of various optical and electronic effects… the input data consists of the geometric description of the scene and lighting conditions…”) and “... modeling environmental illumination of an environment containing the pattern based structured light sensor; modeling a surface characteristic of the object” and generating based on “the environmental illumination model and object surface characteristic model” (Figure 3 on page 748. In Figure 3 the input data clearly includes “light position”, “lighting model”, and “lighting”, which are combined with the Scene Geometry data in the “light shader”. On page 748 it states: “... the input data consists of the geometric description of the scene and lighting conditions (direction, color and lighting model) which are combined in an appropriate lighting shader to ensure real-time visualization...”. The Examiner notes that this “real-time visualization” is provided to the “shader stack” and that the shader stack visualization is illustrated in Figure 4. Figure 4 includes shadows resulting from the environmental illumination of the environment. This makes it obvious that the lighting model, position, and lighting include environmental illumination.

Also, the input data includes scene geometry data, and Fig. 4 clearly shows that the scene includes the geometric shape of the surfaces of various objects. It is also noted that the surface shapes of the objects in Fig. 4 have shadows. Therefore the environment illumination model includes modeling a surface characteristic of the object, because a surface characteristic includes opacity and opaque objects create shadows. Additionally, Schlette_2014 teaches to include texture of objects. Section 3.2: “... scene with an added simulated texture...”. Fig. 9 illustrates objects with texture. Figure 2: “texture”, and page 746 teaches “texture an object to improve the stereo processing, since object structure improves correspondence findings.” Therefore, it would have been obvious to perform “modeling a surface characteristic of the object” where the surface characteristic may be opacity and/or texture.

Finally, the Examiner also notes that Schlette_2014 clearly teaches that the modeled surface characteristics of the objects and the lighting model of the environment are used to generate the synthetic depth data, because in Fig. 3 these models precede and flow into the generation of the depth data. Therefore, the depth data output uses (i.e., is based upon) lighting models and geometric models (including texture/opacity) as input.)

Handa_2016 and Schlette_2014 are analogous art because they are from the same field of endeavor called sensors. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Handa_2016 and Schlette_2014. The rationale for doing so would have been that Handa_2016 teaches to create synthetic ground truth data for depth sensors for use in an autonomous robot during navigation. Schlette_2014 teaches using a GPU to calculate synthetic ground truth data according to the eRobotics methodology in real-time because this is “important for a realistic visualization of a scenario” (page 748). Therefore it would have been obvious to combine Handa_2016 and Schlette_2014 for the benefit of getting data for training the robot and allowing it to operate in real-time to obtain the invention as specified in the claims.

Handa_2016 and Schlette_2014 do not explicitly teach “at an interface” nor a “multi-shot” sensor.

Geng_2011, however, teaches “multi-shot” structured light sensors (Figure 3).
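
For illustration of what a “multi-shot” sequence means in practice, the following is a minimal sketch assuming binary Gray-code stripe patterns as the pattern family. That choice, and the hypothetical names gray_code_patterns and decode_column, are assumptions made for this example only and are not taken from the record. Each exposure projects one bit plane, and a pixel's projector column is recovered only after all shots have been observed.

```python
def gray_code_patterns(width: int, num_shots: int) -> list[list[int]]:
    """Return num_shots binary stripe patterns, one per exposure.

    Pattern k holds bit k (most significant first) of the Gray code of each
    projector column. Assumes width <= 2 ** num_shots.
    """
    patterns = []
    for k in range(num_shots):
        shift = num_shots - 1 - k
        patterns.append([((x ^ (x >> 1)) >> shift) & 1 for x in range(width)])
    return patterns


def decode_column(bits: list[int]) -> int:
    """Recover the projector column from the bits seen across all shots."""
    gray = 0
    for b in bits:
        gray = (gray << 1) | b
    # Gray-to-binary conversion: XOR together all right shifts of the code.
    column = gray
    shift = 1
    while (gray >> shift) > 0:
        column ^= gray >> shift
        shift += 1
    return column


if __name__ == "__main__":
    shots = gray_code_patterns(width=8, num_shots=3)
    for x in range(8):
        observed = [pattern[x] for pattern in shots]
        print(x, observed, decode_column(observed))
```

Because the decoding needs every exposure, anything that changes between shots (for example scene motion) can corrupt the recovered column, which is why the number and timing of exposures matter when such a sensor is simulated.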

Handa_2016 and Schlette_2014 and Geng_2011 are analogous art because they are from the same field of endeavor called sensors. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Geng_2011.
The rationale for doing so would have been that Schlette_2014 teaches to simulate in virtual reality a structured light sensor. Geng_2011 teaches that a structured light sensor can be a multi-shot structured light sensor. It would have been obvious to substitute the multi-shot structured light sensor taught by Geng_2011 for the structured light sensor taught by Schlette_2014 to obtain predictable results. The findings in support of the conclusion are:

The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some component (step, element, etc.) with other components, as taught by Schlette_2014.
The substituted components and their functions were known in the art, as taught by Geng_2011.
One of ordinary skill in the art could have substituted one known element for another and the result would have been predictable.

Handa_2016 and Schlette_2014 and Geng_2011 do not explicitly teach “at an interface.”

Houlton_2011 teaches “at an interface” (FIG. 2; par 15: “… controller 22 could also communicate with the PCH 20 to provide support for user interface devices such as a display, keypad, mouse, etc.”; par 17: “… the GPU 26 and graphics memory 28 might be installed on a graphics/video card, wherein the GPU 26 could communicate with the CPU 12 via a graphics bus such as… Accelerated Graphics Port (e.g., AGP V3.0 Interface…”)

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 are analogous art because they are from the same field of endeavor called computer graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Houlton_2011.
The rationale for doing so would have been that Schlette_2014 teaches to send data to a GPU and Houlton_2011 teaches to send data to a GPU using a port which they also call an “interface.”
Therefore it would have been obvious to combine Schlette_2014 and Houlton_2011 for the benefit of having an interface to communicate with the GPU to obtain the invention as specified in the claims.

Claim 7. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Geng_2011 makes obvious “wherein modeling the multi-shot pattern based structured light sensor comprises modeling the pattern modeling” (Figures 3, 4). The Examiner notes: Schlette_2014 teaches to simulate/model structured light sensors/cameras and that the simulation/model provides a real-time simulation of various optical and electronic effects, and Geng_2011 teaches that a multi-shot structured light sensor/camera functions with a sequence of pattern projections. Therefore it would have been obvious that modeling the acquisition of multi-shot structured light sensor data comprises modeling the influence of a number of pattern exposures.

Claim 8. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 7. Schlette_2014 makes obvious “wherein modeling the pattern modeling comprises modeling the effect of light sources” (Fig. 3 “light position” “lighting model” “lighting” “lighting shader”).


(2)  Claims 2, 3, 5, 6 are rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Geng_2011 in view of Houlton_2011 in view of Andreas_2016 (SyB3R: A Realistic Synthetic Benchmark for 3D Reconstruction from Images, 2016).

Claim 2. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 do not explicitly teach “wherein modeling (1403) the multi-shot pattern based structured light sensor comprises modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data.”

Andreas_2016 makes obvious “wherein modeling (1403) the multi-shot pattern based structured light sensor comprises modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data” (page 7: “camera rotation motion blur” “automatic exposure control” page 13: “… model… object motion blur, as well as depth of field and implement camera motion blur, radial distortion, chromatic aberrations, auto exposure, camera sensor noise, nonlinear tone mapping, and JPG compression … highly dependent camera effects such as camera motion blur, sensor noise, and tone mapping are modeled…”).

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Andreas_2016 are analogous art because they are from the same field of endeavor called graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image.
Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.

Claim 3. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 do not explicitly teach “wherein modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling the influence of exposure time.”

Andreas_2016 makes obvious “wherein modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling the influence of exposure time.” (Page 7: “… short exposure times, however, camera motion blur stems from small rotations of the camera… allows experimentation with different exposure times…”).
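
The relationship between exposure time and motion blur referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited reference: motion is taken to be uniform and horizontal, the blur is approximated by a one-dimensional box average, and the names blur_extent_px and box_blur_1d are hypothetical.

```python
def blur_extent_px(speed_px_per_s: float, exposure_s: float) -> float:
    """Distance in pixels that the image content moves during one exposure."""
    return abs(speed_px_per_s) * exposure_s


def box_blur_1d(signal: list[float], extent_px: float) -> list[float]:
    """Average each sample with the samples swept past it during the exposure.

    The window length is the blur extent rounded to whole pixels, with a
    minimum of one pixel (no blur).
    """
    n = max(1, round(extent_px))
    blurred = []
    for i in range(len(signal)):
        window = signal[max(0, i - n + 1): i + 1]
        blurred.append(sum(window) / len(window))
    return blurred


if __name__ == "__main__":
    stripe = [0.0, 0.0, 1.0, 0.0, 0.0]
    for exposure in (0.01, 0.02):
        extent = blur_extent_px(speed_px_per_s=100.0, exposure_s=exposure)
        print(exposure, extent, box_blur_1d(stripe, extent))
```

Under these assumptions a longer exposure time widens the averaging window, so a projected stripe edge is smeared over more pixels.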

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Andreas_2016 are analogous art because they are from the same field of endeavor called graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image.
Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.

Claim 5. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 do not explicitly teach “wherein the modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling motion blur.”

Andreas_2016 makes obvious “wherein the modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling motion blur” (page 7: “camera rotation motion blur” “automatic exposure control” page 13: “… model… object motion blur, as well as depth of field and implement camera motion blur, radial distortion, chromatic aberrations, auto exposure, camera sensor noise, nonlinear tone mapping, and JPG compression … highly dependent camera effects such as camera motion blur, sensor noise, and tone mapping are modeled…”).

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Andreas_2016 are analogous art because they are from the same field of endeavor called graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image.
Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.

Claim 6. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Geng_2011 makes obvious “wherein modeling the effect (Figures 3, 4). The Examiner notes: Schlette_2014 teaches to simulate/model structured light sensors/cameras and that the simulation/model provides a real-time simulation of various optical and electronic effects, and Geng_2011 teaches that a multi-shot structured light sensor/camera functions with a sequence of pattern projections. Therefore it would have been obvious that modeling the acquisition of multi-shot structured light sensor data comprises modeling the influence of a number of pattern exposures.

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011, however, do not explicitly teach modeling the effect “of motion.”

Andreas_2016 makes obvious modeling the effect of motion (page 7: “camera rotation motion blur” “automatic exposure control” page 13: “… model… object motion blur, as well as depth of field and implement camera motion blur, radial distortion, chromatic aberrations, auto exposure, camera sensor noise, nonlinear tone mapping, and JPG compression … highly dependent camera effects such as camera motion blur, sensor noise, and tone mapping are modeled…”).

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Andreas_2016 are analogous art because they are from the same field of endeavor called graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image.
Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.





(3)  Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Geng_2011 in view of Houlton_2011 in view of Wissman_2011 (Fast and low-cost structured light pattern sequence projection, 2011 Optical Society of America).

Claim 4. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 1. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 do not explicitly teach “wherein modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling an interval between exposures.”

Wissman_2011 makes obvious “wherein modeling the effect of motion between exposures on acquisition of multi-shot structured light sensor data comprises modeling an interval between exposures” (page 3: “pattern switching is achieved by synchronized timing of camera exposure intervals, integrating light modulated by a pattern segment during rotation at a constant speed”; page 3: “… it is evident that in the spatially ordered distribution of translucent and opaque regions, camera exposure intervals are constrained to the full angle of each pattern segment…”; page 12: “… light source and synchronize camera exposure intervals…”).

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Wissman_2011 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Wissman_2011. The rationale for doing so would have been that Schlette_2014 teaches to model and simulate a camera sensor and to “allow for high performance simulation of different optical and electronic effect” and Wissman_2011 teaches to constrain the exposure intervals such that they are synchronized with changing the structured light pattern. Therefore it would have been obvious to combine Schlette_2014 and Wissman_2011 for the benefit of ensuring that the simulated camera properly simulates the exposure such that it occurs at the proper timing to obtain the invention as specified in the claims.

(4)  Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Geng_2011 in view of Houlton_2011 in view of Medeiros_2014 (Using Physically Based Rendering to Benchmark Structured Light Scanners, Pacific Graphics 2014 Volume 33 (2014), Number 7).

Claim 9. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 8. Schlette_2014 makes obvious “wherein modeling the effect of light source comprises modeling the effect of

Schlette_2014 does not explicitly teach “ambient” light.

Medeiros_2014 teaches “ambient” light (page 16: “… ambient light…”).

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Medeiros_2014 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Medeiros_2014. The rationale for doing so would have been that Schlette_2014 teaches to model lighting and Medeiros_2014 teaches that lighting includes ambient light. Therefore it would have been obvious to combine Schlette_2014 and Medeiros_2014 for the benefit of modeling light sources to improve the realism of the rendering to obtain the invention as specified in the claims.

(5)  Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Geng_2011 in view of Houlton_2011 in view of Ringaby_2012 (Geometric Computer Vision for Rolling-Shutter and Push-Broom sensors, Linkoping University, 2012).

Claim 10. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 make obvious all the limitations of claim 7. Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 do not teach “wherein modeling the pattern modeling comprises modeling the effect of a rolling shutter or a global shutter.”

Ringaby_2012 makes obvious “wherein modeling the pattern modeling comprises modeling the effect of a rolling shutter or a global shutter” (abstract: “… almost all CMOS sensors make use of what is called a rolling shutter. Compared to a global shutter, which image all the pixels at the same time, a rolling shutter camera exposes the image row-by-row. This leads to geometric distortions in the image when either the camera or the object is moving…”).
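
The row-by-row exposure described in the quoted abstract can be illustrated with a minimal timing sketch. The constant per-row delay, the uniform horizontal motion of a vertical edge, and the names row_capture_times and observed_edge_columns are assumptions made for this example and do not come from Ringaby_2012.

```python
def row_capture_times(num_rows: int, line_delay_s: float, rolling: bool) -> list[float]:
    """Start-of-exposure time of each sensor row, relative to frame start.

    A global shutter exposes every row at the same instant; a rolling shutter
    starts each row one line delay after the previous one.
    """
    if rolling:
        return [r * line_delay_s for r in range(num_rows)]
    return [0.0] * num_rows


def observed_edge_columns(edge_col_at_t0: float, speed_px_per_s: float,
                          row_times: list[float]) -> list[float]:
    """Column at which a horizontally moving vertical edge is seen in each row."""
    return [edge_col_at_t0 + speed_px_per_s * t for t in row_times]


if __name__ == "__main__":
    for rolling in (False, True):
        times = row_capture_times(num_rows=4, line_delay_s=0.001, rolling=rolling)
        print("rolling" if rolling else "global",
              observed_edge_columns(10.0, 1000.0, times))
```

With the global shutter every row sees the edge at the same column, while with the rolling shutter the edge appears progressively shifted from row to row, which is the geometric distortion the abstract attributes to camera or object motion.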

Handa_2016 and Schlette_2014 and Geng_2011 and Houlton_2011 and Ringaby_2012 are analogous art because they are from the same field of endeavor called graphics or cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Ringaby_2012. The rationale for doing so would have been that Schlette_2014 teaches to model and simulate a camera sensor and to “allow for high performance simulation of different optical and electronic effect” and Ringaby_2012 teaches that rolling shutter is an electronic effect resulting from the way sensors operate. Therefore it would have been obvious to combine Schlette_2014 and Ringaby_2012 for the benefit of having a more realistic camera simulation to obtain the invention as specified in the claims.


(6)  Claims 11, 12, 13, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Houlton_2011.

Claim 11. Handa_2016 teaches “A system (page 5: “… we also compare our results to the state-of-the-art system of Eigen and Fergus… Since we use only depth, our system is not directly comparable… but we obtain competitive performance…” NOTE: this quotation refers to the authors' “system”) for synthetic depth data generation, the system comprising: [repository/database] configured to store a three-dimensional simulation of an object; and receive depth data of the object captured by a sensor of a mobile device; generate a [simulation] of the sensor of the mobile device; generate synthetic depth data based on the stored three-dimensional simulation of an object and the [simulation] of the sensor of the mobile device; (abstract: “… in the work… [we] show the potential of computer graphics to generate… labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models…”; page 2 par 1: “… synthetic data is already used for many computer vision problems in the context of robotics. We believe the role of synthetic data… will continue to grow in providing training data with further advances in machine learning… our main contribution is this work is to show the potential of synthesized ground truth depth data generated from annotated 3D scenes…”; page 2 section 3 par 1 – 2: “… large scale repositories of CAD models… containing a significant number of manually labelled 3D models… generate synthetic data from random poses… SceneNet contains 3D models… all the 3D models are metrically accurate… We use OpenGL to place virtual cameras in the synthetic scenes to generate ground truth data…”; page 3: “… we generate new physically realistic scenes from object models downloaded from various online object repositories…” Table 1: “Archive3D” NOTE: the above quotations teach to download (i.e., receiving) 3D CAD models from large repositories of synthetic 3D scenes and then to use OpenGL to place a virtual camera into the scene and to generate synthetic data from the camera's point of view, which is then used as training data for machine learning systems in computer vision for autonomously navigating robots equipped with cameras (see introduction par 1).) train an algorithm based on the generated synthetic depth data (abstract: “… synthesize training data…”; page 1 introduction par 3: “… training data… in this work, we focus on the challenge of obtaining the desired training data for scene understanding…”; page 2 par 1: “… synthetic data.. in providing training data…”; page 2 section 3: “synthesizing training data… obtaining the desired training data…”);

Handa_2016 does not teach “a memory” nor “a processor (1510)” nor “model” nor “and estimate, using the trained algorithm, a pose of the object based on the received depth data of the object.”

Schlette_2014 teaches “model” (page 745: “… following the eRobotics methodology, a simulated 3D representation of this platform was modelled in virtual reality. Based on a detailed camera and sensor simulation…”) and “and estimate, using the trained algorithm, a pose of the object based on the received depth data of the object” (page 746: “in general, the platform allows for complex manipulation of objects as well as pose estimation with high precision”).

Handa_2016 and Schlette_2014 are analogous art because they are from the same field of endeavor called sensors. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Handa_2016 and Schlette_2014. The rationale for doing so would have been that Handa_2016 teaches to create synthetic ground truth data for depth sensors for use in an autonomous robot during navigation. Schlette_2014 teaches to use a GPU to calculate synthetic ground truth data according to the eRobotics methodology in real-time because this is “important for a realistic visualization of a scenario” (page 748). Therefore it would have been obvious to combine Handa_2016 and Schlette_2014 for the benefit of getting data for training the robot and allowing it to operate in real-time to obtain the invention as specified in the claims.

Handa_2016 and Schlette_2014 do not explicitly teach “a memory” nor “a processor (1510).”

Houlton_2011 teaches “a memory” and “a processor (1510)” (FIG. 2).

Handa_2016 and Schlette_2014 and Houlton_2011 are analogous art because they are from the same field of endeavor called computer graphics. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Houlton_2011.
The rationale for doing so would have been that Schlette_2014 teaches to send data to a GPU and Houlton_2011 teaches to send data to a GPU using a port which they also call an “interface.”
Therefore it would have been obvious to combine Schlette_2014 and Houlton_2011 for the benefit of having an interface to communicate with the GPU to obtain the invention as specified in the claims.


Claim 12. Handa_2016 and Schlette_2014 and Houlton_2011 teach all the limitations of claim 11. Handa_2016 makes obvious “wherein the processor (1504) is further configured to receive data indicative of the sensor of the mobile device” (page 1 introduction par 1: “autonomous navigating robots equipped with cameras”).

Claim 13. Handa_2016 and Schlette_2014 and Houlton_2011 teach all the limitations of claim 11. Handa_2016 makes obvious “wherein the generated synthetic data comprises labeled ground-truth poses” (abstract: “potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes…”; page 2 section 2: “… labelled training data with perfect ground truth for per-pixel labelling…”; Figure 2).

Claim 14. Handa_2016 and Schlette_2014 and Houlton_2011 teach all the limitations of claim 11. Schlette_2014 makes obvious “wherein generating the model of the sensor of the mobile device comprises: modeling the projector of the sensor; and modeling a perspective camera of the sensor” (page 746 section 1.1 par 1: “the benchmark platform is equipped with… three Microsoft Kinect RGB-D devices and two projectors for shedding structured light on the scene…”; page 745: “…following the eRobotics methodology, a simulated 3D representation of this platform was modelled in virtual reality…”).

(7)  Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Houlton_2011 in view of Andreas_2016.

Claim 15. Handa_2016 and Schlette_2014 and Houlton_2011 teach all the limitations of claim 11. Schlette_2014 makes obvious “wherein generating the synthetic depth data comprises:
rendering synthetic pattern images based on the model of the sensor; applying pre-processing effects to the synthetic pattern images (page 747 – 748 section 2.3: “… input data consists of the geometric description of the scene and lighting conditions (direction, color, and lighting model) which are combined in an appropriate lighting shader… lens distortion has to be added before various noise effect are rendered. Therefore a shaderstack combines the different shaders in the right order and processes the rendered image…”; Figure 3; NOTE: the shader stack may be considered the processing and therefore anything that happens prior is considered pre-processing.) and constructing point cloud data from the processed synthetic pattern images (page 752 section 4: “…the data was recorded and played back in simulation, where images and point clouds were generated from the simulated data with selected levels of noise…”).

Handa_2016 and Schlette_2014 and Houlton_2011 do not explicitly teach “applying post-processing effects to the synthetic pattern images.”

Andreas_2016 makes obvious “applying post-processing effects to the synthetic pattern images” (page 1: “… many real world effects of image acquisition (such as motion blur and noise) are simulated during image rendering or post-processing…”; page 5: “… and a post-processing that implements remaining effects in image space…”; page 6 – 9 section 2.2 image post-processing).

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 in view of Andreas_2016 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image. Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.


(8)  Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Handa_2016 in view of Schlette_2014 in view of Houlton_2011 in view of Andreas_2016 in view of Ringaby_2012 in view of Medeiros_2014 in view of Landau_2016 (Simulating Kinect Infrared and Depth Images, IEEE Transactions on Cybernetics, Vol. 46, No. 12, December 2016).

Claim 16. Handa_2016 in view of Schlette_2014 in view of Houlton_2011 teach all the limitations of claim 11. Schlette_2014 also makes obvious “wherein: applying pre-processed effects comprises lens distortion, and noise” (page 748 fig 3: “distortion”; page 748 section 2.3: “… the sequential arrangement of different optical effects is not interchangeable and needs to be computed in the right order, e.g., lens distortion has to be added before various noise effects…”).

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 do not explicitly teach “shutter effects” nor “lens scratch and grain” nor “and wherein applying post processing comprises smoothing trimming and hole filling.”

Ringaby_2012 makes obvious “shutter effects” (page iii abstract: “rolling shutter”).

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 are analogous art because they are from the same field of endeavor called graphics or cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Ringaby_2012. The rationale for doing so would have been that Schlette_2014 teaches to model and simulate a camera sensor and to “allow for high performance simulation of different optical and electronic effect” and Ringaby_2012 teaches that rolling shutter is an electronic effect resulting from the way sensors operate. Therefore it would have been obvious to combine Schlette_2014 and Ringaby_2012 for the benefit of having a more realistic camera simulation to obtain the invention as specified in the claims.

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 do not explicitly teach “lens scratch and grain” nor “and wherein applying post processing comprises smoothing, and hole filling.”

Landau_2016 makes obvious “and wherein applying post processing comprises smoothing, and hole filling” (page 3025: “… our model utilizes post filtered IR images…”; page 3030: “… estimation step can be improved by incorporating the suggested local smoothing/region growing algorithm which would account for the observed smaller depth errors…”; page 3020: “… Kinect uses 2X2 binning to downsample the IR image to 640X512 pixels, where a transmitted dot then fills a single pixel on an ideal surface… for simplicity, the shape and size of each dot fills a single cell (pixel) of the constructed grid…”).

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 and Landau_2016 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Landau_2016. The rationale for doing so would have been that Schlette_2014 teaches to simulate a Kinect camera and Landau_2016 also teaches to simulate a Kinect camera. Therefore it would have been obvious to combine Schlette_2014 and Landau_2016 for the benefit of simulating more features to have a more realistic model/simulation to obtain the invention as specified in the claims.

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 and Landau_2016 do not explicitly teach “lens scratch and grain” nor “trimming.”

Medeiros_2014 makes obvious “trimming” (page 6: “… low pass filter of the original pattern with a kernel whose support depends on the depth of the pixel…”; Figure 7: “… depth of the scene as seen by the projector. The focal plane is placed halfway between the maximum and minimum depth; (middle) final scene as seen by the camera; (right) details of the rectangle highlighted in the middle image…” NOTE: the filter trims depending on the depth of the pixels. In Figure 7 the depth is halfway between the maximum and minimum depth and the image is trimmed accordingly.)

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 and Landau_2016 and Medeiros_2014 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Medeiros_2014. The rationale for doing so would have been that Schlette_2014 teaches to simulate structured light scanners/cameras to obtain generated scans and ground truth and Medeiros_2014 teaches to benchmark structured light cameras with synthetic scans and ground truth and simulate illumination defocus by trimming the depth range. Therefore it would have been obvious to combine Schlette_2014 and Medeiros_2014 for the benefit of simulating more features to have a more realistic simulation capable of simulating defocus to obtain the invention as specified in the claims.

Handa_2016 in view of Schlette_2014 in view of Houlton_2011 and Ringaby_2012 and Landau_2016 and Medeiros_2014 do not explicitly teach “lens scratch and grain”; however, this would have been obvious to one of ordinary skill in the art because Schlette_2014 teaches to simulate distortions caused by the lens, and a scratch on the lens and the grain of the lens cause distortions in the image.

(9)  Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Schlette_2014 in view of Handa_2016.

Claim 17. Schlette_2014 teaches “A method for data generation, the method comprising:
simulating (101) a sensor for capturing depth data of a target object;
simulating (103) environmental illumination for capturing depth data of the target object;
simulating (105) analytical processing of captured depth data of the target object; and
generating (107) synthetic depth data of the target object based on the simulated sensor, environmental illuminations and analytical processing” (Fig. 3; page 745: “… platform is equipped with a multi-sensor setup consisting of stereo cameras and depth scanning devices… Following the eRobotics methodology, a simulatable 3D representation of this platform was modelled in virtual reality. Based on detailed camera and sensor simulation, we generated a set of benchmark images and point clouds with controlled levels of noise as well as ground truth data such as object positions…”; page 746: “… sensors… for deriving depth information, Since depth information is crucial for evaluating 3D object poses as well as spatial relations, sensors based on scanning mechanisms to directly generate depth information are widely used… associating the measurements with (quantized) reference depths then results in clouds of points in sensor coordinates… the benchmark platform is equipped with… three Microsoft Kinect RGB-D devices and two projectors for shedding structured light on the scene… using simulatable 3D representations of the benchmark platform in a VR system (see fig 1) following the eRobotics methodology… the detailed camera and sensor simulation…”; page 747 – 748 section 2.3: “the VR system features a camera and sensor simulation… GPUs for hardware accelerated real-time rendering… simulation the provides a real-time simulation of various optical and electronic effects… allows extensions of further effects as they are needed… the input data consists of the geometric descriptions of the scene and lighting conditions (direction, color, and lighting model) which are combined to ensure real-time visualization… a realistic visualization of a scenario… depth of field is supported by our simulation…”; page 749 section 3, 3.1: “pose estimation”; Fig. 8; NOTE: Schlette_2014 teaches to follow the eRobotics methodology, which means to simulate the robot and a set of simulated components that each perform their tasks realistically. Schlette_2014 teaches that the depth data is crucial for pose estimation, and section 3 teaches to simulate pose estimation using the rendered images generated from the simulated cameras and sensors. Therefore the pose estimation processes the rendered images and performs an analysis to obtain the estimated pose, and therefore the pose estimation is an analytical processing.)

While Schlette_2014 teaches to generate data from a simulation and while this may properly be found to imply to one of ordinary skill in the art that the generated data is “synthetic” data, Schlette_2014 does not explicitly recite “synthetic.” Therefore it is found that Schlette_2014 does not explicitly teach “synthetic” data.

Handa_2016, however, explicitly teaches “synthetic” data and depth data (title: “understanding real world indoor scenes with synthetic data”; abstract: “synthetic”; page 2: “synthetic data is already used for many computer vision problems… synthetic ground truth data…”; page 2 – 3 section 3).

Schlette_2014 and Handa_2016 are analogous art because they are from the same field of endeavor called finite cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Handa_2016. The rationale for doing so would have been that Schlette_2014 teaches to generate depth data using computer calculations and models. Handa_2016 teaches that generated data is synthetic data. Therefore it would have been obvious to combine Schlette_2014 and Handa_2016 for the benefit of understanding that generated data is synthetic to obtain the invention as specified in the claims.

(10)  Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Schlette_2014 in view of Handa_2016 in view of Andreas_2016 in view of Ringaby_2012.

Claim 18. Schlette_2014 and Handa_2016 teach all the limitations of claim 17. Schlette_2014 makes obvious “lens distortion” (page 748: “lens distortion”; Figure 3: “distortion”). Schlette_2014 and Handa_2016 do not explicitly teach “wherein simulating (101) the sensor comprises simulating quantization effects, noise, motion, and shutter effects.”

Andreas_2016, however, makes obvious “wherein simulating (101) the sensor comprises simulating quantization effects (page 4 section 2: “digitization and quantization”), noise, motion” (page 7 section 2.2 Fig. 4: “motion blur”, “radial distortion”, “sensor noise”; page 13 section 4 par 2).

Schlette_2014 and Handa_2016 and Andreas_2016 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Andreas_2016. The rationale for doing so would have been that Schlette_2014 teaches to model a camera and teaches to include extensions as they are needed (page 748). Andreas_2016 teaches to extend a camera model by including various effects which increase the realism of the rendered image. Therefore it would have been obvious to combine Schlette_2014 and Andreas_2016 for the benefit of improving the realism of the produced dataset to obtain the invention as specified in the claims.

Schlette_2014 and Handa_2016 and Andreas_2016 do not explicitly teach “and shutter effects.”

Ringaby_2012 makes obvious “and shutter effects.” (abstract: “… almost all CMOS sensors make use of what is called a rolling shutter. Compared to a global shutter, which image all the pixels at the same time, a rolling shutter camera exposes the image row-by-row. This leads to geometric distortions in the image when either the camera or the object is moving…”).

Schlette_2014 and Ringaby_2012 are analogous art because they are from the same field of endeavor called graphics or cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Ringaby_2012. The rationale for doing so would have been that Schlette_2014 teaches to model and simulate a camera sensor and to “allow for high performance simulation of different optical and electronic effect” and Ringaby_2012 teaches that rolling shutter is an electronic effect resulting from the way sensors operate. Therefore it would have been obvious to combine Schlette_2014 and Ringaby_2012 for the benefit of having a more realistic camera simulation to obtain the invention as specified in the claims.


(11)  Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Schlette_2014 in view of Handa_2016 in view of Medeiros_2014.

Claim 19. Schlette_2014 in view of Handa_2016 make obvious all the limitations of claim 17 as outlined above. Schlette_2014 makes obvious “wherein simulating (103) environmental illuminations comprises simulating  light and light sources” (Fig. 3 “light position” “lighting model” “lighting” “lighting shader”).

Schlette_2014 in view of Handa_2016 do not explicitly teach “ambient” light.

Medeiros_2014 teaches “ambient” light (page 16: “… ambient light…”).

Handa_2016 and Schlette_2014 and Medeiros_2014 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Medeiros_2014. The rationale for doing so would have been that Schlette_2014 teaches to model lighting and Medeiros_2014 teaches that lighting includes ambient light. Therefore it would have been obvious to combine Schlette_2014 and Medeiros_2014 for the benefit of modeling light sources to improve the realism of the rendering to obtain the invention as specified in the claims.


(12)  Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Schlette_2014 in view of Handa_2016 in view of Landau_2016 and Medeiros_2014.

Claim 20. Schlette_2014 in view of Handa_2016 teach all the limitations of claim 17. Schlette_2014 and Handa_2016 do not explicitly teach “wherein simulating (105) comprises simulating smoothing, trimming, and hole-filling.”

Landau_2016 makes obvious “wherein simulating (105) comprises simulating smoothing, and hole-filling.” (page 3025: “… our model utilizes post filtered IR images…”; page 3030: “… estimation step can be improved by incorporating the suggested local smoothing/region growing algorithm which would account for the observed smaller depth errors…”; page 3020: “… Kinect uses 2X2 binning to downsample the IR image to 640X512 pixels, where a transmitted dot then fills a single pixel on an ideal surface… for simplicity, the shape and size of each dot fills a single cell (pixel) of the constructed grid…”).

Schlette_2014 in view of Handa_2016 in view of Landau_2016 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Landau_2016. The rationale for doing so would have been that Schlette_2014 teaches to simulate a Kinect camera and Landau_2016 also teaches to simulate a Kinect camera. Therefore it would have been obvious to combine Schlette_2014 and Landau_2016 for the benefit of simulating more features to have a more realistic model/simulation to obtain the invention as specified in the claims.

Schlette_2014 in view of Handa_2016 in view of Landau_2016 do not explicitly teach “trimming.”

Medeiros_2014 makes obvious “trimming” (page 6: “… low pass filter of the original pattern with a kernel whose support depends on the depth of the pixel…”; Figure 7: “… depth of the scene as seen by the projector. The focal plane is placed halfway between the maximum and minimum depth; (middle) final scene as seen by the camera; (right) details of the rectangle highlighted in the middle image…” NOTE: the filter trims depending on the depth of the pixels. In Figure 7 the depth is halfway between the maximum and minimum depth and the image is trimmed accordingly.)

Schlette_2014 in view of Handa_2016 in view of Landau_2016 and Medeiros_2014 are analogous art because they are from the same field of endeavor called cameras. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Schlette_2014 and Medeiros_2014. The rationale for doing so would have been that Schlette_2014 teaches to simulate structured light scanners/cameras to obtain generated scans and ground truth and Medeiros_2014 teaches to benchmark structured light cameras with synthetic scans and ground truth and simulate illumination defocus by trimming the depth range. Therefore it would have been obvious to combine Schlette_2014 and Medeiros_2014 for the benefit of simulating more features to have a more realistic simulation capable of simulating defocus to obtain the invention as specified in the claims.





CONCLUSION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN S COOK whose telephone number is (571)272-4276. The examiner can normally be reached 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamini S. Shah, can be reached on 571-272-2279. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRIAN S COOK/Primary Examiner, Art Unit 2146