Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Comment
This action supersedes and replaces the previous Non-Final Office Action dated February 3, 2022, which inadvertently failed to list the prior art with regard to #12.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claim limitations “a SLAM trainer module” and “an application module” have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because they use a generic placeholder, “module,” coupled with functional language, “operable to,” without reciting sufficient structure to achieve the function. Furthermore, the generic placeholder is not preceded by a structural modifier.
Since these claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claims 1 and 12 have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitation: the computer processor of paragraph 0088 as published and/or the camera of Figure 4.
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
If applicant does not intend to have the claim limitations treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claims so that they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claims recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 21 is directed to a computer program, and a computer program per se is not patent-eligible subject matter. See MPEP § 2106, subsection I: “Non-limiting examples of claims that are not directed to one of the statutory categories:
vi. a computer program per se, Gottschalk v. Benson, 409 U.S. at 72.”
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 12-16, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over “The MORPH Project: Actual Results” by Kalwa et al. (hereinafter “Kalwa”) in view of “SfM-Net: Learning of Structure and Motion from Video” by Vijayanarasimhan et al. (hereinafter “Vijayanarasimhan”).
Claim 1. A system for operating a remotely operated vehicle (ROV) using simultaneous localization and mapping (SLAM) comprising: Kalwa [page 1] teaches it is for the above reasons that today only remotely controlled vehicles (ROVs) are employed in these rough and truly three-dimensional terrains. ROVs are connected to a surface support ship via an umbilical cable that allows for remote control and immediate access to all payload data. However (and the same happens with currently existing AUVs), accurate global positioning is virtually impossible to achieve when the ROV operates close to complex underwater terrain features (e.g. vertical walls or walls with a negative slope)… Proposing a solution to this problem is the main focus of European Project MORPH [2]. In general, the term “morphing” denotes the physical transformation of the appearance of an object. In the scope of this project we introduce “morphing” as the concept of physically shaping an underwater robot composed of distributed single robotic nodes, each of them offering limited but possibly complementary capabilities. For orientation and safe motion in rugged terrain an ideal vehicle must have a certain desistance from any objects. Its “eyes” scan the environment for possible threats such as overhanging cliffs or protruding rocks. On the other side, the vehicle should be very close to these objects in order to map fauna and flora of the local area.
Kalwa [page 5] teaches it has been found that an Octomap representation of the environment extracted from single scanned swath is sufficient for obstacle avoidance, although it will differ from the real world because of the drift error inherent in any dead-reckoning navigation [9]. An improved solution for this problem is to implement a real-time SLAM framework to build and update the octree map. Although future work at the UdG will deal with this topic, at this point only Terrain Based Navigation (localization with a known map) has been implemented as a preliminary step before attempting a full SLAM.
a ROV with (i) a video camera operable to output real video and (ii) a positional sensor operable to output position data; Kalwa [page 3] teaches the requirements above lead to the fact that multibeam echosounders and high definition cameras are the payload sensors of choice… The core system for mapping the environment is made of one multibeam echosounder onboard of a local sonar vehicle (LSV) and two cameras onboard of camera vehicles (C1V and C2V).
a SLAM engine comprising: a video dataset operable to store video data and real images coming from the ROV; Kalwa [page 2] teaches because they are long-lived, slow-growing, and fragile, cold-water corals are particularly vulnerable to impacts from human activities such as bottom fisheries, hydrocarbon drilling and seabed mining. In order to produce a detailed scientific analysis of the habitat characteristics and an understanding of factors regulating the CWC presence/abundance there is the need to record following data:
• a high-resolution digital terrain map (resolution: 1- 10 cm) of the cliffs, overhangs and neighboring leveled areas;
• a high-resolution geo-referenced imagery dataset of the same area (1-5 mm/pixel);
• a high-resolution geo-referenced acoustic backscatter dataset of the same area;
• 3D re-construction of the cliffs with draped imagery;
• geo-referenced near-bottom physical-chemical measurements (salinity, temperature, depth, light intensity (PAR radiation), pH, turbidity, current intensity);
• geo-referenced biological occurrences (i.e., coral colony positions and surface areas extracted automatically from imagery);
a depth dataset operable to store depth maps; Kalwa [page 2] teaches all vehicles use their own depth information to feed their Navigation System and improve the position estimate in the vertical axis.
a 3D model dataset operable to store 3D model data of a scene where an ROV may operate; Kalwa [page 2], quoted above with respect to the video dataset, teaches the need to record “a high-resolution digital terrain map (resolution: 1- 10 cm) of the cliffs, overhangs and neighboring leveled areas” and a “3D re-construction of the cliffs with draped imagery.”
Kalwa [Figure 1]
and the model's weights dataset, wherein the application module is operable to smooth the position data; Kalwa [page 3] teaches the LSV does its own local navigation (with DVL) and carries an USBL transceiver to measure GCV’s position with good accuracy because the range is small. The LSV broadcasts that position periodically (at a slow rate) which allows for the GCVs to track the path of LSV.
and an application module communicatively coupled to the ROV and operable to receive the real video, the position data, reconstruct the scene, and display the scene on a graphical user interface. Kalwa [page 3] teaches the mapping task requires a high accuracy in cooperative navigation of the nodes a) in order to geo-reference the scientific data and b) to bring the nodes very close to each other. The only way of referencing the data with a global position is using GPS installed on a surface support vessel (SSV). In principle this vehicle could be used to measure positions to the underwater systems using a standard ultrashort baseline system (USBL). But accuracy decreases significantly with depth and sea state. A solution to the problem is to “anchor” the MORPH supra-vehicle to a special underwater node, far from the influence of sea waves, devoted to improving the navigation accuracy and relaying communications – the global communication vehicle (GCV). In the particular case of overhanging cliffs the usage of the GCV will be instrumental in establishing the link between the nodes under the cliff and the SSV –
Kalwa [page 4] teaches a user interface for programming and visualization is being created by IUT on the basis of QGIS (http://www.qgis.org). The geographical system allows using geo-referenced data as underlying graphics for mission planning. Such data may be any seachart or even sonardata or results from previous missions. A plug-in has been implemented which allows to initiate a goal oriented mission of the morph system. Further additions have been made for mission monitoring and evaluation.
a depth map simulator with access to the 3D model dataset and a set of camera parameters, wherein the depth map simulator is operable to synthesize a depth map for storage in the depth dataset; Kalwa [page 5] teaches it has been found that an Octomap representation of the environment extracted from single scanned swath is sufficient for obstacle avoidance, although it will differ from the real world because of the drift error inherent in any dead-reckoning navigation [9]. An improved solution for this problem is to implement a real-time SLAM framework to build and update the octree map.
Kalwa fails to explicitly teach camera parameters; however, Vijayanarasimhan, in the field of motion estimation in videos, teaches [Introduction, pages 1-2] we propose SfM-Net, a neural network that is trained to extract 3D structure, ego-motion, segmentation, object rotations and translations in an end-to-end fashion in videos, by exploiting the geometry of image formation. Given a pair of frames and camera intrinsics, SfM-Net, depicted in Figure 1, computes depth, 3D camera motion, a set of 3D rotations and translations for the dynamic objects in the scene, and corresponding pixel assignment masks… The camera intrinsics of Vijayanarasimhan correspond to the claimed camera parameters.
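For illustration only, the synthesis of a depth map from 3D model data and a set of camera parameters, as recited for the depth map simulator, can be sketched as a pinhole projection with a z-buffer. The sketch is not drawn from the application, Kalwa, or Vijayanarasimhan; it assumes that the 3D model has been reduced to points already expressed in camera coordinates and that the camera parameters are the pinhole intrinsics fx, fy, cx, and cy. The function name is hypothetical.

```python
import math


def synthesize_depth_map(points, fx, fy, cx, cy, width, height):
    """Render a sparse depth map from 3D points (hypothetical helper).

    points: iterable of (X, Y, Z) in camera coordinates, Z pointing forward.
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point) in pixels.
    Returns a height x width nested list; pixels hit by no point hold math.inf.
    """
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point lies behind the camera, so it is not visible
        # Perspective projection onto the image plane.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # z-buffer: keep the nearest surface seen through each pixel.
            depth[v][u] = min(depth[v][u], z)
    return depth
```

Pixels that no model point reaches keep an infinite depth; a renderer working from a mesh rather than a point set would fill them by rasterizing triangles.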
a model's weights dataset operable to store weights of the SLAM engine; Vijayanarasimhan [Introduction, pages 1-2] teaches in contrast to those, instead of optimizing directly over optical flow vectors, 3D point coordinates or camera rotation and translation, our model optimizes over neural network weights that, given a pair of frames, produce such 3D structure and motion.
Vijayanarasimhan [3.2. Supervision, page 4] teaches a learning-based solution, as opposed to direct optimization, has the advantage of learning to handle such ambiguities through partial supervision of their weights or appropriate pre-training, or simply because the same coefficients (network weights) need to explain a large abundance of video data consistently.
a SLAM trainer module with access to the video dataset and the depth dataset, wherein the SLAM trainer module is operable to run a SLAM-Net architecture; Vijayanarasimhan [Introduction, pages 1-2] teaches we propose SfM-Net, a neural network that is trained to extract 3D structure, ego-motion, segmentation, object rotations and translations in an end-to-end fashion in videos, by exploiting the geometry of image formation. Given a pair of frames and camera intrinsics, SfM-Net, depicted in Figure 1, computes depth, 3D camera motion, a set of 3D rotations and translations for the dynamic objects in the scene, and corresponding pixel assignment masks…
Vijayanarasimhan [3.2. Supervision, page 4] teaches a learning-based solution, as opposed to direct optimization, has the advantage of learning to handle such ambiguities through partial supervision of their weights or appropriate pre-training, or simply because the same coefficients (network weights) need to explain a large abundance of video data consistently.
Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kalwa's system for operating a remotely operated vehicle (ROV) using simultaneous localization and mapping (SLAM) with Vijayanarasimhan’s teaching of a depth map simulator with access to the 3D model dataset and a set of camera parameters. One would have been motivated to make this combination because it allows SfM-Net to learn to predict structure, object, and camera motion by training on realistic video sequences using limited ground-truth annotations (Vijayanarasimhan, [Introduction]). In combination, Kalwa is not altered in that Kalwa continues to analyze images acquired from an ROV, and Vijayanarasimhan's teachings perform the same function as they do separately, namely motion estimation.
Therefore, one of ordinary skill in the art, such as an individual working in the field of object detection, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 1.
Claim 2. The system of claim 1, wherein the SLAM-Net architecture comprises a set of input frames. Vijayanarasimhan [Figure 1] teaches SfM-Net: Given a pair of frames as input, our model decomposes frame-to-frame pixel motion into 3D scene depth, 3D camera rotation and translation, a set of motion masks and corresponding 3D rigid rotations and translations. It backprojects the resulting 3D scene flow into 2D optical flow and warps accordingly to match pixels from one frame to the next. Forward-backward consistency checks constrain the estimated depth.
Claim 3. The system of claim 1, wherein the SLAM-Net architecture comprises a depth map, a set of camera motions represented as transformation matrices, segmentation masks, and a plurality of convolutional neural networks. Vijayanarasimhan [Figure 2]
Claim 4. The system of claim 3, wherein the SLAM-Net architecture comprises at least one skip connection. Vijayanarasimhan [Figure 2]
Claim 5. The system of claim 1, further comprising: a set of unlabeled videos stored in the video dataset; wherein the SLAM engine receives the set of unlabeled videos from the video dataset and minimizes photometric error between a target frame and a set of remaining frames. Vijayanarasimhan [Self-Supervision, pages 4-5]
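As a hedged illustration of the photometric error recited in claim 5, the sketch below computes a mean absolute (L1) intensity difference between a target frame and the remaining frames. It assumes grayscale frames stored as nested lists and assumes that the remaining frames have already been warped into the target view using predicted depth and camera motion; the L1 form and the function name are assumptions, not the method of the application or of Vijayanarasimhan.

```python
def photometric_error(target, warped_sources):
    """Mean absolute intensity difference (hypothetical helper).

    target: H x W nested list of grayscale intensities.
    warped_sources: list of H x W frames, assumed already warped into the
    target view. Returns 0.0 when there is nothing to compare.
    """
    total, count = 0.0, 0
    for src in warped_sources:
        for row_t, row_s in zip(target, src):
            for a, b in zip(row_t, row_s):
                total += abs(a - b)  # per-pixel L1 difference
                count += 1
    return total / count if count else 0.0
```

Self-supervised training would adjust network weights so as to reduce this value; the sketch only evaluates the loss.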
Claim 6. A system of claim 1, wherein the SLAM engine segments a plurality of pixels from the video data. Vijayanarasimhan [Figure 2]
Claim 12. It recites limitations similar to those of claim 1. Therefore, claim 12 has been analyzed and reviewed in the same way as claim 1. See the above analysis.
Claim 13. It was analyzed and reviewed in the same way as claim 2. See the above analysis. 
Claim 14. It was analyzed and reviewed in the same way as claim 3. See the above analysis. 
Claim 15. It was analyzed and reviewed in the same way as claim 5. See the above analysis. 
Claim 16. It was analyzed and reviewed in the same way as claim 6. See the above analysis. 
Claim 20. It differs from claim 1 in that it is a method performed by the system of claim 1. Therefore, claim 20 has been analyzed and reviewed in the same way as claim 1. See the above analysis.
Claim 21. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method of claim 20. See analysis of claim 20.
Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over “The MORPH Project: Actual Results” by Kalwa et al. (hereinafter “Kalwa”) in view of “SfM-Net: Learning of Structure and Motion from Video” by Vijayanarasimhan et al. (hereinafter “Vijayanarasimhan”), and further in view of US 2019/0147220 A1 to Mccormac et al. (hereinafter “Mccormac”).
Claim 7. The system of claim 1, wherein the SLAM engine is operable to perform bilinear sampling by linearly interpolating an intensity value of four discrete pixel neighbors of a homogeneous pixel coordinate projection. Mccormac [0074] teaches frames of video data 415 may be rescaled to a 224 by 224 resolution using bilinear interpolation for RGB pixel values. In certain cases, an output of the image classifier 455 may also be rescaled to match a resolution of the correspondence data 460. For example, an output of the image classifier 455 may be rescaled to a 640 by 480 image resolution using a nearest neighbour upscaling method.
Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the depth map simulator with access to the 3D model dataset and a set of camera parameters of Kalwa and Vijayanarasimhan with Mccormac’s teaching of bilinear interpolation. One would have been motivated to make this combination because it allows processing video data to enable detection and labelling of objects deemed to be present in a scene (Mccormac, [0011]). In combination, Kalwa is not altered in that Kalwa continues to analyze images acquired from an ROV, Vijayanarasimhan's teachings perform the same function as they do separately, namely motion estimation, and Mccormac continues to detect and label objects in a scene.
Therefore, one of ordinary skill in the art, such as an individual working in the field of object detection, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 7.
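For illustration only, the bilinear sampling recited in claim 7 can be sketched as follows. The sketch assumes that the homogeneous pixel coordinate projection has already been divided by its third component to give continuous coordinates (x, y), and that out-of-range coordinates are clamped to the image border; the function name and the clamping policy are assumptions and are not taken from the application or from Mccormac.

```python
import math


def bilinear_sample(image, x, y):
    """Sample image at continuous (x, y) by bilinear interpolation (hypothetical helper).

    image: H x W nested list of intensities; x indexes columns, y indexes rows.
    Coordinates are clamped to the image border (an assumed policy).
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    # The four discrete pixel neighbours surrounding (x, y).
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate linearly along x on both rows, then along y; weights sum to 1.
    top = (1 - dx) * image[y0][x0] + dx * image[y0][x1]
    bottom = (1 - dx) * image[y1][x0] + dx * image[y1][x1]
    return (1 - dy) * top + dy * bottom
```

Because the result is a weighted sum of the four neighbours, it is differentiable in (x, y), which is why this form of sampling is commonly used when warping frames inside a trainable network.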
Claim 8. The system of claim 7, wherein the SLAM engine tracks at least one point across a plurality of frames. Vijayanarasimhan [Figure 1] teaches SfM-Net: Given a pair of frames as input, our model decomposes frame-to-frame pixel motion into 3D scene depth, 3D camera rotation and translation, a set of motion masks and corresponding 3D rigid rotations and translations. It backprojects the resulting 3D scene flow into 2D optical flow and warps accordingly to match pixels from one frame to the next. Forward-backward consistency checks constrain the estimated depth.
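As a hedged illustration of tracking a point across a plurality of frames, the sketch below back-projects a pixel using its depth, applies a sequence of rigid camera motions (a rotation R and a translation t, corresponding to the 3D camera rotation and translation described by Vijayanarasimhan), and re-projects the point into each subsequent frame. It assumes a static scene point, pinhole intrinsics shared by all frames, and a known depth in the first frame; the function name and argument layout are hypothetical.

```python
def track_point(u, v, depth, K, motions):
    """Track one static scene point across frames (hypothetical helper).

    (u, v, depth): pixel and its depth in the first frame.
    K = (fx, fy, cx, cy): assumed pinhole intrinsics shared by all frames.
    motions: list of (R, t) pairs; each maps camera coordinates of one frame
    to the next (R is a 3x3 nested list, t a 3-vector).
    Returns one (u, v) pixel position per frame.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in first-frame camera coordinates.
    p = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    track = [(u, v)]
    for R, t in motions:
        # Apply the rigid motion: p' = R p + t.
        p = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective division back to pixel coordinates.
        track.append((fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy))
    return track
```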
Claim 17. It was analyzed and reviewed in the same way as claim 7. See the above analysis. 
Claim 18. It was analyzed and reviewed in the same way as claim 8. See the above analysis. 
Claims 9-11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over “The MORPH Project: Actual Results” by Kalwa et al. (hereinafter “Kalwa”) in view of “SfM-Net: Learning of Structure and Motion from Video” by Vijayanarasimhan et al. (hereinafter “Vijayanarasimhan”), and further in view of US 2019/0147220 A1 to Mccormac et al. (hereinafter “Mccormac”) and US 2020/0041276 A1 to Chakravarty et al. (hereinafter “Chakravarty”).
Claim 9. The system of claim 8, wherein the SLAM engine uses a GAN to learn a depth prior to improve a depth map. Vijayanarasimhan [2. Related work] teaches they use synthetic data to pre-train the 2D to 3D mapping of their network.
Chakravarty [0045] teaches the vehicle controller may implement the result of the VAE -GAN 301 into a SLAM algorithm for computing simultaneous localization and mapping of the vehicle in real-time.
Chakravarty [0053] teaches the GAN generator 404 is pretrained offline before the GAN generator 404 receives an RGB image 402 from a monocular camera. In an embodiment, the GAN discriminator 408 is pretrained before the GAN generator 404 is trained and this may provide a clearer gradient.
Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kalwa's system for operating a remotely operated vehicle (ROV) using simultaneous localization and mapping (SLAM), as modified by Vijayanarasimhan and Mccormac, with Chakravarty’s teaching of using a GAN within a SLAM algorithm. One would have been motivated to make this combination because it allows an autonomous vehicle to perform avoidance or other safety maneuvers (Chakravarty, [0003]). In combination, Kalwa is not altered in that Kalwa continues to analyze images acquired from an ROV, Vijayanarasimhan's teachings perform the same function as they do separately, namely motion estimation, Mccormac continues to detect and label objects in a scene, and Chakravarty continues to assess dangers involved in driving.
Therefore, one of ordinary skill in the art, such as an individual working in the field of object detection, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 9.
Claim 10. The system of claim 9, wherein the GAN comprises a generator network operable to output at least one fake example and a discriminator network operable to distinguish between the at least one fake example and a real example. Vijayanarasimhan [2. Related work] teaches they use synthetic data to pre-train the 2D to 3D mapping of their network.
Vijayanarasimhan [Learning-based motion estimation, page 2] teaches Recent works [7, 20, 29] propose learning frame-to-frame motion fields with deep neural networks supervised with ground-truth motion obtained from simulation or synthetic movies.
Vijayanarasimhan [Supervising optical flow and object motion, page 5] teaches Ground-truth optical flow, object masks, or object motions require expensive human annotation on real videos. However, these signals are available in recent synthetic datasets [20].
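For illustration only, the generator and discriminator arrangement recited in claim 10 is conventionally trained with the binary cross-entropy objectives sketched below. The sketch assumes that the discriminator outputs the probability that an example is real, omits the networks themselves, and uses the common non-saturating generator loss; the function names are hypothetical and are not drawn from the application, Vijayanarasimhan, or Chakravarty.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator (hypothetical helper).

    d_real: discriminator probabilities that real examples are real.
    d_fake: discriminator probabilities that fake (generated) examples are real.
    Minimizing this pushes d_real toward 1 and d_fake toward 0.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: minimized when fakes are judged real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```

The two losses pull in opposite directions: the discriminator improves by separating fake examples from real ones, while the generator improves by producing fake examples that the discriminator scores as real.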
Claim 11. The system of claim 10, wherein the SLAM engine synthesizes depth maps using a 3D model depiction of a real scene. Vijayanarasimhan [3.1. SfM-Net Architecture] 
Claim 19. It was analyzed and reviewed in the same way as claims 9 and 10. See the above analysis. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661