Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Response to Arguments
Applicant’s arguments, see Remarks, filed September 27, 2021, with respect to the rejection(s) of claim(s) 1 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6, 9-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” to Mur-Artal et al., hereinafter, “Mur-Artal” in view of “Improving stereo vision based SLAM by integrating inertial measurements for person indoor navigation” to Albrecht et al., hereinafter, “Albrecht” and “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al., hereinafter, “Wang”.
Claim 1. A device for performing simultaneous localization and mapping (SLAM), the device comprising at least one processor configured to: Mur-Artal [Abstract] teaches: “We present ORB-SLAM2 a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities.”

sequentially process, in a second processing stage, each frame of the frame sequence based on the visual feature set and the sensor readings comprised in that frame in order to generate a sequence mapping graph; Mur-Artal [III. ORB-SLAM2] teaches the system maintains a covisibiliy graph [8] that links any two keyframes observing common points and a minimum spanning tree connecting all keyframes.

Mur-Artal [C. Bundle Adjustment with Monocular and Stereo Constraints]
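For illustration only, the covisibility graph described by Mur-Artal (keyframes linked when they observe common points) can be sketched as follows. This is a minimal sketch, assuming each keyframe is reduced to the set of map point IDs it observes; the function name build_covisibility_graph and the min_shared threshold are hypothetical and are not taken from Mur-Artal, and the minimum spanning tree is not shown.

```python
from itertools import combinations


def build_covisibility_graph(observations, min_shared=1):
    """Link each pair of keyframes that observe at least `min_shared` common
    map points (hypothetical helper). Edge weight = number of shared points.

    observations: dict mapping keyframe id -> set of observed map point ids.
    Returns a dict mapping (kf_a, kf_b), with kf_a < kf_b, to the shared count.
    """
    edges = {}
    for a, b in combinations(sorted(observations), 2):
        shared = len(observations[a] & observations[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    obs = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {7, 8}}
    print(build_covisibility_graph(obs))  # {(0, 1): 2}
```

Keyframes 0 and 1 share map points 2 and 3, so they are linked with weight 2; keyframe 2 shares nothing and stays unconnected.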
preprocess, in a first processing stage, a received data sequence comprising multiple images recorded by a camera and sensor readings from multiple sensors in order to obtain a frame sequence, each frame of the frame sequence comprising a visual feature set related to one of the images at a determined time and the respective sensor readings from the determined time; Mur-Artal Fig. 2(b) (Input pre-processing).
Mur-Artal Fig. 2 teaches the tracking thread pre-processes the stereo or RGB-D input so that the rest of the system operates independently of the input sensor. Although it is not shown in this figure, ORB-SLAM2 also works with a monocular input as in [1].

Mur-Artal [A. Monocular, Close Stereo and Far Stereo Keypoints] teaches ORB-SLAM2 as a feature-based method pre-processes the input to extract features at salient keypoint locations, as shown in Fig. 2b. The input images are then discarded and all system operations are based on these features, so that the system is independent of the sensor being stereo or RGB-D.

While Mur-Artal fails to explicitly teach multiple sensors, Albrecht, in the field of simultaneous localization and mapping (SLAM), teaches [Abstract] stereo SLAM algorithms achieve remarkable results, nevertheless there are limits to what a single sensor type system can achieve in case of over- or underexposed images, highly dynamic camera movements and homogeneous environments. This paper proposes a straight forward yet very efficient method of integrating Inertial Measurement Unit (IMU) data to overcome those problems.

Albrecht [Introduction] teaches experimental evaluation is conducted on a person carried sensor platform including a stereo camera setup combined with an IMU based on Fiber Optic Gyroscopes (FOG) and Micro-Electro-Mechanical Systems (MEMS) accelerometers.

Albrecht [Figure 2]. The Examiner understands the gyroscope and the accelerometer to teach the claimed multiple sensors.
and merge, in a third processing stage, the sequence mapping graph with at least one other graph in order to generate or update a full graph. Wang [3.4 Loop Closure Detection] and [3.5 Robust Pose Graph Optimization]
Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the device for performing simultaneous localization and mapping (SLAM) of Mur-Artal with Albrecht’s teaching of a received data sequence comprising multiple images recorded by a camera and sensor readings from multiple sensors in order to obtain a frame sequence, and with Wang’s teaching of merging the sequence mapping graph with at least one other graph in order to generate or update a full graph. One would have been motivated to perform this combination because it allows one to efficiently integrate inertial measurement data to improve stereo vision based SLAM [Albrecht, Abstract]. In combination, Mur-Artal is not altered in that Mur-Artal continues to perform SLAM. Albrecht’s teachings perform the same as they do separately, namely using multiple sensors in SLAM. Wang continues to perform loop closure detection.
Therefore, one of ordinary skill in the art, such as an individual working in the field of simultaneous localization and mapping (SLAM), could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 1.

Claim 2. Mur-Artal further teaches wherein: the visual feature set comprises an image feature set comprising one or more 2D key points extracted from the related one of the images, descriptors corresponding to the 2D key points, and disparity or depth information of the 2D key points. Mur-Artal [A. Monocular, Close Stereo and Far Stereo Keypoints] teaches ORB-SLAM2 as a feature-based method pre-processes the input to extract features at salient keypoint locations, as shown in Fig. 2b. The input images are then discarded and all system operations are based on these features, so that the system is independent of the sensor being stereo or RGB-D…
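For illustration only, the disparity or depth information recited in claim 2 relates to a key point in a rectified stereo pair through the standard pinhole relation depth = f * b / d, where f is the focal length in pixels, b is the stereo baseline, and d = u_left - u_right is the horizontal disparity. This is a minimal sketch, assuming a rectified pair; the helper names and the close/far factor of 40 times the baseline are assumed parameters for illustration, not values quoted above.

```python
def stereo_keypoint_depth(u_left, u_right, focal_px, baseline_m):
    """Depth (metres) of a key point in a rectified stereo pair.

    Hypothetical helper: depth = focal_px * baseline_m / disparity,
    where disparity = u_left - u_right (pixels).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        # Zero or negative disparity gives no valid depth in front of the rig.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


def is_close_keypoint(depth_m, baseline_m, factor=40.0):
    """Classify a key point as 'close' when its depth is below
    factor * baseline (factor is an assumed parameter)."""
    return depth_m < factor * baseline_m


if __name__ == "__main__":
    z = stereo_keypoint_depth(320.0, 300.0, focal_px=500.0, baseline_m=0.1)
    print(z, is_close_keypoint(z, 0.1))  # 2.5 True
```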

Claim 3. Mur-Artal further teaches wherein the at least one processor is configured to, in the first processing stage: extract an image from the data sequence, the image being one of the multiple images; rectify the image; extract the 2D key points from the rectified image; and generate the image feature set based on the extracted 2D key points. Mur-Artal [A. Monocular, Close Stereo and Far Stereo Keypoints] teaches ORB-SLAM2 as a feature-based method pre-processes the input to extract features at salient keypoint locations, as shown in Fig. 2b. The input images are then discarded and all system operations are based on these features, so that the system is independent of the sensor being stereo or RGB-D…

Claim 6. Mur-Artal further teaches wherein the at least one processor is configured to, in the second processing stage: perform camera tracking based on the visual feature set included in a respective frame of the frame set by matching 2D key points in the visual feature set to locally stored 3D key points, in order to obtain a camera pose associated with the respective frame. Mur-Artal [C. Bundle Adjustment with Monocular and Stereo Constraints] teaches our system performs BA to optimize the camera pose in the tracking thread (motion-only BA), to optimize a local window of keyframes and points in the local mapping thread (local BA), and after a loop closure to optimize all keyframes and points (full BA).
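For illustration only, matching 2D key points of a frame to locally stored 3D map points, as in feature-based tracking with binary ORB descriptors, can be sketched as a brute-force nearest-neighbour search under Hamming distance. This is a minimal sketch, assuming descriptors are stored as Python integers; the helper names and the max_dist threshold are hypothetical, and the subsequent pose optimization (the motion-only BA quoted above) is not shown.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_to_map(frame_descriptors, map_descriptors, max_dist=50):
    """Brute-force match each 2D key point descriptor to the nearest 3D map
    point descriptor; keep a match only if its distance is <= max_dist.

    Returns a dict mapping key point index -> map point index.
    """
    matches = {}
    for i, d in enumerate(frame_descriptors):
        best_j, best_dist = None, max_dist + 1
        for j, m in enumerate(map_descriptors):
            dist = hamming(d, m)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            matches[i] = best_j
    return matches
```

For example, match_to_map([0b1111, 0b0000], [0b0000, 0b1110], max_dist=1) returns {0: 1, 1: 0}; such 2D-3D correspondences would then feed a pose solver.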

Claim 9. Mur-Artal further teaches wherein the at least one processor is further configured to, in the third processing stage: detect a presence of one or more loops or overlapping areas shared among the sequence mapping graph and the at least one further graph; merge the sequence mapping graph and the at least one further graph in order to obtain an intermediate graph; and perform a graph optimization on the intermediate graph based on the detected loops or the overlapping areas in order to obtain the full graph. Mur-Artal Fig. 2 teaches ORB-SLAM2 is composed of three main parallel threads: tracking, local mapping and loop closing, which can create a fourth thread to perform full BA after a loop closure.

Mur-Artal [D. Loop Closing and Full BA]
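For illustration only, the three steps of claim 9 (detect loops or overlaps, merge into an intermediate graph, then optimize) can be sketched with a deliberately simplified one-dimensional pose graph, in which each pose is a scalar and each edge (i, j, z) states that x_j - x_i should equal z. This is a minimal sketch, assuming loop detection has already produced the loop edges and that node IDs are unique across both graphs; the helper names are hypothetical, and Gauss-Seidel least squares is used here for brevity, not as the optimizer of Mur-Artal or Wang.

```python
def merge_graphs(seq_graph, other_graph, loop_edges):
    """Union of two pose graphs plus detected loop/overlap edges.

    A graph is {"nodes": {id: pose}, "edges": [(i, j, measured_offset)]}.
    Node ids are assumed to be unique across both graphs.
    """
    nodes = {**other_graph["nodes"], **seq_graph["nodes"]}
    edges = list(other_graph["edges"]) + list(seq_graph["edges"]) + list(loop_edges)
    return {"nodes": nodes, "edges": edges}


def optimize_1d(graph, anchor, iterations=200):
    """Gauss-Seidel least squares on scalar poses, minimising
    sum over edges of (x_j - x_i - z)^2 with x[anchor] held fixed."""
    x = dict(graph["nodes"])
    for _ in range(iterations):
        for k in x:
            if k == anchor:
                continue
            # Each incident edge predicts a value for x_k; the least-squares
            # update for one node is the mean of those predictions.
            preds = []
            for i, j, z in graph["edges"]:
                if j == k:
                    preds.append(x[i] + z)
                elif i == k:
                    preds.append(x[j] - z)
            if preds:
                x[k] = sum(preds) / len(preds)
    return x
```

With an odometry chain 0, 1, 2 (offsets of 1.0 each) and a loop edge from node 0 to node 2 measuring 2.3, anchoring node 0 yields x1 of about 1.1 and x2 of about 2.2, the least-squares compromise between the chain and the loop.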
Claim 10. Mur-Artal further teaches wherein at least two of the first processing stage, the second processing stage, or the third processing stage are performed in different processors of the at least one processor. Mur-Artal Fig. 2 teaches the tracking thread pre-processes the stereo or RGB-D input so that the rest of the system operates independently of the input sensor.
Claim 11. Mur-Artal further teaches wherein: the device is a distributed device and comprises at least one terminal device and at least one network device, a processor of the terminal device is configured to perform the first processing stage and transmit the obtained frame sequence to the network device, a processor of the network device is configured to perform the second and third processing stages, and the at least one processor comprises the processor of the terminal device and the processor of the network device. Mur-Artal [Abstract] teaches the system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences
Mur-Artal Fig. 2 teaches the tracking thread pre-processes the stereo or RGB-D input so that the rest of the system operates independently of the input sensor.
Claim 12. Mur-Artal further teaches wherein the processor of the terminal device is further configured to: perform a real-time localization based on the frame sequence obtained in the first processing stage. Mur-Artal [Abstract] teaches the system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences

Claim 14. Mur-Artal further teaches wherein the terminal device is located in a vehicle, and the vehicle comprises the at least one camera comprising the camera and at least one of the multiple sensors. Mur-Artal [EuRoC Dataset] teaches the recent EuRoC dataset [21] contains 11 stereo sequences recorded from a micro aerial vehicle (MAV) flying around two different rooms and a large industrial environment.
Claim 15. It differs from claim 1 in that it recites a method performed by the device of claim 1. Therefore, claim 15 has been analyzed and reviewed in the same way as claim 1. See the above analysis.
Claims 4-5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” to Mur-Artal et al., hereinafter, “Mur-Artal” in view of “Improving stereo vision based SLAM by integrating inertial measurements for person indoor navigation” to Albrecht et al., hereinafter, “Albrecht” and “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al., hereinafter, “Wang”, and in further view of US 2016/0247290 A1 to Liu et al., hereinafter, “Liu”.
Claim 4. While the combination of Mur-Artal, Albrecht and Wang fails to explicitly teach the limitations of claim 4, Liu, in the field of stereo imaging, teaches wherein the at least one processor is configured to, in the first processing stage: assign one or more semantic labels to pixels of the rectified image; Liu [Abstract] teaches a method labels an image of a street view by first extracting, for each pixel, an appearance feature for inferring a semantic label, a depth feature for inferring a depth label. Then, a column-wise labeling procedure is applied to the features to jointly determine the semantic label and the depth label for each pixel using the appearance feature and the depth feature.
Liu [0008] teaches the invention provide a four-layered model for a street scene and a method for labeling semantic classes for components, such as ground, e.g., a road, moving objects such as pedestrians and vehicles, buildings, and the sky.
Liu [0017] teaches as shown in FIG. 3, it is an objective of the invention to jointly estimate a semantic label and depth for each pixel in the street view image using appearance and three-dimensional information.
and filter the image feature set based on the semantic labels to remove the 2D key points from the image feature set related to objects labelled as dynamic objects. Liu [0017] teaches as shown in FIG. 3, it is an objective of the invention to jointly estimate a semantic label and depth for each pixel in the street view image using appearance and three-dimensional information. We use a layered image interpretation. An image is horizontally partitioned into one to four layers of different semantic and depth components. Layer-1 is the ground plane, e.g., a road. Layer-2 can include pedestrians (peds) and other dynamic objects such as vehicles and motorcycles. Layer-3 contains buildings, and layer-4 is sky.
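For illustration only, the claimed filtering step (removing 2D key points that fall on pixels labelled as dynamic objects) can be sketched as below. This is a minimal sketch, assuming a per-pixel label map of the kind Liu's layered labeling produces; the label strings and the DYNAMIC_LABELS set are hypothetical, loosely following the classes Liu names (road, pedestrians, vehicles, motorcycles, buildings, sky).

```python
# Hypothetical set of labels treated as dynamic objects.
DYNAMIC_LABELS = frozenset({"pedestrian", "vehicle", "motorcycle"})


def filter_dynamic_keypoints(keypoints, label_map, dynamic_labels=DYNAMIC_LABELS):
    """Drop key points whose pixel carries a dynamic semantic label.

    keypoints: list of (u, v, descriptor) tuples with integer pixel coordinates.
    label_map: 2D list of label strings, indexed as label_map[v][u] (row, column).
    """
    return [kp for kp in keypoints if label_map[kp[1]][kp[0]] not in dynamic_labels]


if __name__ == "__main__":
    labels = [["sky", "sky"],
              ["vehicle", "road"]]
    kps = [(0, 0, "d0"), (0, 1, "d1"), (1, 1, "d2")]
    print(filter_dynamic_keypoints(kps, labels))  # [(0, 0, 'd0'), (1, 1, 'd2')]
```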
Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the device for performing simultaneous localization and mapping (SLAM) of Mur-Artal, Albrecht and Wang with Liu's teaching of assigning one or more semantic labels to pixels of the rectified image. One would have been motivated to perform this combination because segmentation and depth estimation from a stereo camera can be performed efficiently using a unified energy minimization framework [Liu, 0005]. In combination, Mur-Artal is not altered in that Mur-Artal continues to perform SLAM. Albrecht's teachings perform the same as they do separately, namely using multiple sensors in SLAM. Wang continues to perform loop closure detection. Liu continues to assign semantic labels to pixels in image data.
Therefore, one of ordinary skill in the art, such as an individual working in the field of simultaneous localization and mapping (SLAM), could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 4.
Claim 5. Mur-Artal further teaches wherein the at least one processor is further configured to, in the first processing stage: generate the visual feature set by adding a bag-of-words descriptor to the filtered image feature set, and generate a respective frame of the frame sequence by combining the visual feature set with the sensor readings from a same time instance of the image. Mur-Artal [D. Loop Closing and Full BA] teaches the system has embedded a Place Recognition module based on DBoW2 [16] for relocalization, in case of tracking failure (e.g. an occlusion) or for reinitialization in an already mapped scene, and for loop detection. The system maintains a covisibiliy graph [8] that links any two keyframes observing common points and a minimum spanning tree connecting all keyframes. These graph structures allow to retrieve local windows of keyframes, so that tracking and local mapping operate locally, allowing to work on large environments, and serve as structure for the pose-graph optimization performed when closing a loop.
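For illustration only, generating the visual feature set with a bag-of-words descriptor and pairing it with sensor readings from the same time instance can be sketched as follows. This is a minimal sketch, assuming a small flat vocabulary of binary words and a sensor log keyed by exact timestamp; DBoW2 itself uses a hierarchical vocabulary tree with tf-idf weighting, which is not reproduced here, and all names are hypothetical.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def bow_vector(descriptors, vocabulary):
    """Assign each binary descriptor to its nearest visual word (by Hamming
    distance) and return a word-id -> count histogram."""
    counts = Counter()
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda w: hamming(d, vocabulary[w]))
        counts[word] += 1
    return dict(counts)


def make_frame(timestamp, features, vocabulary, sensor_log):
    """Combine the visual feature set (features + BoW histogram) with sensor
    readings logged at the same timestamp (empty dict if none exist)."""
    descriptors = [f["descriptor"] for f in features]
    return {
        "timestamp": timestamp,
        "visual_features": {
            "features": features,
            "bow": bow_vector(descriptors, vocabulary),
        },
        "sensor_readings": sensor_log.get(timestamp, {}),
    }
```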
Claim 16. Mur-Artal further teaches wherein the at least one processor is further configured to, in the first processing stage: generate the visual feature set by adding a hash table for searching the 2D key points. Mur-Artal Fig. 2 (Map).
Mur-Artal [III. ORB-SLAM2] teaches the system maintains a covisibiliy graph [8] that links any two keyframes observing common points and a minimum spanning tree connecting all keyframes. These graph structures allow to retrieve local windows of keyframes, so that tracking and local mapping operate locally, allowing to work on large environments, and serve as structure for the pose-graph optimization performed when closing a loop.
Mur-Artal [D. Loop Closing and Full BA] teaches the system has embedded a Place Recognition module based on DBoW2 [16] for relocalization, in case of tracking failure (e.g. an occlusion) or for reinitialization in an already mapped scene, and for loop detection. The system maintains a covisibiliy graph [8] that links any two keyframes observing common points and a minimum spanning tree connecting all keyframes. These graph structures allow to retrieve local windows of keyframes, so that tracking and local mapping operate locally, allowing to work on large environments, and serve as structure for the pose-graph optimization performed when closing a loop.
Mur-Artal [D. Timing Results] teaches the higher density of the covisibility graph makes the local map contain more keyframes and points and therefore local map tracking and local BA are also more expensive.
Per the specification at [0018]: generate a visual feature set by adding a bag-of-words descriptor to the filtered image feature set, and optionally a hash table for searching the 2D key points, and generate a frame by combining the visual feature set with sensor readings from the same time instance of the image.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” to Mur-Artal et al., hereinafter, “Mur-Artal” in view of “Improving stereo vision based SLAM by integrating inertial measurements for person indoor navigation” to Albrecht et al., hereinafter, “Albrecht” and “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al., hereinafter, “Wang”, and in further view of “ORB-SLAM: a Versatile and Accurate Monocular SLAM System” to Mur-Artal et al., hereinafter, “Mur-Artal2”.
Claim 7. While the combination of Mur-Artal, Albrecht and Wang fails to explicitly teach the limitations of claim 7, Mur-Artal2, in the field of SLAM technology, teaches wherein the at least one processor is configured to: determine whether the frame is a key frame based on a number of the matched 2D key points. Mur-Artal2 [E. New Keyframe Decision] teaches the last step is to decide if the current frame is spawned as a new keyframe… 3) Current frame tracks at least 50 points.
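For illustration only, the optional hash table for searching the 2D key points mentioned at [0018] could take the form of a spatial grid hash, in which each key point index is bucketed by the image cell containing it so that a radius query visits only nearby cells. This is a minimal sketch under that assumption; the cell size and helper names are hypothetical and are not taken from the specification or from Mur-Artal.

```python
from collections import defaultdict


def build_keypoint_hash(keypoints, cell_px=32):
    """Bucket key point indices by the grid cell (column, row) containing them."""
    table = defaultdict(list)
    for idx, (u, v) in enumerate(keypoints):
        table[(int(u // cell_px), int(v // cell_px))].append(idx)
    return table


def keypoints_near(table, keypoints, u, v, radius, cell_px=32):
    """Return sorted indices of key points within `radius` pixels of (u, v),
    visiting only the grid cells that overlap the query square."""
    hits = []
    for cu in range(int((u - radius) // cell_px), int((u + radius) // cell_px) + 1):
        for cv in range(int((v - radius) // cell_px), int((v + radius) // cell_px) + 1):
            # .get() avoids inserting empty buckets into the defaultdict.
            for idx in table.get((cu, cv), []):
                ku, kv = keypoints[idx]
                if (ku - u) ** 2 + (kv - v) ** 2 <= radius ** 2:
                    hits.append(idx)
    return sorted(hits)
```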
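For illustration only, condition 3) quoted above (the current frame tracks at least 50 points) can be expressed as a count over the matched 2D key points. This is a minimal sketch, assuming matches are held in a dictionary from key point index to map point index and that outliers rejected during pose optimization are flagged separately; the function name and the inlier_flags parameter are hypothetical, and the other keyframe conditions elided in the quotation are not modeled, so this checks one necessary condition rather than the full decision.

```python
def meets_tracking_condition(matches, inlier_flags=None, min_tracked=50):
    """Return True when at least `min_tracked` matched 2D key points remain
    tracked (i.e. are not flagged as outliers).

    matches: dict mapping key point index -> map point index.
    inlier_flags: optional dict key point index -> bool; missing entries
    count as inliers.
    """
    flags = inlier_flags or {}
    tracked = sum(1 for i in matches if flags.get(i, True))
    return tracked >= min_tracked
```

For example, 60 matches of which 15 are flagged as outliers leave 45 tracked points, which fails the default 50-point condition.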

Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the device for performing simultaneous localization and mapping (SLAM) of Mur-Artal, Albrecht and Wang with Mur-Artal2's teaching of determining whether the frame is a key frame based on a number of the matched 2D key points. One would have been motivated to perform this combination because it allows one to reduce redundant keyframes [Mur-Artal2, E. New Keyframe Decision]. In combination, Mur-Artal is not altered in that Mur-Artal continues to perform SLAM. Albrecht's teachings perform the same as they do separately, namely using multiple sensors in SLAM. Wang continues to perform loop closure detection. Mur-Artal2 continues to determine whether the frame is a key frame based on a number of the matched 2D key points.

Therefore, one of ordinary skill in the art, such as an individual working in the field of simultaneous localization and mapping (SLAM), could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 7.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” to Mur-Artal et al., hereinafter, “Mur-Artal” in view of “Improving stereo vision based SLAM by integrating inertial measurements for person indoor navigation” to Albrecht et al., hereinafter, “Albrecht” and “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al., hereinafter, “Wang”, and in further view of “ORB-SLAM: a Versatile and Accurate Monocular SLAM System” to Mur-Artal et al., hereinafter, “Mur-Artal2” and US 2019/0387209 A1 to Yang et al., hereinafter, “Yang”.
Claim 8. While the combination of Mur-Artal, Albrecht, Wang and Mur-Artal2 fails to explicitly teach the limitations of claim 8, Yang, in the field of stereo odometry, teaches wherein the at least one processor is further configured to, in the second processing stage, based upon determining that the frame is the key frame: perform a first local bundle adjustment (LBA) based on a camera pose in order to obtain visual odometry information and a LBA graph; calculate a fused camera pose based on the visual odometry information and the sensor readings included in the frame; and perform a second LBA based on the fused camera pose and the LBA graph in order to obtain the sequence mapping graph. Yang [0043] teaches FIG. 3 depicts an example of a deep virtual stereo odometry calculation module 125 that includes a joint optimization module 330. The deep virtual stereo odometry calculation module 125 may be included in a monocular visual odometry system, such as the monocular visual odometry system 110 described above with regard to FIG. 1. In addition, the deep virtual stereo odometry calculation module 125 may receive data (e.g., as described above with regard to FIG. 1), such as camera frame data 215 received by the camera sensor 105. In some cases, the camera frame data 215 may include one or more groups of camera frames, such as a group of keyframes 311 and a group of additional camera frames 313. Based on the received data, the joint optimization module 330 may modify pose data. For example, the joint optimization module 330 may modify coarse tracking associated with pose data based on the camera frame data 215, including the keyframes 311 and the additional frames 313.

Yang [0044] teaches in some implementations, a coarse tracking module 340 that is included in the deep virtual stereo odometry calculation module 125 is able to adjust pose data based on one or more camera frames in the camera frame data 215. For example, the coarse tracking module 340 may receive an initial pose estimate 329, such as pose data that includes a current estimation of the monocular visual odometry system's position and location based on the camera frame data 215 (e.g., a set of image points extracted from camera images). The initial pose estimate 329 may be assigned based on a motion model of the camera sensor 105. The assignment of the estimated pose data 331 may be performed by assuming camera motion between a most recent time step t-1 and a current time step t is the same as between a time step t-2 and the most recent time step t-1. In addition, the coarse tracking module 340 may receive a current camera frame (e.g., having a timestamp indicating a recent time of recording by a camera sensor), and a current keyframe from the group of keyframes 311 (e.g., having the most recent timestamp from the group of keyframes 311). 

Yang [0047] teaches in some implementations, the joint optimization module 330 may perform a joint optimization of energy functions of pose data and image depths of sampled points jointly. For example, a factorization module 350 that is included in the joint optimization module 330 may receive the estimated pose data 331, some or all of the camera frame data 215 (such as the keyframes 311), and data associated with the depth map 217. The factorization module 350 may determine a joint optimization of energy functions associated with the estimated pose data 331 and the image depths of the depth map 217. 

Yang [0048] teaches In some implementations, the joint optimization module 330 includes a marginalization module 360. In an example, the marginalization module 360 removes old keyframes 311 from the deep virtual stereo odometry calculation module 125 by marginalization. The removal of the old keyframes 311 maintains a fixed size of an active processing window for the deep virtual stereo odometry calculation module 125. Additionally, parameter estimates (e.g., camera poses and depths in a marginalization prior factor) outside of the active window may also incorporated into the joint optimization module 330. 

Yang [0049] teaches based on the joint optimization of the estimated pose data 331, the factorization module 350 may determine a bundle adjustment to the estimated pose data 331. The bundle adjustment may indicate a change in the position or orientation of the monocular visual odometry system 110 based on one or more differences in visual data. In some examples, the joint optimization module 330 may generate modified pose data 335 based on the bundle adjustment determined by the factorization module 350. The modifications may include a joint optimization, such as a joint optimization that optimizes the estimated pose data 331 (e.g., in a given set of operations by the factorization module 350). 

Yang [0050] teaches in some implementations, one or more of a joint optimization or a coarse tracking pose adjustment are performed in an ongoing manner. For example, the coarse tracking module 340 may determine a pose adjustment for each camera frame that is included in the camera frame data 215. As images are recorded by the camera sensor 115, the images may be added to the camera frame data 215 as additional camera frames (e.g., included in the additional frames 313). The coarse tracking module 340 may determine a respective pose adjustment for each added image, and generate (or modify) the modified pose data 335 based on the respective adjustments. In addition, the estimated pose data 331 may be updated based on the modified pose data 335, such that the estimated pose data 331 is kept current based on a joint optimization pose adjustment as images are added to the camera frame data 215. 

Yang [0052] teaches in some implementations, a monocular visual odometry system is considered a deep virtual stereo odometry system. The deep virtual stereo odometry system may include (or be configured to communicate with) one or more of a deep virtual stereo odometry calculation module and a camera sensor. In addition, the deep virtual stereo odometry system may determine one or more positional parameters based on pose data determined from the camera sensor.
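For illustration only, the motion-model assumption quoted from Yang [0044] (camera motion between time steps t-1 and t equals the motion between t-2 and t-1) amounts to re-applying the last relative motion: T_t = T_{t-1} * (T_{t-2}^-1 * T_{t-1}). This is a minimal planar sketch, assuming poses are (x, y, theta) elements of SE(2) rather than the full 3D poses a visual odometry system would use; all helper names are hypothetical and this is not Yang's implementation.

```python
import math


def compose(a, b):
    """SE(2) composition: pose a followed by relative motion b, as (x, y, theta)."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a):
    """SE(2) inverse of pose a = (x, y, theta)."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, -at)


def predict_pose(pose_t2, pose_t1):
    """Constant-velocity prediction of the pose at t from the poses at t-2
    and t-1: re-apply the relative motion observed between them.
    Angles are not wrapped to (-pi, pi]."""
    delta = compose(inverse(pose_t2), pose_t1)
    return compose(pose_t1, delta)
```

For example, a camera that moved from (0, 0, 0) to (1, 0, 0) is predicted at (2, 0, 0); if it instead turned by pi/2 while moving to (1, 0, pi/2), the prediction is approximately (1, 1, pi).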

Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the device for performing simultaneous localization and mapping (SLAM) of Mur-Artal, Albrecht, Wang and Mur-Artal2 with Yang's teaching of performing a local bundle adjustment (LBA) based on a camera pose in order to obtain visual odometry information. One would have been motivated to perform this combination because it allows one to accurately determine the position and orientation of an object carrying a camera of a monocular visual odometry system [Yang, 0002]. In combination, Mur-Artal is not altered in that Mur-Artal continues to perform SLAM. Albrecht's teachings perform the same as they do separately, namely using multiple sensors in SLAM. Wang continues to perform loop closure detection. Mur-Artal2 continues to determine whether the frame is a key frame based on a number of the matched 2D key points. Yang's teachings perform the same as they do separately, namely performing a local bundle adjustment (LBA) based on a camera pose.

Therefore, one of ordinary skill in the art, such as an individual working in the field of simultaneous localization and mapping (SLAM) using stereo images, could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 8.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” to Mur-Artal et al., hereinafter, “Mur-Artal” in view of “Improving stereo vision based SLAM by integrating inertial measurements for person indoor navigation” to Albrecht et al., hereinafter, “Albrecht” and “SLAM-based Cooperative Calibration for Optical Sensors Array with GPS/IMU Aided” to Wang et al., hereinafter, “Wang”, and in further view of US 2019/0387209 A1 to Yang et al., hereinafter, “Yang”.
Claim 13. While the combination of Mur-Artal, Albrecht and Wang fails to explicitly teach the limitations of claim 13, Yang, in the field of stereo odometry, teaches wherein the processor of the terminal device is further configured to, in the second processing stage, based upon determining that a frame of the frame sequence is a key frame: perform a first local bundle adjustment (LBA) based on a camera pose in order to obtain visual odometry information and a LBA graph; calculate a fused camera pose based on the visual odometry information and the sensor readings included in the frame; and perform a fusion tracking procedure based on the fused camera pose, the LBA graph, and a current full graph in order to obtain a current camera pose. Yang [0043] teaches FIG. 3 depicts an example of a deep virtual stereo odometry calculation module 125 that includes a joint optimization module 330. The deep virtual stereo odometry calculation module 125 may be included in a monocular visual odometry system, such as the monocular visual odometry system 110 described above with regard to FIG. 1. In addition, the deep virtual stereo odometry calculation module 125 may receive data (e.g., as described above with regard to FIG. 1), such as camera frame data 215 received by the camera sensor 105. In some cases, the camera frame data 215 may include one or more groups of camera frames, such as a group of keyframes 311 and a group of additional camera frames 313. Based on the received data, the joint optimization module 330 may modify pose data. For example, the joint optimization module 330 may modify coarse tracking associated with pose data based on the camera frame data 215, including the keyframes 311 and the additional frames 313.

Yang [0044] teaches: In some implementations, a coarse tracking module 340 that is included in the deep virtual stereo odometry calculation module 125 is able to adjust pose data based on one or more camera frames in the camera frame data 215. For example, the coarse tracking module 340 may receive an initial pose estimate 329, such as pose data that includes a current estimation of the monocular visual odometry system's position and location based on the camera frame data 215 (e.g., a set of image points extracted from camera images). The initial pose estimate 329 may be assigned based on a motion model of the camera sensor 105. The assignment of the estimated pose data 331 may be performed by assuming camera motion between a most recent time step t-1 and a current time step t is the same as between a time step t-2 and the most recent time step t-1. In addition, the coarse tracking module 340 may receive a current camera frame (e.g., having a timestamp indicating a recent time of recording by a camera sensor), and a current keyframe from the group of keyframes 311 (e.g., having the most recent timestamp from the group of keyframes 311).

Yang [0047] teaches: In some implementations, the joint optimization module 330 may perform a joint optimization of energy functions of pose data and image depths of sampled points jointly. For example, a factorization module 350 that is included in the joint optimization module 330 may receive the estimated pose data 331, some or all of the camera frame data 215 (such as the keyframes 311), and data associated with the depth map 217. The factorization module 350 may determine a joint optimization of energy functions associated with the estimated pose data 331 and the image depths of the depth map 217.

Yang [0048] teaches: In some implementations, the joint optimization module 330 includes a marginalization module 360. In an example, the marginalization module 360 removes old keyframes 311 from the deep virtual stereo odometry calculation module 125 by marginalization. The removal of the old keyframes 311 maintains a fixed size of an active processing window for the deep virtual stereo odometry calculation module 125. Additionally, parameter estimates (e.g., camera poses and depths in a marginalization prior factor) outside of the active window may also [be] incorporated into the joint optimization module 330.

Yang [0049] teaches: Based on the joint optimization of the estimated pose data 331, the factorization module 350 may determine a bundle adjustment to the estimated pose data 331. The bundle adjustment may indicate a change in the position or orientation of the monocular visual odometry system 110 based on one or more differences in visual data. In some examples, the joint optimization module 330 may generate modified pose data 335 based on the bundle adjustment determined by the factorization module 350. The modifications may include a joint optimization, such as a joint optimization that optimizes the estimated pose data 331 (e.g., in a given set of operations by the factorization module 350).

Yang [0050] teaches: In some implementations, one or more of a joint optimization or a coarse tracking pose adjustment are performed in an ongoing manner. For example, the coarse tracking module 340 may determine a pose adjustment for each camera frame that is included in the camera frame data 215. As images are recorded by the camera sensor 115, the images may be added to the camera frame data 215 as additional camera frames (e.g., included in the additional frames 313). The coarse tracking module 340 may determine a respective pose adjustment for each added image, and generate (or modify) the modified pose data 335 based on the respective adjustments. In addition, the estimated pose data 331 may be updated based on the modified pose data 335, such that the estimated pose data 331 is kept current based on a joint optimization pose adjustment as images are added to the camera frame data 215.

Yang [0052] teaches: In some implementations, a monocular visual odometry system is considered a deep virtual stereo odometry system. The deep virtual stereo odometry system may include (or be configured to communicate with) one or more of a deep virtual stereo odometry calculation module and a camera sensor. In addition, the deep virtual stereo odometry system may determine one or more positional parameters based on pose data determined from the camera sensor.

Hence, the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the device for performing simultaneous localization and mapping (SLAM) of Mur-Artal, Albrecht and Wang with Yang's teaching of performing a local bundle adjustment (LBA) based on a camera pose in order to obtain visual odometry information. One would have been motivated to make this combination because doing so allows the position and orientation of an object carrying the camera of a monocular visual odometry system to be accurately determined (Yang [0002]). In combination, Mur-Artal is not altered, in that Mur-Artal continues to perform SLAM. Albrecht's teachings perform the same as they do separately, namely using multiple sensors in SLAM. Wang continues to teach loop closure detection. Yang's teachings perform the same as they do separately, namely performing a local bundle adjustment (LBA) based on a camera pose.

Therefore, one of ordinary skill in the art, such as an individual working in the field of simultaneous localization and mapping (SLAM) using stereo images, could have combined the elements as claimed by known methods, and, in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 13.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD, whose telephone number is (571) 272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661