DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s response, filed 4 March 2022, to the last office action has been entered and made of record. 
The cancellation of claims 3, 6, and 9 is acknowledged and made of record.
The amendments to the claims are acknowledged; they are supported by the original disclosure, and no new matter is added.
The amendments to the claims that address the objection to claim 1 in the previous Office action have overcome that objection, and the objection has been withdrawn.
Amendments to the independent claims 1, 4, and 7 have necessitated an updated ground of rejection over the applied prior art. Please see below for the updated interpretations and rejections.

Response to Arguments
Applicant's arguments filed 4 March 2022 have been fully considered but they are not persuasive.
In response to Applicant’s remarks on pp. 10-13 of Applicant’s reply, that the teachings of Fioraio, combined with the teachings of Chen, Maity, and Parkhiya, do not suggest the amended limitations of “performing, by the one or more hardware processors, a joint optimization by minimizing a resultant cost function to generate an optimized 3D map of the area of interest, wherein the joint optimization comprises adding constraints to the bundle adjustment by integrating the plurality of objects in the SLAM by applying the plurality of shape parameters of the objects and the plurality of poses corresponding to each of the plurality of objects detected in the input image sequence, wherein the resultant cost function comprises: a first cost function associated with the bundle adjustment for initializations of the plurality of successive 3D points and the plurality of successive initial poses; and a second cost function associated optimization of the shape parameters of the objects and the plurality of poses corresponding to each of the plurality of objects”, the Examiner respectfully disagrees.
Examiner notes the claims are treated with their broadest reasonable interpretations consistent with the specification. See MPEP 2111. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Furthermore, the test for obviousness is what the combined teachings of the references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
The combined teachings of Chen, Maity, and Parkhiya are relied upon to teach a system and method which processes an input image sequence of an area of interest captured by an image sensing device, for performing an improved Simultaneous Localization And Mapping (SLAM) process, which performs edge detection and tracking of the images and object detection to estimate object shape and poses to be optimized in the SLAM model. 
Notably, Maity is relied upon to teach known techniques for implementing a feature based monocular visual SLAM, described as Edge SLAM (see Maity Abstract), which detects edge points from images, tracks them using optical flow, and refines the point correspondences using geometrical relationships among three views (see Maity sect. 3.2. Keyframe Selection, sect. 3.3. Two-view Initialization, and sect. 3.4.1 Incremental Pose Estimation & Mapping). Maity further teaches that the initial two keyframes are chosen in an incremental bundle adjustment, and minimizing a cost function associated with performing local bundle adjustment for the 3D points and the keyframes modeled by camera pose parameters (see Maity sect. 3.3. Two-view Initialization, sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1)).
Parkhiya is relied upon to teach known techniques for implementing an object oriented SLAM with a monocular camera (see Parkhiya Abstract), which detects objects and extracts object keypoints to estimate object shape and pose parameters, which are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera (see Parkhiya Fig. 2, sect. IV.B. Object Observation Factors, sect. IV. C. Object-SLAM, and V. A. Object Detection and Data Association). Parkhiya further teaches minimizing a keypoint reprojection error function for optimizing and estimating each object’s shape and pose in each frame (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5)).
Fioraio is further relied upon to teach known techniques for implementing a semantic bundle adjustment framework in an integrated SLAM and object detection pipeline (see Fioraio Abstract and sect. 3. Semantic Bundle Adjustment), which performs a global semantic optimization by jointly optimizing a global graph over all camera poses and object poses, where the global graph comprises all the camera pose vertexes with frame-to-frame constraints coming from the SLAM engine (see Fioraio sect. 3.3. Semantic SLAM; see also Fioraio sect. 3.2. The Object Detection Pipeline and sect. 3. Semantic Bundle Adjustment).
	

The combined teachings of Chen, Maity, Parkhiya, and Fioraio, notably Maity, Parkhiya, and Fioraio, would suggest to one of ordinary skill in the art that by applying Fioraio’s techniques to the 
While Fioraio is relied upon to teach performing a global semantic optimization by jointly optimizing the global validation graph over all camera poses and object poses and minimizing a cost function which includes both frame-to-frame constraints and frame-to-object constraints, the teachings of Maity and Parkhiya are relied upon to be combined with the teachings of Fioraio such that the frame-to-frame constraints would correspond to the cost function associated with performing local bundle adjustment for the 3D points and the keyframes modeled by camera pose parameters, and the frame-to-object constraints would correspond to the keypoint reprojection error function for optimizing and estimating each object’s shape and pose in each frame. Thus, the combined teachings of Chen, Maity, Parkhiya, and Fioraio are relied upon to suggest to those of ordinary skill in the art the joint optimization framework for tracked objects and camera trajectory tracked with 3D edge points in implementing an integrated SLAM and object detection pipeline.
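For illustration only, the two-term structure discussed above may be written schematically as follows. The notation is assumed for this sketch and is not taken from Maity Eq. (1), Parkhiya Eq. (5), Fioraio Eq. (1), or the claims: C_i denotes the camera pose of keyframe i, X_j a 3D edge point, x_{ij} its observed 2D location, v_{ij} a visibility indicator, T_o the pose of object o, s_{ok}(\Lambda_o) the k-th model keypoint of object o under shape parameters \Lambda_o, z_{iok} the detected 2D keypoint, and \pi a camera projection.

```latex
E_{\text{resultant}}
  = \underbrace{\sum_{i}\sum_{j} v_{ij}\,
      \bigl\lVert x_{ij} - \pi\!\left(C_i, X_j\right) \bigr\rVert^{2}}_{\text{first cost (frame-to-frame, bundle adjustment)}}
  \;+\;
    \underbrace{\sum_{i}\sum_{o}\sum_{k}
      \bigl\lVert z_{iok} - \pi\!\left(C_i,\, T_o\, s_{ok}(\Lambda_o)\right) \bigr\rVert^{2}}_{\text{second cost (frame-to-object, shape and pose)}}
```

Under this reading, minimizing the sum over the camera poses, 3D points, object poses, and shape parameters together is what makes the optimization joint rather than two separate minimizations.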

	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 4-5, and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 10,659,768, effectively filed 28 Feb. 2017), herein Chen, in view of Maity et al. (“Edge SLAM: Edge Points Based Monocular Visual SLAM”), herein Maity, Parkhiya et al. (“Constructing Category-Specific Models for Monocular Object-SLAM”), herein Parkhiya, and Fioraio et al. (“Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment”), herein Fioraio.
Regarding claim 1, Chen discloses a processor implemented method (300) for integrating objects in monocular simultaneous localization and mapping (SLAM), the method comprising: 
receiving, by one or more hardware processors (see Chen col. 4, ln. 40-55 and Fig. 1A, where the system includes special purpose processors, such as an image processor, a pose processor, and a virtual image processor), an input image sequence of an area of interest captured by an image sensing device (see Chen col. 4, ln. 55- col. 5, ln. 5, where sensors, such as cameras, capture images of a scene).
Chen does not explicitly disclose performing, by the one or more hardware processors steps for bundle adjustment, the steps comprising: a) identifying a plurality of key frames in the input image sequence based on edge correspondences of 2D points between successive images frames in the input image sequence; b) determining an initial pose of the image sensing device by obtaining rotation and translation of a second key-frame with respect to a first key-frame from the plurality of key-frames; c) determining an initial 3D map of the area of interest using the initial pose, wherein the initial 3D map provides a plurality of initial 3D points; d) obtaining a plurality of successive initial poses of the image sensing device based on a resection technique that utilizes the initial 3D map and edge correspondences of the 2D Points in each of successive key-frames among the plurality of key-frames; e) determining initializations of a plurality of successive 3D points for each of the successive key-frames using a 
Maity teaches in a related and pertinent edge based SLAM method (see Maity Abstract), which performs 
a) identifying a plurality of key frames in the input image sequence based on edge correspondences of 2D points between successive images frames in the input image sequence (see Maity sect. 3.2. Keyframe Selection, where a number of keyframes are selected from an image sequence of consecutive frames);
b) determining an initial pose of the image sensing device by obtaining rotation and translation of a second key-frame with respect to a first key-frame from the plurality of key-frames (see Maity sect. 3.3. Two-view Initialization, where the initial pair wise pose estimation between the initial seed pair keyframes is determined);
c) determining an initial 3D map of the area of interest using the initial pose, wherein the initial 3D map provides a plurality of initial 3D points (see Maity sect. 3.3. Two-view Initialization and Fig. 4, where the initial 3D structure is generated based on the initial pair wise pose estimation);
d) obtaining a plurality of successive initial poses of the image sensing device based on a resection technique that utilizes the initial 3D map and edge correspondences of the 2D Points in each of successive key-frames among the plurality of key-frames (see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping and Fig. 6, where the initial 3D structure from two-view bundle adjustment is used to add new cameras through resectioning);
(see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping, where after two-view initialization, the next keyframes are found using the keyframe selection method described in Section 3.2 and are added into the existing reconstruction by re-sectioning using 3D-2D correspondences, followed by addition of new structure points through triangulation); and
f) performing the bundle adjustment for the SLAM based on the initializations of the plurality of successive 3D points and the plurality of successive initial poses (see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping, where bundle adjustment is performed on the set of new keyframes and co-visible keyframes and all 3D points visible to those keyframes), and
a first cost function associated with the bundle adjustment for initializations of the plurality of successive 3D points and the plurality of successive initial poses (see Maity sect. 3.3. Two-view Initialization, where the initial two keyframes are chosen in an incremental bundle adjustment; see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1), where a local bundle adjustment to be minimized is disclosed).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Maity to the teachings of Chen, such that the edge based SLAM method is implemented on a similar system comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen discloses a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing a visual simultaneous localization and mapping (SLAM) process. Maity teaches known techniques for 
	Chen and Maity do not explicitly disclose simultaneously performing, by the one or more hardware processors, object detection on the input image sequence using a bounding box based cropping technique on objects in the input image sequences, key point detection of the objects and wireframe model fitting to the key points of the objects to obtain a plurality of shape parameters of the objects and a plurality of poses corresponding to each of the plurality of objects detected in the input image sequence, and a second cost function associated optimization of the shape parameters of the objects and the plurality of poses corresponding to each of the plurality of object.
Parkhiya teaches in a related and pertinent method for real-time object oriented SLAM with a monocular camera (see Parkhiya Abstract), where an object detector is used to detect objects of interest in images of an input image sequence and non-maximum suppression is performed, resulting in bounding box detections of objects, and each bounding box detected in the image is processed by a keypoint localization network (see Parkhiya Fig. 2 and V. A. Object Detection and Data Association), and keypoints are extracted from the observed objects in the frame and used to estimate each object’s shape and pose (see Parkhiya Fig. 2 and sect. IV.B. Object Observation Factors), and the estimated object shape and pose parameters are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera (see Parkhiya Fig. 2 and sect. IV. C. Object-SLAM; see also Parkhiya Fig. 1), where a keypoint reprojection error function is provided to be optimized in estimating each object’s shape and pose (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5)).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Parkhiya to the teachings of Chen and Maity, such that the SLAM pipeline further performs object detection and extracts object keypoints to estimate object shapes and poses to form additional factors in minimizing a SLAM factor graph. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen and Maity disclose a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing an improved visual simultaneous localization and mapping (SLAM) process. Parkhiya teaches known techniques for implementing an object oriented SLAM, which detects objects and extracts object keypoints to estimate object shape and pose parameters, which are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera. One of ordinary skill in the art would have recognized that applying Parkhiya’s techniques to the system of Chen and Maity would allow that system to further improve the visual SLAM method by performing object detection and extracting object keypoints to estimate object shapes and poses to form additional factors in minimizing a SLAM factor graph, resulting in improving the robustness of optimizing the SLAM model.
While Maity teaches that the initial two keyframes are chosen in an incremental bundle adjustment, and minimizing a cost function associated with performing local bundle adjustment for the 3D points and the keyframes modeled by camera pose parameters (see Maity sect. 3.3. Two-view Initialization, sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1)), and Parkhiya teaches minimizing a keypoint reprojection error function for optimizing and estimating each object’s shape and pose in each frame (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5));

Fioraio teaches in a related and pertinent method for implementing a semantic bundle adjustment framework in an integrated SLAM and object detection pipeline (see Fioraio Abstract and sect. 3. Semantic Bundle Adjustment), where a global semantic optimization is performed in which a global graph is jointly optimized over all camera poses and object poses, where the global graph comprises all the camera pose vertexes with frame-to-frame constraints coming from the SLAM engine, all the pose vertexes of those objects for which the validation procedure turned out successful, and all frame-to-object and frame-to-frame constraints coming from detected objects’ validation graphs (see Fioraio sect. 3.3. Semantic SLAM), and that validation graphs are optimized by minimizing a cost function, e.g. Eq. (1), which includes both frame-to-frame as well as frame-to-object constraints, and the global weighted mean residual from the last global optimization is relied upon to retain or discard edges (see Fioraio sect. 3.2. The Object Detection Pipeline and sect. 3. Semantic Bundle Adjustment), and that the results yield a final reconstruction of a 3D map of the imaged area (see Fioraio Fig. 3 and sect. 4.1. Quantitative Results).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Fioraio to the teachings of Chen, Maity, and Parkhiya, such that a semantic bundle adjustment is performed to integrate the tracked edge points and estimated object shapes and poses, 
method ready for improvement to yield predictable results. In this instance, Chen, Maity, and Parkhiya disclose a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing an improved SLAM process, which performs edge detection and tracking of the images and object detection to estimate object shape and poses to be optimized in the SLAM model. Fioraio teaches known techniques for implementing a semantic bundle adjustment framework in an integrated SLAM and object detection pipeline, where a global semantic optimization is performed in which a global graph is jointly optimized over all camera poses and object poses. One of ordinary skill in the art would have recognized that applying Fioraio’s techniques to the system of Chen, Maity, and Parkhiya would allow that system to perform an improved semantic bundle adjustment, where a global graph to be optimized would include frame-to-frame constraints, corresponding to the tracked edge points and keyframe camera poses, and frame-to-object constraints, corresponding to the object shapes and poses, resulting in an improved visual SLAM method in which the integrated global graph improves the robustness of optimizing the SLAM model.
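As a non-authoritative sketch of how a resultant cost combining frame-to-frame and frame-to-object constraints could be evaluated, the following Python example sums a point reprojection term and an object keypoint reprojection term. Every name here (first_cost, second_cost, deform, the weight w) is hypothetical, and the pinhole camera and linear shape model are simplifying assumptions made for illustration; none of this is taken from Chen, Maity, Parkhiya, Fioraio, or the application.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Pose = Tuple[Sequence[Sequence[float]], Vec3]  # (3x3 rotation, translation)


def transform(pose: Pose, p: Vec3) -> Vec3:
    """Apply the rigid transform R * p + t."""
    R, t = pose
    x, y, z = (sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
    return (x, y, z)


def project(f: float, cam: Pose, world_pt: Vec3) -> Vec2:
    """Assumed pinhole model: focal length f, principal point at the origin."""
    x, y, z = transform(cam, world_pt)
    return (f * x / z, f * y / z)


def sq_err(obs: Vec2, pred: Vec2) -> float:
    """Squared 2D distance between an observation and its prediction."""
    return (obs[0] - pred[0]) ** 2 + (obs[1] - pred[1]) ** 2


def first_cost(f, cams, points, obs) -> float:
    """Frame-to-frame term; obs holds (camera index, point index, 2D point)."""
    return sum(sq_err(uv, project(f, cams[i], points[j])) for i, j, uv in obs)


def deform(mean_shape, basis, lambdas) -> List[Vec3]:
    """Hypothetical shape model: mean keypoints plus weighted basis offsets."""
    shape = []
    for k, m in enumerate(mean_shape):
        shape.append(tuple(
            m[a] + sum(lam * b[k][a] for lam, b in zip(lambdas, basis))
            for a in range(3)
        ))
    return shape


def second_cost(f, cams, obj_pose, mean_shape, basis, lambdas, kp_obs) -> float:
    """Frame-to-object term; kp_obs holds (camera index, keypoint index, 2D point)."""
    world = [transform(obj_pose, s) for s in deform(mean_shape, basis, lambdas)]
    return sum(sq_err(uv, project(f, cams[i], world[k])) for i, k, uv in kp_obs)


def resultant_cost(f, cams, points, obs, obj_pose, mean_shape, basis,
                   lambdas, kp_obs, w: float = 1.0) -> float:
    """Sum of both terms; a joint optimizer would minimize this over all unknowns."""
    return (first_cost(f, cams, points, obs)
            + w * second_cost(f, cams, obj_pose, mean_shape, basis, lambdas, kp_obs))
```

In this sketch the camera poses appear in both terms, so the object observations add constraints on the camera trajectory in addition to those from the tracked points.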

Regarding claim 2, please see the above rejection of claim 1. Chen, Maity, Parkhiya, and Fioraio disclose the method of claim 1, wherein identifying the plurality of key-frames comprises: 
detecting edges in a first frame and a second frame in the input image sequence, wherein the first image frame and the second image frames are successive image frames (see Maity sect. 3.2. Keyframe Selection, where the image sequence is a sequence of consecutive frames, where point correspondences are tracked for every image; see Maity sect. 3.1. Correspondence Generation, where edge points are used as the feature correspondence); 
(see Maity sect. 3.2. Keyframe Selection, where if a computed average position change of feature correspondences between a current frame It and the last keyframe, Km, is more than twenty percent of the image width, the current frame is considered as a new keyframe); 
selecting a successive frame to the second frame for the key-frame determination if the second frame is not identified as the key-frame (see Maity sect. 3.2. Keyframe Selection, where if none of the disclosed conditions occur, a new keyframe (Km+1) is considered in a fixed interval of 1 second); and 
repeating the key-frame detection for input image sequence to determine the plurality of key-frames (see Maity sect. 3.2. Keyframe Selection, where point correspondences are tracked for every image, and keyframes are a subset of the image sequence; which suggests that keyframe selection is repeated throughout the sequence of consecutive frames).
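The keyframe conditions cited above from Maity sect. 3.2 can be sketched as a simple decision rule. The Python example below is a minimal illustration that models only the two conditions quoted in this action (an average correspondence shift above twenty percent of the image width, and a fixed 1 second fallback interval); the function and parameter names are hypothetical, and any other conditions in Maity are not modeled.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_displacement(kf_pts: Sequence[Point], cur_pts: Sequence[Point]) -> float:
    """Average 2D shift of tracked correspondences since the last keyframe."""
    if not kf_pts or len(kf_pts) != len(cur_pts):
        raise ValueError("need equal, non-empty correspondence lists")
    total = sum(math.hypot(c[0] - k[0], c[1] - k[1])
                for k, c in zip(kf_pts, cur_pts))
    return total / len(kf_pts)


def is_new_keyframe(kf_pts: Sequence[Point],
                    cur_pts: Sequence[Point],
                    image_width: float,
                    seconds_since_keyframe: float,
                    shift_ratio: float = 0.20,      # twenty percent of image width
                    max_interval_s: float = 1.0) -> bool:  # fixed fallback interval
    """Return True if the current frame should be taken as a new keyframe."""
    # Condition cited in the action: large average correspondence shift.
    if mean_displacement(kf_pts, cur_pts) > shift_ratio * image_width:
        return True
    # Fallback cited in the action: take a keyframe at a fixed time interval.
    return seconds_since_keyframe >= max_interval_s
```

Calling is_new_keyframe on each incoming frame, and resetting the reference correspondences whenever it returns True, would repeat the selection across the image sequence.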

Regarding claim 4, it recites a system performing the method of claim 1. Chen, Maity, Parkhiya, and Fioraio teach a system performing the method of claim 1 (see Chen col. 4, ln. 40-55 and Fig. 1A, where a system for implementing the disclosed teachings is taught). Please see above for detailed claim analysis, with the exception of the following further limitations:
a memory storing instructions (see Chen col. 4, ln. 40-55 and Fig. 1A, where the system includes a memory; and see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where software for implementing the disclosed teachings is stored in the memory); 
one or more Input/Output (I/O) interfaces (see Chen Fig. 1A and col. 4, ln. 40-55, where one or more buses are provided); and
one or more processor(s) coupled to the memory via the one or more I/O interfaces (see Chen Fig. 1A and col. 4, ln. 40-55, where the one or more buses are coupled to the processors and memory), wherein the one or more processor(s) are configured by the instructions to perform the method of claim 1 (see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where processors perform the disclosed teachings by implementing the stored software; see the above rejection for claim 1 in view of the combined teachings of Chen, Maity, Parkhiya, and Fioraio).
Please see the above rejection for claim 1, as the rationale to combine the teachings of Chen, Maity, Parkhiya, and Fioraio is similar, mutatis mutandis.

Regarding claim 5, see above rejection for claim 4. It is a system claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 5 are similarly rejected.

Regarding claim 7, it recites a non-transitory machine-readable information storage medium comprising one or more instructions which, when executed by one or more hardware processors, perform the method of claim 1. Chen, Maity, Parkhiya, and Fioraio teach a non-transitory machine-readable information storage medium performing the method of claim 1 (see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where software for implementing the disclosed teachings is stored on a non-transitory computer-readable medium, such as RAM and ROM memory, and processors perform the disclosed teachings by implementing the software stored on the memory; see also the above rejection for claim 1 in view of the combined teachings of Chen, Maity, Parkhiya, and Fioraio). Please see above for detailed claim analysis.
Please see the above rejection for claim 1, as the rationale to combine the teachings of Chen, Maity, Parkhiya, and Fioraio is similar, mutatis mutandis.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI whose telephone number is (571)270-3814. The examiner can normally be reached 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/TIMOTHY CHOI/Examiner, Art Unit 2661

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661