DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 21 June 2022 has been entered.

Response to Amendment
Applicant’s response, filed 21 June 2022, to the last office action has been entered and made of record. 
In response to the amendments to the claims, they are acknowledged, supported by the original disclosure, and no new matter is added.
Amendments to the independent claims 1, 4, and 7 have necessitated a new ground of rejection over the applied prior art. Please see below for the updated interpretations and rejections.

Response to Arguments
Applicant's arguments filed 21 June 2022 have been fully considered but they are not persuasive.
Examiner notes the claims are treated with their broadest reasonable interpretations consistent with the specification. See MPEP 2111. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Furthermore, the test for obviousness is what the combined teachings of the references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).
Additionally, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

	
In response to Applicant’s remarks on pp. 13-14 of Applicant’s reply, that the teachings of Chen fail to disclose the noted amended limitations, the Examiner respectfully disagrees.
Chen is relied upon to teach a base system and method for performing a visual simultaneous localization and mapping (SLAM) process which comprises hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing a visual simultaneous localization and mapping (SLAM) process (see Chen col. 4, ln. 40-55 and Fig. 1A; and see Chen col. 4, ln. 55- col. 5, ln. 5).
As Chen teaches a system and method for performing a visual SLAM process which uses hardware processors to process an input image sequence, Chen provides teachings for the broadest reasonable interpretations for the recited claim limitations of, “receiving, by one or more hardware processors, an input image sequence of an area of interest captured by an image sensing device”. 
With regard to Applicant’s remarks that the present claims are not directed to concepts of generating virtual images from virtual viewpoints different from viewpoints of the images acquired by a sensor to assist model of a scene reconstruction, the Examiner notes that Chen’s teachings are relied upon to teach a system and method for performing a visual SLAM process which uses hardware processors to process an input image sequence. 
With regard to the other noted amended limitations, Chen’s teachings are further combined with the teachings of the other cited prior art references to suggest those limitations to one of ordinary skill in the art. Please see below for further discussion of the cited prior art teachings. 

In response to Applicant’s remarks on pp. 14-18 of Applicant’s reply, that the teachings of Maity fail to disclose the noted amended limitations, the Examiner respectfully disagrees.
In particular, Applicants assert that Maity fails to disclose “identification of key frames comprise detecting edges in a first frame and a second frame in the input image sequences, wherein the first image frame and the second image frames are successive image frames”, “identifying whether the second frame is a key frame with respect to the first frame using correspondence detection based on a set of parameters; selecting a successive frame to the second frame for the key frame determination if the second frame is not identified as the key frame”, and “repeating the key frame detection for all input image sequences to determine the plurality of key frames”.
Maity is relied upon to teach implementing a feature-based monocular visual SLAM, described as Edge SLAM, where an image sequence of consecutive frames is processed to determine and track point correspondences for every image, where edge points are used as the feature for correspondence (see Maity sect. 3.1. Correspondence Generation and sect. 3.2. Keyframe Selection), and a computed average position change of feature correspondences between a current frame and a last keyframe is used to determine whether the current frame is considered a new keyframe (see Maity sect. 3.2. Keyframe Selection), and if none of the disclosed conditions occur, a new keyframe (Km+1) is considered in a fixed interval of 1 second (see Maity sect. 3.2. Keyframe Selection). As Maity teaches that the point correspondences are tracked for every image, and keyframes are a subset of the image sequence (see Maity sect. 3.2. Keyframe Selection), Maity’s teachings suggest that keyframe selection is repeated throughout the sequence of consecutive frames.
As discussed above, Maity teaches a related and pertinent edge SLAM method, which processes an input sequence of consecutive frames to determine and track edge point correspondences between every frame to determine keyframes, where if the disclosed conditions for determining that a current frame is a keyframe do not occur, a later frame in the image sequence is considered for being a keyframe, and suggests that keyframe selection is repeated throughout the sequence of consecutive frames. Thus, the combined teachings of Chen and Maity provide teachings for the broadest reasonable interpretations for the recited claim limitations of, “identification of key frames comprise detecting edges in a first frame and a second frame in the input image sequences, wherein the first image frame and the second image frames are successive image frames”, “identifying whether the second frame is a key frame with respect to the first frame using correspondence detection based on a set of parameters; selecting a successive frame to the second frame for the key frame determination if the second frame is not identified as the key frame”, and “repeating the key frame detection for all input image sequences to determine the plurality of key frames”.
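For illustration only, the keyframe selection relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions, not Maity's implementation: it models only the two conditions discussed in this action (an average position change of tracked correspondences exceeding twenty percent of the image width, and the fixed 1 second fallback interval), it treats the first frame as the initial keyframe, and the names select_keyframes, mean_displacement, ratio, and max_gap_s are hypothetical.

```python
from math import hypot


def mean_displacement(prev_pts, curr_pts):
    """Average Euclidean shift between matched 2D points of two frames."""
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum(hypot(cx - px, cy - py)
                for (px, py), (cx, cy) in zip(prev_pts, curr_pts))
    return total / len(prev_pts)


def select_keyframes(frames, image_width, fps, ratio=0.20, max_gap_s=1.0):
    """Return indices of keyframes.

    frames: list of dicts mapping a track id to the (x, y) position of that
    tracked edge point in the frame (hypothetical input layout).
    Assumption: frame 0 is taken as the first keyframe.
    """
    keyframes = [0]
    for t in range(1, len(frames)):
        last = frames[keyframes[-1]]
        # Compare only the edge points tracked in both frames.
        shared = sorted(set(last) & set(frames[t]))
        moved = False
        if shared:
            shift = mean_displacement([last[i] for i in shared],
                                      [frames[t][i] for i in shared])
            moved = shift > ratio * image_width
        # Fallback: force a keyframe after a fixed time interval.
        timed_out = (t - keyframes[-1]) / fps >= max_gap_s
        if moved or timed_out:
            keyframes.append(t)
        # Otherwise the next successive frame is tested against the same keyframe.
    return keyframes
```

In this sketch a frame that is not selected is simply skipped and the following frame is compared against the same last keyframe, so the test repeats over the entire sequence.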

In response to Applicant’s remarks on pp. 18-21 of Applicant’s reply, that the teachings of Parkhiya fail to disclose the noted amended limitations, the Examiner respectfully disagrees. 
Parkhiya is relied upon to teach implementing an object oriented SLAM, which detects objects (see Parkhiya Fig. 2 and V. A. Object Detection and Data Association) and extracts object keypoints to estimate object shape and pose parameters (see Parkhiya Fig. 2 and sect. IV.B. Object Observation Factors), which are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera (see Parkhiya Fig. 2 and sect. IV. C. Object-SLAM; see also Parkhiya Fig. 1). Parkhiya further teaches that objects may be represented as 3D wireframes and an object’s shape parameters may be expressed as a corresponding mean shape that can be deformed along linearly independent directions according to deformation coefficients (see Parkhiya Fig. 2, category level shape priors panel, sect. III. CONSTRUCTING CATEGORY-SPECIFIC MODELS, and sect. III. A. Category-Level Model).
As discussed above, Parkhiya teaches a related and pertinent object SLAM method, which detects objects and extracts object keypoints to estimate object shape and pose parameters, which are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera, where object shape parameters are expressed according to deformation coefficients for deforming a mean shape model and corresponding to category level shape priors 3D wireframe models. Thus, the combined teachings of Chen, Maity, and Parkhiya suggest teachings for the broadest reasonable interpretations for the recited claim limitations of “simultaneously performing, by the one or more hardware processors, object detection on the input image sequence using a bounding box based cropping technique on objects in the input image sequences, key point detection of the objects and wireframe model fitting to the key points of the objects to obtain a plurality of shape parameters of the objects and a plurality of poses corresponding to each of the plurality of objects detected in the input image sequence”.
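For illustration only, the category level shape representation described above (a mean shape deformed along linearly independent directions according to deformation coefficients) can be sketched as follows. This is a minimal sketch assuming the wireframe is stored as a list of 3D keypoints and each deformation direction as a list of per-keypoint offsets; the function name deform_wireframe and its parameters are hypothetical and are not taken from Parkhiya.

```python
def deform_wireframe(mean_shape, basis, coeffs):
    """Return the wireframe: mean shape + sum_k coeffs[k] * basis[k].

    mean_shape: list of (x, y, z) keypoints of the category mean shape.
    basis:      list of K deformation directions, each a list of
                (dx, dy, dz) offsets, one per keypoint.
    coeffs:     list of K deformation coefficients (the shape parameters).
    """
    if len(basis) != len(coeffs):
        raise ValueError("one coefficient is required per basis direction")
    for direction in basis:
        if len(direction) != len(mean_shape):
            raise ValueError("each direction needs one offset per keypoint")
    shape = []
    for j, point in enumerate(mean_shape):
        # Add the weighted offset of every deformation direction, per axis.
        deformed = tuple(
            point[axis] + sum(c * direction[j][axis]
                              for c, direction in zip(coeffs, basis))
            for axis in range(3)
        )
        shape.append(deformed)
    return shape
```

Under this representation, fitting the wireframe to detected keypoints amounts to estimating the coefficient list together with the object pose.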

In response to Applicant’s remarks on pp. 13-14 of Applicant’s reply, that the teachings of Fioraio fail to disclose a recitation of “joint optimization framework for objects as well as camera trajectory and 3D structure in the SLAM backend”, the Examiner respectfully disagrees. 
Fioraio is relied upon to teach in a related and pertinent semantic bundle adjustment framework for visual SLAM methods, which integrates SLAM and object detection pipeline using a global graph to be jointly optimized over all camera poses and object poses (see Fioraio sect. 3. Semantic Bundle Adjustment), where the global graph comprises all the camera pose vertexes with frame to frame constraints coming from the SLAM engine, all the pose vertexes of those object for which the validation procedure turned out successful, and all frame to object and frame to frame constraints coming from detected objects’ validation graphs (see Fioraio sect. 3.3. Semantic SLAM).
Furthermore, as Maity teaches that the initial two keyframes are chosen in an incremental bundle adjustment, and teaches minimizing a cost function associated with performing local bundle adjustment for the 3D points and the keyframes modeled by camera pose parameters (see Maity sect. 3.3. Two-view Initialization, sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1)), and Parkhiya teaches minimizing a keypoint reprojection error function for optimizing and estimating each object’s shape and pose in each frame (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5)), where object shape parameters are expressed according to deformation coefficients for deforming a mean shape model and corresponding to category level shape priors 3D wireframe models (see Parkhiya Fig. 2, category level shape priors panel, sect. III. CONSTRUCTING CATEGORY-SPECIFIC MODELS, and sect. III. A. Category-Level Model); one of ordinary skill in the art would have recognized that applying Fioraio’s techniques to the system of Chen, Maity, and Parkhiya would allow the SLAM system of Chen, Maity, and Parkhiya to perform an improved semantic bundle adjustment, where a global graph to be optimized would include frame-to-frame constraints, corresponding to the tracked edge points and keyframe camera poses and related to bundle adjustment, and frame-to-object constraints, corresponding to the object shapes and poses and relating to the localized objects and shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models, resulting in an improved visual SLAM method where the integrated global graph would improve the robustness of optimizing the SLAM model. 
Thus, the combined teachings of Chen, Maity, Parkhiya, and Fioraio provide suggested teachings to one of ordinary skill in the art for the broadest reasonable interpretations for a joint optimization framework for objects as well as camera trajectory and 3D structure in the SLAM backend.

Applicant’s remaining arguments with respect to independent claims 1, 4, and 7 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claims 1, 4, and 7 are objected to because of the following informalities:  
Claims 1, 4, and 7 respectively recite, “wherein an object optimization is added to a bundle adjustment of a edge SLAM by providing the joint optimization”, where a typographical error exists, and “a bundle adjustment of an edge SLAM” is assumed.  Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-2, 4-5, and 7-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 4, and 7 respectively recite the limitations, “wherein the joint optimization is performed on object localization, the edges using a category level shape priors and the bundle adjustment and wherein the resultant cost function comprises” in the respective claim limitation beginning with “performing a joint optimization…”. There is insufficient antecedent basis for “the edges using a category level shape priors”. While the respective claims previously recite that “plurality of shape parameters of objects” are applied and added to the joint optimization, it is unclear what antecedent basis “the edges using a category level shape priors” is intended to reference. 
Claims 1, 4, and 7 further recite the limitations, “wherein an object optimization is added to a bundle adjustment of a edge SLAM by providing the joint optimization for the edge 3D points, the camera poses, the object shape parameters and the object poses together in a single optimization framework” in the respective claim limitation beginning with “a second cost function…”.  There is insufficient antecedent basis for “the edge 3D points” and “the camera poses” in the claims. While the respective claims do previously recite “plurality of successive 3D points” and “poses of the image sensing device”, it is unclear whether “the edge 3D points” and “the camera poses” are intended to reference such limitations. 
Dependent claims 2, 5, and 8 incorporate the indefinite subject matter of the respective independent claims they depend upon and are rejected for similar rationale. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 4-5, and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 10,659,768, effectively filed 28 Feb. 2017), herein Chen, in view of Maity et al. (“Edge SLAM: Edge Points Based Monocular Visual SLAM”), herein Maity, Parkhiya et al. (“Constructing Category-Specific Models for Monocular Object-SLAM”), herein Parkhiya, Fioraio et al. (“Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment”), herein Fioraio, and Ramadasan et al. (“DCSLAM: A DYNAMICALLY CONSTRAINED REAL-TIME SLAM”), herein Ramadasan.
Regarding claim 1, Chen discloses a processor implemented method (300) for integrating objects in monocular simultaneous localization and mapping (SLAM), the method comprising: 
receiving, by one or more hardware processors (see Chen col. 4, ln. 40-55 and Fig. 1A, where the system includes special purpose processors, such as an image processor, a pose processor, and a virtual image processor), an input image sequence of an area of interest captured by an image sensing device (see Chen col. 4, ln. 55- col. 5, ln. 5, where sensors, such as cameras, capture images of a scene).
Chen does not explicitly disclose performing, by the one or more hardware processors steps for bundle adjustment, the steps comprising: a) identifying a plurality of key frames in the input image sequence based on edge correspondences of 2D points between successive images frames in the input image sequence, wherein identifying the plurality of key-frames comprises: 
detecting edges in a first frame and a second frame in the input image sequence, the input image sequence comprising a first image frame and a second image frame, wherein the first image frame and the second image frames are successive image frames and analyzing the first image frames and the second image frames; 
performing correspondence detection by associating one or more edges in the first image frame and the second image frame; 
identifying whether the second frame is a key-frame with respect to the first frame using the correspondence detection; b) determining an initial pose of the image sensing device by obtaining rotation and translation of a second key-frame with respect to a first key-frame from the plurality of key-frames; c) determining an initial 3D map of the area of interest using the initial pose, wherein the initial 3D map provides a plurality of initial 3D points; d) obtaining a plurality of successive initial poses of the image sensing device based on a resection technique that utilizes the initial 3D map and edge correspondences of the 2D Points in each of successive key-frames among the plurality of key-frames; 
e) determining initializations of a plurality of successive 3D points for each of the successive key-frames using a triangulation technique, wherein the triangulation technique determines each of the edge correspondences between the 2D points of each of the successive key-frames, wherein the 3D map points are obtained using 2D-2D correspondences and relative pose between the frames; and f) performing the bundle adjustment for the SLAM based on the initializations of the plurality of successive 3D points and the plurality of successive initial poses, and a first cost function associated with the bundle adjustment for initializations of the plurality of successive 3D points and the plurality of successive initial poses.
Maity teaches in a related and pertinent edge based SLAM method (see Maity Abstract), which performs 
a) identifying a plurality of key frames in the input image sequence based on edge correspondences of 2D points between successive images frames in the input image sequence (see Maity sect. 3.2. Keyframe Selection, where a number of keyframes are selected from an image sequence of consecutive frames), wherein identifying the plurality of key-frames comprises: 
detecting edges in a first frame and a second frame in the input image sequence, the input image sequence comprising a first image frame and a second image frame, wherein the first image frame and the second image frames are successive image frames and analyzing the first image frames and the second image frames (see Maity sect. 3.2. Keyframe Selection, where the image sequence is a sequence of consecutive frames, where point correspondences are tracked for every image, and the image sequence suggests successive first and second image frames; see Maity sect. 3.1. Correspondence Generation, where edge points are used as the feature correspondence); 
performing correspondence detection by associating one or more edges in the first image frame and the second image frame (see Maity Fig. 3 and sect. 3.1. Correspondence Generation, where feature correspondence of thinned edge points are estimated using bi-direction sparse iterative and pyramidal version of Lucas-Kanade optical flow);
identifying whether the second frame is a key-frame with respect to the first frame using correspondence detection (see Maity sect. 3.2. Keyframe Selection, where if a computed average position change of feature correspondences between a current frame, It, and the last keyframe, Km, is more than twenty percent of the image width, the current frame is considered as a new keyframe); 
b) determining an initial pose of the image sensing device by obtaining rotation and translation of a second key-frame with respect to a first key-frame from the plurality of key-frames (see Maity sect. 3.3. Two-view Initialization, where the initial pair wise pose estimation between the initial seed pair keyframes is determined);
c) determining an initial 3D map of the area of interest using the initial pose, wherein the initial 3D map provides a plurality of initial 3D points (see Maity sect. 3.3. Two-view Initialization and Fig. 4, where the initial 3D structure is generated based on the initial pair wise pose estimation);
d) obtaining a plurality of successive initial poses of the image sensing device based on a resection technique that utilizes the initial 3D map and edge correspondences of the 2D Points in each of successive key-frames among the plurality of key-frames (see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping and Fig. 6, where the initial 3D structure from two-view bundle adjustment is used to add new cameras through resectioning);
e) determining initializations of a plurality of successive 3D points for each of the successive key-frames using a triangulation technique, wherein the triangulation technique determines each of the edge correspondences between the 2D points of each of the successive key-frames (see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping, where after two-view initialization, the next keyframes are found using the keyframe selection method described in Section 3.2, and these keyframes are added into the existing reconstruction by re-sectioning using 3D-2D correspondences followed by addition of new structure points through triangulation), wherein the 3D map points are obtained using 2D-2D correspondences and relative pose between the frames (see Maity sect. 3.4.2 Track-loss Handling, where track loss is mitigated by estimating the current keyframe separately using 2D-2D point correspondences and epipolar geometry, which provides a pairwise rotation and a unit direction from the previous frame to the current frame and new 3D points with 2D correspondences and estimated poses of previous frames and current frame through triangulation); and
f) performing the bundle adjustment for the SLAM based on the initializations of the plurality of successive 3D points and the plurality of successive initial poses (see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping, where bundle adjustment is performed on the set of new keyframes and co-visible keyframes and all 3D points visible to those keyframes), and
a first cost function associated with the bundle adjustment for initializations of the plurality of successive 3D points and the plurality of successive initial poses (see Maity sect. 3.3. Two-view Initialization, where the initial two keyframes are chosen in an incremental bundle adjustment; see Maity sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1), where a local bundle adjustment to be minimized is disclosed).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Maity to the teachings of Chen, such that the edge based SLAM method is implemented on a similar system comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen discloses a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing a visual simultaneous localization and mapping (SLAM) process. Maity teaches known techniques for implementing a feature based monocular visual SLAM, described as Edge SLAM, which detects edge points from images, tracks those using optical flow and refines the point correspondences using geometrical relationships among three views, with a two-view initialization for bundle adjustment, and further estimates new cameras using a local optimization technique. One of ordinary skill in the art would have recognized that applying Maity’s techniques to the system of Chen would allow the system of Chen to perform the improved visual SLAM method taught by Maity. 
	Chen and Maity do not explicitly disclose simultaneously performing, by the one or more hardware processors, object detection on the input image sequence using a bounding box based cropping technique on objects in the input image sequences, key point detection of the objects and wireframe model fitting to the key points of the objects to obtain a plurality of shape parameters of the objects and a plurality of poses corresponding to each of the plurality of objects detected in the input image sequence, and a second cost function associated optimization of the shape parameters of the objects and the plurality of poses corresponding to each of the plurality of object.
Parkhiya teaches a related and pertinent method for real-time object oriented SLAM with a monocular camera (see Parkhiya Abstract), where an object detector is used to detect objects of interest in images of an input image sequence and perform non-maximum suppression resulting in bounding box detection of objects, where each bounding box detected in the image is processed by a keypoint localization network (see Parkhiya Fig. 2 and V. A. Object Detection and Data Association), and keypoints are extracted from the observed objects in the frame and used to estimate each object’s shape and pose (see Parkhiya Fig. 2 and sect. IV.B. Object Observation Factors), and the estimated object shape and pose parameters are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera (see Parkhiya Fig. 2 and sect. IV. C. Object-SLAM; see also Parkhiya Fig. 1), where a keypoint reprojection error function is provided to be optimized in estimating each object’s shape and pose in each frame (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5)), where objects may be represented as 3D wireframes and an object’s shape parameters may be expressed as a corresponding mean shape that can be deformed along linearly independent directions according to deformation coefficients (see Parkhiya Fig. 2, category level shape priors panel, sect. III. CONSTRUCTING CATEGORY-SPECIFIC MODELS, and sect. III. A. Category-Level Model).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Parkhiya to the teachings of Chen and Maity, such that the SLAM pipeline further performs object detection and extracts object keypoints to estimate object shapes and poses to form additional factors in minimizing a SLAM factor graph. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen and Maity disclose a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing an improved visual simultaneous localization and mapping (SLAM) process. Parkhiya teaches known techniques for implementing an object oriented SLAM, which detects objects and extracts object keypoints to estimate object shape and pose parameters, which are used to form additional factors in the SLAM factor graph in estimating the trajectory of a camera. One of ordinary skill in the art would have recognized that applying Parkhiya’s techniques to the system of Chen and Maity would allow the system of Chen and Maity to further improve the visual SLAM method by performing object detection and extracting object keypoints to estimate object shapes and poses to form additional factors in minimizing a SLAM factor graph, resulting in improved robustness of optimizing the SLAM model. 
While Maity teaches that the initial two keyframes are chosen in an incremental bundle adjustment, and minimizing a cost function associated with performing local bundle adjustment for the 3D points and the keyframes modeled by camera pose parameters (see Maity sect. 3.3. Two-view Initialization, sect. 3.4.1 Incremental Pose Estimation & Mapping and Eq. (1)), and Parkhiya teaches minimizing a keypoint reprojection error function for optimizing and estimating each object’s shape and pose in each frame (see Parkhiya sect. IV. B. Object Observation Factors and Eq. (5)), where object shape parameters are expressed according to deformation coefficients for deforming a mean shape model and corresponding to category level shape priors 3D wireframe models (see Parkhiya Fig. 2, category level shape priors panel, sect. III. CONSTRUCTING CATEGORY-SPECIFIC MODELS, and sect. III. A. Category-Level Model); Chen, Maity, and Parkhiya do not explicitly disclose performing, by the one or more hardware processors, a joint optimization by minimizing a resultant cost function to generate an optimized 3D map of the area of interest, wherein the joint optimization comprises adding constraints to the bundle adjustment by integrating the plurality of objects in the SLAM by applying the plurality of shape parameters of the objects and the plurality of poses corresponding to each of the plurality of objects detected in the input image sequence, wherein the joint optimization is performed on object localization, the edges using a category level shape priors and the bundle adjustment, wherein the resultant cost function comprises the first cost function and the second cost function.
Fioraio teaches in a related and pertinent method for implementing a semantic bundle adjustment framework in an integrated SLAM and object detection pipeline (see Fioraio Abstract and sect. 3. Semantic Bundle Adjustment), where a global semantic optimization is performed where a global graph is jointly optimized over all camera poses and object poses, where the global graph comprises all the camera pose vertexes with frame to frame constraints coming from the SLAM engine, all the pose vertexes of those object for which the validation procedure turned out successful, and all frame to object and frame to frame constraints coming from detected objects’ validation graphs (see Fioraio sect. 3.3. Semantic SLAM), and that validation graphs are optimized by minimizing a cost function, e.g. Eq. (1), which includes both frame-to-frame as well as frame-to-object constraints, and the global weighted mean residual from the last global optimization is relied upon to retain or discard edges (see Fioraio sect. 3.2. The Object Detection Pipeline and sect. 3. Semantic Bundle Adjustment), and that the results yield a final reconstruction of a 3D map of the imaged area (see Fioraio Fig. 3 and sect. 4.1. Quantitative Results). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Fioraio to the teachings of Chen, Maity, and Parkhiya, such that a semantic bundle adjustment is performed to integrate the tracked edge points, corresponding to the edge SLAM bundle adjustment of Maity, and estimated object shapes and poses, corresponding to shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models and localized objects of Parkhiya, where a global graph to be optimized would include the tracked edge points and object shapes and poses, and provides for the broadest reasonable interpretation for performing joint optimization on object localization, edges using category level shape priors and bundle adjustment. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen, Maity, and Parkhiya disclose a base system and method comprising hardware processors processing an input image sequence of an area of interest captured by an image sensing device, for performing an improved SLAM process, which performs edge detection and tracking of the images and object detection to estimate object shape and poses to be optimized in the SLAM model, where object shape parameters are expressed according to deformation coefficients for deforming a mean shape model and corresponding to category level shape priors 3D wireframe models. Fioraio teaches known techniques for implementing a semantic bundle adjustment framework in an integrated SLAM and object detection pipeline, where a global semantic optimization is performed where a global graph is jointly optimized over all camera poses and object poses. 
One of ordinary skill in the art would have recognized that applying Fioraio’s techniques to the system of Chen, Maity, and Parkhiya would allow the system of Chen, Maity, and Parkhiya to perform an improved semantic bundle adjustment, where a global graph to be optimized would include frame-to-frame constraints, corresponding to the tracked edge points and keyframe camera poses and related to bundle adjustment, and frame-to-object constraints, corresponding to the object shapes and poses and relating to the localized objects and shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models, resulting in an improved visual SLAM method in which the integrated global graph would improve the robustness of optimizing the SLAM model.
Chen, Maity, Parkhiya, and Fioraio do not explicitly disclose wherein an object optimization is added to a bundle adjustment of an edge SLAM by providing the joint optimization for the edge 3D points, the camera poses, the object shape parameters and the object poses together in a single optimization framework and wherein the bundle adjustment refers to optimization of the 3D points and the camera poses.
Ramadasan teaches a related and pertinent dynamically constrained SLAM (DCSLAM) method (see Ramadasan Abstract), where DCSLAM involves the constraints of parameters for camera poses, 3D features (3D points and 3D edges), and 3D objects of different sorts, and the functioning of the DCSLAM is based on an exhaustive list of constraints and of parameters organized in a graph of dependencies (see Ramadasan sect. 3. DCSLAM and Fig. 2), where constraints between observed 3D objects and their 3D features are created and added to the optimization process, which includes the initialization of the position, orientation, scale factor and the shape of the 3D object (see Ramadasan sect. 3.1. Create and remove 3D objects), where the role of the DCSLAM is to dynamically integrate in the optimization process the constraints coming from the objects partially known from the environment bundle adjustment (see Ramadasan sect. 3.2 3D objets and features association), and the optimization is formulated as a dependencies graph linking the parameters and the constraints, where each constraint solves its own dependencies before solving itself, such that 3D objects are linked to the 3D features through the 3D distance minimization constraint (between a feature and an object), which is applied on the features which were previously selected by the reprojection error constraint linked to the last camera poses of the incremental SLAM, and the optimization of a 3D object depends on the features observed in the camera poses to optimize (see Ramadasan sect. 3.3. Solving the dependencies graph).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Ramadasan to the teachings of Chen, Maity, Parkhiya, and Fioraio, such that the optimization for the semantic bundle adjustment to integrate the tracked edge points, corresponding to the edge SLAM bundle adjustment of Maity, and the estimated object shapes and poses, corresponding to the shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models and the localized objects of Parkhiya, would integrate the constraints of parameters for camera poses, 3D features (including 3D edges), and the observed 3D objects’ shape and pose parameters as a single optimization framework. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Chen, Maity, Parkhiya, and Fioraio disclose a base system and method for performing an improved SLAM process to perform an improved semantic bundle adjustment, where a global graph to be optimized would include frame-to-frame constraints, corresponding to the tracked edge points and keyframe camera poses and related to bundle adjustment, and frame-to-object constraints, corresponding to the object shapes and poses and relating to the localized objects and shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models. Ramadasan teaches known techniques for implementing, in a DCSLAM method, an integrated optimization framework which involves the constraints of parameters for camera poses, 3D features (3D points and 3D edges), and 3D objects of different sorts, which dynamically integrates in the optimization process the constraints coming from the objects partially known from the environment bundle adjustment, and in which the optimization is formulated as a dependencies graph linking the parameters and the constraints.
One of ordinary skill in the art would have recognized that applying Ramadasan’s techniques to the system of Chen, Maity, Parkhiya, and Fioraio would allow the optimization for the semantic bundle adjustment to integrate the constraints of parameters for camera poses, 3D features (including 3D edges), corresponding to the edge SLAM bundle adjustment of Maity, and the observed 3D objects’ shape and pose parameters, corresponding to the shape parameters expressed as deformation coefficients based on category level shape priors 3D wireframe models and the localized objects of Parkhiya, as a single optimization framework, resulting in an improved visual SLAM method in which the integrated optimization framework would improve the robustness of optimizing the SLAM model.

Regarding claim 2, please see the above rejection of claim 1. Chen, Maity, Parkhiya, Fioraio, and Ramadasan disclose the method of claim 1, wherein identifying the plurality of key-frames comprises: 
selecting a successive frame to the second frame for the key-frame determination if the second frame is not identified as the key-frame (see Maity sect. 3.2. Keyframe Selection, where if none of the disclosed conditions occur, a new keyframe (Km+1) is considered in a fixed interval of 1 second); and 
repeating the key-frame detection for the input image sequence to determine the plurality of key-frames (see Maity sect. 3.2. Keyframe Selection, where point correspondences are tracked for every image, and keyframes are a subset of the image sequence, which suggests that keyframe selection is repeated throughout the sequence of consecutive frames).
Regarding claim 4, it recites a system performing the method of claim 1. Chen, Maity, Parkhiya, Fioraio, and Ramadasan teach a system performing the method of claim 1 (see Chen col. 4, ln. 40-55 and Fig. 1A, where a system for implementing the disclosed teachings is taught). Please see above for detailed claim analysis, with the exception of the following further limitations:
a memory storing instructions (see Chen col. 4, ln. 40-55 and Fig. 1A, where the system includes a memory; and see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where software for implementing the disclosed teachings is stored in the memory); 
one or more Input/Output (I/O) interfaces (see Chen Fig. 1A and col. 4, ln. 40-55, where one or more buses are provided); and
one or more processor(s) coupled to the memory via the one or more I/O interfaces (see Chen Fig. 1A and col. 4, ln. 40-55, where the one or more buses are coupled to the processors and memory), wherein the one or more processor(s) are configured by the instructions to perform the method of claim 1 (see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where processors perform the disclosed teachings by implementing the stored software; see the above rejection for claim 1 in view of the combined teachings of Chen, Maity, Parkhiya, Fioraio, and Ramadasan).
Please see the above rejection for claim 1, as the rationale to combine the teachings of Chen, Maity, Parkhiya, Fioraio, and Ramadasan is similar, mutatis mutandis.

Regarding claim 5, see above rejection for claim 4. It is a system claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 5 are similarly rejected.

Regarding claim 7, it recites a non-transitory machine-readable information storage medium comprising one or more instructions which, when executed by one or more hardware processors, perform the method of claim 1. Chen, Maity, Parkhiya, Fioraio, and Ramadasan teach a non-transitory machine-readable information storage medium performing the method of claim 1 (see Chen Fig. 1A, col. 5, ln. 10-35, and col. 7, ln. 5-30, where software for implementing the disclosed teachings is stored on a non-transitory computer-readable medium, such as RAM and ROM memory, and processors perform the disclosed teachings by implementing the software stored on the memory; see also the above rejection for claim 1 in view of the combined teachings of Chen, Maity, Parkhiya, Fioraio, and Ramadasan). Please see above for detailed claim analysis.
Please see the above rejection for claim 1, as the rationale to combine the teachings of Chen, Maity, Parkhiya, Fioraio, and Ramadasan is similar, mutatis mutandis.

Regarding claim 8, see above rejection for claim 7. It is a non-transitory machine-readable information storage medium claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 8 are similarly rejected.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI whose telephone number is (571)270-3814. The examiner can normally be reached 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TIMOTHY CHOI/Examiner, Art Unit 2661

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661