DETAILED ACTION
Response to Amendment
Claims 1-11 are pending. Claim 11 is amended.
Response to Arguments
Applicant's arguments filed April 20, 2021 have been fully considered but they are not persuasive. 
Applicant argues Leonard does not disclose, “performing a matching calculation on the graph of the previous frame and the graph of the current frame by using a graph matching algorithm to obtain a vertex correspondence relationship between the graph of the previous frame and the graph of the current frame” and cites paragraphs 43, 45, 56 and 69 of Leonard. Examiner disagrees. Leonard discloses:

point cloud having a plurality of vertices, each vertex comprising an (x,y,z)-tuple, sequence of estimated sensor locations is sampled to provide a pose graph (P) comprising a linked sequence of nodes, sequence of estimated sensor locations is sampled to provide a deformation graph (N) comprising a linked sequence of nodes, each corresponding to respective estimated sensor locations along the path, abstract, 

pose graph represents a camera pose for each camera frame, [0034], [this disclosure is mapped to the claimed “the previous frame and the graph of the current frame”]

minimises the inconsistency in the graph, [0043] 

each point, or vertex, has a position in 3D space, [0045], 

“Each deformation node comprises coefficients for an affine transformation in the form of a 3x3 rotation matrix and a 3x1 translation vector. Each vertex in the model is effected by the transformation contained in the four nearest nodes to it using a weighted sum of these transformations”, [0048], 

“Thus, the deformation graph nodes N lie on the path of the pose graph P and are typically more closely spaced than the pose graph nodes P, but less closely spaced than the original sequential camera poses. It will be appreciated that this sampling can be performed incrementally as the camera trajectory is created”, [0050], 

pose graph, [0052].

While these paragraphs do not use the term “graph matching”, the processes they describe constitute graph matching. Caetano et al., “Learning Graph Matching”, describes graph matching as follows: “In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs” (abstract) and “Graph matching then consists of finding a correspondence between nodes of the two graphs such that they ‘look most similar’ when the vertices are labeled according to such a correspondence” (part 1). Thus, according to the common definition of graph matching, as indicated for instance by Caetano et al., Leonard discloses graph matching.
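
For illustration only, the definition quoted from Caetano et al. (finding a node correspondence under which two graphs look most similar) can be sketched as a brute-force search over node assignments. This is a minimal sketch and is not taken from Leonard, from Caetano et al., or from the claims; the function names, the edge-overlap score, and the example graphs are assumptions chosen for brevity, and the exhaustive search is only practical for very small graphs.

```python
from itertools import permutations


def edge_set(pairs):
    """Store undirected edges as frozensets so (u, v) equals (v, u)."""
    return {frozenset(p) for p in pairs}


def match_graphs(edges_a, edges_b, n):
    """Brute-force matching of two graphs with n nodes each.

    Returns (mapping, score): mapping[i] is the node of graph B assigned to
    node i of graph A, and score counts edges of A that land on edges of B.
    The edge-overlap score is a hypothetical similarity measure.
    """
    best_mapping, best_score = None, -1
    for perm in permutations(range(n)):
        score = sum(
            1 for e in edges_a
            if frozenset(perm[u] for u in e) in edges_b
        )
        if score > best_score:
            best_mapping, best_score = perm, score
    return best_mapping, best_score


# Two path graphs with the same shape but different node labels.
graph_a = edge_set([(0, 1), (1, 2), (2, 3)])
graph_b = edge_set([(2, 0), (0, 3), (3, 1)])
mapping, score = match_graphs(graph_a, graph_b, 4)
```

Here every edge of the first graph is carried onto an edge of the second, so the returned correspondence preserves the full structure.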

Applicant argues Leonard does not disclose, “calculating a pose of the target three-dimensional object in the current frame according to the vertex correspondence relationship, a pose of the target three-dimensional object in the previous frame and a perspective n-point (PnP) algorithm”. Examiner disagrees.

Leonard discloses:

initial camera pose is denoted P0, Pi is the latest camera pose, [0034], [indicates pose of the target three-dimensional object in the previous frame]

If we start at pose P0, there is a transformation computed between the camera frame at P0 and the next camera frame (using some odometry estimation method) that informs us of the position of the next camera frame relative to P0, [0035], 

“camera pose P associated with each cloud slice, denoted CjP, is chosen as a starting point”, [0051], [interpreted as pose of the target three-dimensional object in the previous frame]

“The present embodiment uses the uncorrected camera trajectory P as an initial condition for the deformation graph i.e. deformation graph node coefficients are set to null, and the positions of each camera pose in the corrected trajectory P', derived using iSam as described above, as the final desired vertex positions in the deformation. Deformation graph optimisation attempts to define deformation graph node coefficients which pull the uncorrected map vertices towards a corrected map around the optimised camera trajectory”, [0056], 

vertices transformed, [0069].

The term “corrected” indicates “according to the vertex correspondence relationship”.

The term “transformed” indicates a transformation that is similar to a perspective n-point (PnP) algorithm. However, as noted, Examiner agrees that Leonard does not disclose the term “perspective n-point (PnP) algorithm”.
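
For illustration only, the consecutive-pose transformation quoted from Leonard [0035] above (a transformation between the camera frame at P0 and the next camera frame that gives the position of the next frame relative to P0) can be pictured as chaining rigid transformations. The sketch below is not Leonard's implementation; the helper names, the (R, t) representation, and the example step are assumptions.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(pose, step):
    """pose: (R, t) of frame k-1 in the world.
    step: (R, t) of frame k expressed relative to frame k-1.
    Returns the (R, t) of frame k in the world."""
    R, t = pose
    dR, dt = step
    moved = mat_vec(R, dt)
    return mat_mul(R, dR), [t[i] + moved[i] for i in range(3)]


# Hypothetical odometry: advance one unit, then turn 90 degrees, four times.
p0 = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
step = (rot_z(math.pi / 2), [1.0, 0.0, 0.0])

poses = [p0]
for _ in range(4):
    poses.append(compose(poses[-1], step))
```

With these assumed steps the camera traces a unit square and the fifth pose returns to the starting position, up to floating-point error.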

Levinshtein et al. teach using a perspective n-point (PnP) algorithm. Applicant argues “Levinshtein teaches performing feature point matching, rather than graph matching”.  Levinshtein et al. teach the concept of using PnP to find the pose. Examiner agrees Levinshtein et al. do not explicitly disclose graph matching, but as noted, this limitation is disclosed by Leonard.

All other prior art arguments are by similarity or dependency and are therefore addressed by the above.
Applicant’s arguments, see page 6, filed April 20, 2021, with respect to the 35 USC 101 rejection of claim 11 along with accompanying amendment have been fully considered and are persuasive.  The 35 USC 101 rejection of claim 11 has been withdrawn. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 5, 6, 9, 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Leonard et al. (US 20160071278 A1) in view of Levinshtein et al. (US 10380763 B2).

Regarding claims 1 and 6, Leonard et al. disclose method for three-dimensional object pose estimation, comprising, and terminal device, comprising: one or more processors; a memory, configured to store one or more programs; wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to: calculating a graph of a previous frame and a graph of a current frame for a target three-dimensional object (graph where each edge contains a rigid transformation between individual camera poses, [0036], “A deformation graph comprising a set of deformation nodes can be created for example, as a uniformly sampled subset of the vertices that make up the 3D model”, [0048]); performing a matching calculation on the graph of the previous frame and the graph of the current frame by using a graph matching algorithm to obtain a vertex correspondence relationship between the graph of the previous frame and the graph of the current frame (point cloud having a plurality of vertices, each vertex comprising an (x,y,z)-tuple, sequence of estimated sensor locations is sampled to provide a pose graph (P) comprising a linked sequence 

Leonard et al. do not use the term “perspective n-point (PnP) algorithm”.

Levinshtein et al. teach calculating a pose of the target three-dimensional object in the current frame according to the vertex correspondence relationship, a pose of the target three-dimensional object in the previous frame and a perspective n-point (PnP) algorithm (“For rich feature objects, a common approach is to establish sparse point correspondences between features on a 3D model and features in the image and solve the Perspective-n-Point (PnP) problem”, col. 2, lines 5-15, “Data containing (a) a preceding pose of the object, (b) preceding image first features detected in a preceding image frame prior to the image frame and corresponding to the preceding pose of the object, and (c) first 3D points corresponding to the preceding image first features is retrieved from one or more memories. The image first features and the preceding image first features are matched to establish first correspondences between the image first features and the first 3D points, and a candidate pose of the object corresponding to the image frame is derived based on the preceding pose of the object and the first correspondences”, col. 2, lines 30-50, “First corners are extracted for an object in an image frame and back-projected unto the 3D model given the current pose.  In successive frames, the corners are tracked using a Kanade-Lucas-Tomasi (KLT) algorithm and the pose is calculated by solving the PnP problem… After the KLT-based corner tracking 3D points and the tracked 2D points are used to calculate pose by solving PnP problem at each image frame, the pose is further refined using a cost function”, col. 13, lines 5-25, 3D pose in frame j, col. 14, lines 45-50).

Leonard et al. and Levinshtein et al. are in the same art of pose calculation (Leonard et al., abstract; Levinshtein et al., abstract). The combination of Levinshtein et al. with Leonard et al. enables using a PnP algorithm. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the PnP algorithm of Levinshtein et al. with the pose calculation of Leonard et al., as the PnP algorithm was known at the time of filing, the combination would have had predictable results, and PnP is known as an accurate, low-error method of determining an object pose.
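
For illustration only, the PnP problem referred to above takes 3D model points and their matched 2D image points and seeks the pose under which the projected 3D points agree with the 2D points. The sketch below shows only the reprojection error that a PnP solver drives toward zero, not a solver itself, and is not taken from Levinshtein et al.; the pinhole intrinsics (f, cx, cy), the function names, and the example points and poses are assumptions.

```python
def project(point, R, t, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3D model point under pose (R, t).
    f, cx, cy are hypothetical camera intrinsics."""
    X = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
    return (f * X[0] / X[2] + cx, f * X[1] / X[2] + cy)


def reprojection_error(points_3d, points_2d, R, t):
    """Root-mean-square pixel distance between projected and observed points."""
    total = 0.0
    for P, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(P, R, t)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return (total / len(points_3d)) ** 0.5


# Hypothetical planar model and poses.
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
R_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_true = [0.0, 0.0, 5.0]   # pose in the current frame
t_prev = [0.1, 0.0, 5.0]   # pose carried over from the previous frame

# 2D observations in the current frame, matched one-to-one to the model points.
observed = [project(P, R_true, t_true) for P in model_points]

err_prev = reprojection_error(model_points, observed, R_true, t_prev)
err_true = reprojection_error(model_points, observed, R_true, t_true)
```

In this example the previous-frame pose leaves a residual of about 10 pixels while the current-frame pose leaves none, which is why the previous pose is a natural starting point that the correspondences then refine.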

Regarding claims 4 and 9, Leonard et al. and Levinshtein et al. disclose the method and system of claims 1 and 6. Leonard et al. and Levinshtein et al. further disclose the performing a matching calculation on the graph of the previous frame and the graph of the current frame using a graph matching algorithm to obtain a vertex correspondence relationship between the graph of the previous frame and the graph of the current frame, comprises: inputting the graph of the previous frame and the graph of the current frame into a model of the graph matching algorithm to perform the matching calculation on the graph of the previous frame and the graph of the current frame; outputting the vertex correspondence relationship between the graph of the previous frame and the graph of the current frame (Leonard et al., “Between each consecutive pose there exists a transformation or a "constraint".  If we start at pose P0, there is a transformation computed between the camera frame at P0 and the next camera frame (using some odometry estimation method) that informs us of the position of the next camera frame relative to P0”, [0035]; Levinshtein et al., “Data containing (a) a preceding pose of the object, (b) preceding image first features detected in a preceding image frame prior to the image frame and corresponding to the preceding pose of the object, and (c) first 3D points 

Regarding claims 5 and 10, Leonard et al. and Levinshtein et al. disclose the method and system of claims 1 and 6. Levinshtein et al. further disclose the calculating a pose of the target three-dimensional object in the current frame according to the vertex correspondence relationship, a pose of the target three-dimensional object in the previous frame and a PnP algorithm, comprises: inputting the vertex correspondence relationship and the pose of the target three-dimensional object in the previous frame into a model of the PnP algorithm to calculate the pose of the target three-dimensional object in the current frame; outputting the pose of the target three-dimensional object in the current frame (“For initial detection of rich feature objects, some known systems establish sparse point correspondences between features on a 3D model of the object and features in the image and solve the PnP problem based on the established correspondences.  These correspondences are typically established by detecting 

Regarding claim 11, Leonard et al. and Levinshtein et al. disclose the method and system of claims 1 and 6. Leonard et al. and Levinshtein et al. further disclose a computer readable storage medium having a computer program stored thereon, wherein the program is executed by a processor to implement the method of claim 1 (Leonard et al., computer program product stored on a computer readable medium which when executed on a computer system performs the steps of claim 1, [0019]; Levinshtein et al., computer programs include computer programs for realizing tracking processing and AR display processing, col. 6, lines 35-40).

Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Leonard et al. (US 20160071278 A1) and Levinshtein et al. (US 10380763 B2) as applied to .

Regarding claims 2 and 7, Leonard et al. and Levinshtein et al. disclose the method and system of claims 1 and 6. Leonard et al. and Levinshtein et al. do not disclose the calculating a graph of a previous frame and a graph of a current frame for the target three-dimensional object, comprises: obtaining a mask image of the previous frame for the target three-dimensional object; extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-dimensional object in an image of the current frame in the region with the pixel value of 1 in the mask image of the previous frame; connecting adjacent feature points corresponding to the image of the previous frame to form the graph of the previous frame; connecting adjacent feature points corresponding to the image of the current frame to form the graph of the current frame; wherein vertices in the graph of the previous frame and the graph of the current frame are the feature points, and a weight of an edge is an average value of response values of two feature points corresponding to the edge.
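
For illustration only, the limitation recited above (keep feature points where the mask has a pixel value of 1, connect adjacent feature points, and weight each edge by the average of the two response values) can be sketched as follows. This is not taken from the application or from any cited reference; the function name, the distance-threshold notion of "adjacent", and the example data are assumptions.

```python
import math


def build_graph(keypoints, mask, radius):
    """keypoints: list of (x, y, response) with integer pixel coordinates.
    mask: 2D list of 0/1 values indexed as mask[y][x].
    radius: hypothetical adjacency threshold in pixels.
    Returns (vertices, edges) where edges maps (i, j) to the edge weight."""
    # Keep only feature points that fall where the mask value is 1.
    vertices = [kp for kp in keypoints if mask[kp[1]][kp[0]] == 1]
    edges = {}
    for i in range(len(vertices)):
        for j in range(i + 1, len(vertices)):
            xi, yi, ri = vertices[i]
            xj, yj, rj = vertices[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                # Edge weight is the average of the two response values.
                edges[(i, j)] = (ri + rj) / 2.0
    return vertices, edges


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
keypoints = [(0, 0, 0.8), (1, 0, 0.4), (3, 3, 0.9), (1, 2, 0.6)]
vertices, edges = build_graph(keypoints, mask, radius=2.0)
```

In this example the point at (3, 3) is discarded because it lies outside the mask, and the remaining three points form a graph with two weighted edges.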

Ward et al. teach the calculating a graph of a previous frame and a graph of a current frame for the target three-dimensional object, comprises: obtaining a mask image of the previous frame for the target three-dimensional object; extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-dimensional object in an image of the current frame in the region with the pixel value of 1 in 

Leonard et al. and Ward et al. are in the same art of graph information (Leonard et al., abstract; Ward et al., claim 8, [0048]-[0050]). The combination of Ward et al. with Leonard et al. and Levinshtein et al. enables using, as the weight of an edge, an average value of response values of the two feature points corresponding to the edge. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the edge weight of Ward et al. with the pose calculation of Leonard et al. and Levinshtein et al. as this was known at the time of filing, the combination would have predictable results, and as Ward et al. indicate, “Although the feature points that were recovered for each frame in the scene could be utilized to set the segment depths directly, this is not desirable since the feature points are typically far sparser than the segments; therefore, depths would need to be inferred for segments that have no corresponding feature point.  Rather, the depth initialization stage 136 utilizes the SFM results 134 to set segments to depths that are probable given the recovered feature points while 

Leonard et al. and Levinshtein et al. and Ward et al. do not disclose obtaining a mask image of the previous frame for the target three-dimensional object; extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-dimensional object in an image of the current frame in the region with the pixel value of 1 in the mask image of the previous frame.

Malon et al. teach obtaining a mask image of the previous frame for the target three-dimensional object; extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-dimensional object in an image of the current frame in the region with the pixel value of 1 in the mask image of the previous frame;  connecting adjacent feature points corresponding to the image of the previous frame to form the graph of the previous frame;  connecting adjacent feature points corresponding to the image of the current frame to form the graph of the current frame;  wherein vertices in the graph of the previous frame and the graph of the current frame are the feature points (“In block 23, the detector generates a skeleton graph by applying a thinning algorithm to the binary mask or the eroded gland mask, which produces a skeleton image of the mask.  In one exemplary embodiment, the thinning algorithm can be a Hilditch thinning algorithm.  The detector then computes a path junction graph of the skeleton image which results in the skeleton graph. As described earlier, the path junction graph is an undirected graph, in which 

Leonard et al. and Ward et al. and Malon et al. are in the same art of graph information (Leonard et al., abstract; Ward et al., claim 8, [0048]-[0050]; Malon et al., [0007], [0028]). The combination of Malon et al. with Leonard et al. and Levinshtein et al. and Ward et al. enables using a binary mask. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the mask of Malon et al. with the pose calculation of Leonard et al. and Levinshtein et al. and Ward et al. as this was known at the time of filing, the combination would have predictable results, and as Malon et al. indicate this is an improved method for detecting object structures ([0005]), which is a benefit to the object pose detection of Leonard et al.

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Leonard et al. (US 20160071278 A1) and Levinshtein et al. (US 10380763 B2) and Ward et al. (US 20100111417 A1) and Malon et al. (US 20110293165 A1) as applied to claims 2 and 7 above, further in view of Hampel et al. (US 8948450 B2).

Regarding claims 3 and 8, Leonard et al. and Levinshtein et al. and Ward et al. and Malon et al. disclose the method and system of claims 2 and 7. Leonard et al. and Levinshtein et al. and Ward et al. and Malon et al. do not disclose the extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-

Hampel et al. teach extracting feature points of the target three-dimensional object in an image of the previous frame in a region with a pixel value of 1 in the mask image of the previous frame, and extracting the feature points of the target three-dimensional object in an image of the current frame in the region with the pixel value of 1 in the mask image of the previous frame, comprises: using a SIFT algorithm to extract the feature points of the target three-dimensional object in the previous frame image in the region with the pixel value of 1 in the mask image of the previous frame, and using the SIFT algorithm to extract the feature points of the target three-dimensional object in the current frame image in the region with the pixel value of 1 in the mask image of the previous frame (“In accordance with the method and system, an object detection algorithm based on a Gaussian mixture model and expanded object tracking based on Mean-Shift are combined with each other in object detection. The object detection is expanded in accordance with a model of the background by improved removal of shadows, the binary mask generated in this way is used to create an asymmetric filter core, and then the actual algorithm for the shape-adaptive object tracking, expanded by a segmentation step for adapting the shape, is initialized, and therefore a determination at least of the object shape or object contour or the orientation of the object in space is made possible”, abstract, “By simple morphological operations (see 4) small deviations, often caused by noise and false 

Leonard et al. and Hampel et al. are in the same art of modeling (Leonard et al., [0048]; Hampel et al., abstract). The combination of Hampel et al. with Leonard et al. and Levinshtein et al. and Ward et al. and Malon et al. enables using SIFT. It would have been obvious at the time of filing to one of ordinary skill in the art to combine SIFT as described by Hampel et al. with the pose calculation of Leonard et al. and Levinshtein et al. and Ward et al. and Malon et al. as this was known at the time of filing, the combination would have predictable results, and as Hampel et al. indicate this allows for improved removal of shadows and shape-adaptive object tracking (abstract), which will be beneficial when tracking a variety of shapes in different backgrounds as may be required in the tracking of Leonard et al.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084.  The examiner can normally be reached on 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHELLE M ENTEZARI/
Primary Examiner, Art Unit 2661