DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 4-5, 9-10, 14-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim’301 (PGPUB: 20190164301) in view of Kim’321 (PGPUB: 20200247321), and further in view of Xiao (PGPUB: 20180174046).

Regarding claims 1 and 14, Kim’301 teaches a method for estimating a 3D pose of an actual object, said method comprising: 
obtaining a 2D image of the actual object using a 2D camera (see paragraph 9, estimate the pose of the target moving object included in one image captured by the camera by using the pose estimation model); 
extracting a plurality of features on the actual object from the 2D image using a neural network (see paragraph 9, a data extraction module configured to extract learning data associated with pose estimation of the moving object from the 2D learning image with the 3D mesh model fitted thereto and to store the extracted learning data); 
providing a feature point image that combines the feature points from the 2D image (see Fig. 2, paragraph 73, the fitting module 140_3 may select 2D feature points corresponding to 3D coordinates of 3D feature points transferred from the feature point determination module 140_1 and may fit the 3D mesh model to the 2D learning image so as to match the 3D feature points with the selected 2D feature points); and
 	estimating the 3D pose of the actual object using the feature point image (see Fig. 5 and 6, paragraph 106 and 107, a process of fitting the 3D mesh model to the 2D learning image by using the rotation conversion matrix and the parallel movement conversion matrix so as to match the 3D feature points with the selected 2D feature points may be performed; a pose estimation model learned to estimate a pose of a moving object may be built by using learning data processed based on a 3D mesh model, and thus, the pose of the moving object may be estimated based on one image including the moving object).
However, Kim’301 does not expressly teach generating a heatmap for each of the extracted features that identify the probability of a location of a feature point on the actual object.
Kim’321 teaches that the position adjusting device performs a process of instructing the pose estimation network to (i) generate each of one or more feature tensors by extracting one or more features from each of the upper body image and the lower body image via a feature extractor, (ii) generate each of one or more keypoint heatmaps and one or more part affinity fields corresponding to each of the feature tensors via a keypoint heatmap & part affinity field extractor, and (iii) extract one or more keypoints from each of the keypoint heatmaps and group each of the extracted keypoints by referring to each of the part affinity fields, and thus generate the body keypoints corresponding to the driver, via a keypoint grouping layer (see paragraph 15); the position adjusting device instructs the pose estimation network to connect pairs respectively having highest mutual connection probabilities of being connected among pairs of the extracted keypoints by referring to each of the part affinity fields, to thereby group the extracted keypoints, via the keypoint grouping layer (see paragraph 19); the body keypoint detector 110 may input each of the feature tensors into the keypoint heatmap & part affinity field extractor 112, to thereby instruct the keypoint heatmap & part affinity field extractor 112 to generate (i) each of keypoint heatmaps corresponding to each of the feature tensors and (ii) each of part affinity fields which are vector maps representing relations between the keypoints. Herein, each of the part affinity fields may be a map showing connections of a specific keypoint with other keypoints, and may be a map representing each of mutual connection probabilities of each of the keypoints in each of keypoint heatmap pairs. And, a meaning of the "heatmap" may represent a combination of heat and a map, which may graphically show various information that can be expressed by colors as heat-like distribution on an image (see Fig. 1, paragraph 58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim’301 by Kim’321 for providing generate 
However, the combination does not expressly teach combining the feature points from the heatmaps.
Xiao teaches acquiring a target detection result by use of the second neural network and based on the combined feature information, wherein the number of layers of the second neural network is larger than the number of layers of the first neural network, the first feature information is heatmap feature information, and the second feature information is picture feature information (see Fig. 1, paragraph 32); first feature information of the to-be-detected image is acquired by use of a first neural network that has been trained in advance. As described above, the first feature information is heatmap feature information. More specifically, the heatmap is used to indicate a probability for that each pixel dot belongs to a target (see Fig. 1, paragraph 62).
It would have been obvious to one of ordinary skill in the art before the effective 

Regarding claims 2 and 15, the combination teaches wherein estimating the 3D pose of the actual object includes comparing the feature point image to a 3D virtual model of the object (see paragraph 9, a three-dimensional (3D) mesh model obtained by previously modeling a general shape of a moving object and a two-dimensional (2D) learning image captured by photographing a real shape of the moving object; a feature point determination module configured to determine 3D feature points in the 3D mesh model, based on a user input received through the input interface; a fitting module configured to fit the 3D mesh model to the 2D learning image).  
 
Regarding claims 4 and 17, the combination teaches wherein the probability of a location of the feature point in the heatmap is shown as color on the heatmap (see Kim’321, Fig. 1, paragraphs 19 and 58, the position adjusting device instructs the pose estimation network to connect pairs respectively having highest mutual connection probabilities of being connected among pairs of the extracted keypoints by referring to each of the part affinity fields, to thereby group the extracted keypoints, via the keypoint grouping layer; a meaning of the "heatmap" may represent a combination of heat and a map, which may graphically show various information that can be expressed by colors as heat-like distribution on an image).

Regarding claims 5 and 18, the combination teaches wherein the probability of a location of the feature point on the actual object is assigned a confidence value that it is a feature point (see Xiao, Fig. 1, paragraph 62, the first feature information is heatmap feature information. More specifically, the heatmap is used to indicate a probability for that each pixel dot belongs to a target; see Kim’321, paragraph 58, each of part affinity fields which are vector maps representing relations between the keypoints. Herein, each of the part affinity fields may be a map showing connections of a specific keypoint with other keypoints, and may be a map representing each of mutual connection probabilities of each of the keypoints in each of keypoint heatmap pairs).  
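
As a minimal illustrative sketch, not taken from any cited reference, a heatmap of the kind described in Xiao and Kim’321 can be reduced to a feature point location and an associated confidence value by locating its peak. The function name `heatmap_peak`, the example grid values, and the use of the peak value as the confidence are assumptions made only for illustration.

```python
def heatmap_peak(heatmap):
    """Return ((row, col), confidence) for the highest-valued cell of a 2D heatmap."""
    best_pos, best_val = None, float("-inf")
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_pos, best_val = (r, c), val
    return best_pos, best_val


# Hypothetical 3x4 heatmap: each cell holds the probability that the
# feature point lies at that pixel location.
example_heatmap = [
    [0.01, 0.02, 0.01, 0.00],
    [0.03, 0.10, 0.70, 0.05],
    [0.01, 0.04, 0.02, 0.01],
]

location, confidence = heatmap_peak(example_heatmap)
```

Under these assumptions, `location` is the most probable pixel for the feature point and `confidence` is the probability stored at that pixel.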

Regarding claim 9, the combination teaches wherein extracting a plurality of features on the actual object includes extracting at least four features (see Kim’301, Fig. 4, paragraph 65, when the 3D mesh model 30 is defined as a front plane 41, a rear plane 42, a left-side plane 43, a right-side plane 44 according to the four viewpoints, the 3D feature points may be determined in at least two adjacent planes of the front plane 41, the rear plane 42, the left-side plane 43, and the right-side plane 44).

Regarding claims 10 and 20, the combination teaches wherein the method is employed in a robotic system and the actual object is being picked up by a robot (see Kim’301, Fig. 1, paragraph 24, processing based on a 3D mesh model obtained by previously modeling a general shape of a moving object and a 2D learning image obtained by photographing a real shape of the moving object and may include feature information robust to pose estimation of the moving object).
 
Regarding claim 19, the combination teaches further comprising means for training nodes in the neural network from a collected training set using a representative object of the object (see Kim’301, Fig. 1, paragraph 24 and 80, in order to build the pose estimation model, learning data (or training data) needed for learning the pose estimation model may be constructed, deep learning may be used as a learning method. A pose estimation model learned based on deep learning may include a neural network or a deep neural network).


Claims 3, 11-13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kim’301 (PGPUB: 20190164301) in view of Kim’321 (PGPUB: 20200247321), in view of Xiao (PGPUB: 20180174046), and further in view of Grabner (PGPUB: 20200388071).

Regarding claim 11, Kim’301 teaches a method for estimating a 3D pose of an object, said object being picked up by a robot, said method comprising: 
obtaining 2D images of the object using a 2D camera (see paragraph 9, estimate the pose of the target moving object included in one image captured by the camera by using the pose estimation model); 
extracting a plurality of feature points on the object from the 2D images using a neural network (see paragraph 9, a data extraction module configured to extract learning data associated with pose estimation of the moving object from the 2D learning image with the 3D mesh model fitted thereto and to store the extracted learning data),
where nodes in the neural network are trained from a collected training set using a representative object of the object (see Fig. 5, paragraph 102, a process of learning a pose estimation model estimating a pose of a target moving object included in one real image by using the learning data may be performed by the processor module 140 or the feature point determination module 140_1. Here, a learning method may be machine learning. The machine learning may be, for example, deep learning which learns a multilayer neural network or a deep neural network); 
providing a feature point image that combines the feature points from the 2D image (see Fig. 2, paragraph 73, the fitting module 140_3 may select 2D feature points corresponding to 3D coordinates of 3D feature points transferred from the feature point determination module 140_1 and may fit the 3D mesh model to the 2D learning image so as to match the 3D feature points with the selected 2D feature points); and 
estimating the 3D pose of the actual object by comparing the feature point image to a 3D virtual model of the object (see Fig. 5 and 6, paragraph 106 and 107, a process of fitting the 3D mesh model to the 2D learning image by using the rotation conversion matrix and the parallel movement conversion matrix so as to match the 3D feature points with the selected 2D feature points may be performed; a pose estimation model learned to estimate a pose of a moving object may be built by using learning data processed based on a 3D mesh model, and thus, the pose of the moving object may be estimated based on one image including the moving object).
However, Kim’301 does not expressly teach generating a heatmap for each of the extracted features that identify the probability of a location of a feature point on the actual object, wherein the probability of a location of the feature point in the heatmap is shown as color on the heatmap.
Kim’321 teaches that the position adjusting device performs a process of instructing the pose estimation network to (i) generate each of one or more feature tensors by extracting one or more features from each of the upper body image and the lower body image via a feature extractor, (ii) generate each of one or more keypoint heatmaps and one or more part affinity fields corresponding to each of the feature tensors via a keypoint heatmap & part affinity field extractor, and (iii) extract one or more keypoints from each of the keypoint heatmaps and group each of the extracted keypoints by referring to each of the part affinity fields, and thus generate the body keypoints corresponding to the driver, via a keypoint grouping layer (see paragraph 15); the position adjusting device instructs the pose estimation network to connect pairs respectively having highest mutual connection probabilities of being connected among pairs of the extracted keypoints by referring to each of the part affinity fields, to thereby group the extracted keypoints, via the keypoint grouping layer (see paragraph 19); the body keypoint detector 110 may input each of the feature tensors into the keypoint heatmap & part affinity field extractor 112, to thereby instruct the keypoint heatmap & part affinity field extractor 112 to generate (i) each of keypoint heatmaps corresponding to each of the feature tensors and (ii) each of part affinity fields which are vector maps representing relations between the keypoints. Herein, each of the part affinity fields may be a map showing connections of a specific keypoint with other keypoints, and may be a map representing each of mutual connection probabilities of each of the keypoints in each of keypoint heatmap pairs. And, a meaning of the "heatmap" may represent a combination of heat and a map, which may graphically show various information that can be expressed by colors as heat-like distribution on an image (see Fig. 1, paragraph 58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim’301 by Kim’321 for providing generate each of one or more keypoint heatmaps and one or more part affinity fields corresponding to each of the feature tensors via a keypoint heatmap & part affinity field extractor, as generating a heatmap for each of the extracted features, providing the position adjusting device instructs the pose estimation network to connect pairs respectively having highest mutual connection probabilities of being connected among pairs of the extracted keypoints, as generating a heatmap for each of the extracted 
However, the combination does not expressly teach combining the feature points from the heatmaps.
Xiao teaches acquiring a target detection result by use of the second neural network and based on the combined feature information, wherein the number of layers of the second neural network is larger than the number of layers of the first neural network, the first feature information is heatmap feature information, and the second feature information is picture feature information (see Fig. 1, paragraph 32); first feature information of the to-be-detected image is acquired by use of a first neural network that has been trained in advance. As described above, the first feature information is heatmap feature information. More specifically, the heatmap is used to indicate a probability for that each pixel dot belongs to a target (see Fig. 1, paragraph 62).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Xiao for providing the combined feature information, the first feature information is heatmap feature 
However, the combination does not expressly teach weights and a 3D virtual model of the object using a perspective-n-point algorithm.
Grabner teaches that location fields explicitly present 3D shape and 3D pose information, because they encode dense correspondences between 2D pixel locations and 3D surface coordinates. In some cases, from these 2D-3D correspondences, the 3D pose of an object in an image can be geometrically recovered using a perspective-n-point (PnP) algorithm (see Fig. 6, paragraph 86); the convolutional backbone and the detection branches of the Location Field CNN can be initialized with weights trained for instance segmentation on the COCO dataset.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Grabner for providing CNN can be initialized with trained weights, as teaching weight; providing from these 2D-3D correspondences, the 3D pose of an object in an image can be geometrically recovered using a perspective-n-point (PnP) algorithm, as a 3D virtual model of the object using a perspective-n-point algorithm. Therefore, combining the elements from prior arts according to known methods and technique, such as trained weight and 3D pose of an object in an image can be geometrically recovered using a perspective-n-

Regarding claim 12, the combination teaches wherein the probability of a location of the feature on the actual object is assigned a confidence value that it is a feature point (see Xiao, Fig. 1, paragraph 62, the first feature information is heatmap feature information. More specifically, the heatmap is used to indicate a probability for that each pixel dot belongs to a target; see Kim’321, paragraph 58, each of part affinity fields which are vector maps representing relations between the keypoints. Herein, each of the part affinity fields may be a map showing connections of a specific keypoint with other keypoints, and may be a map representing each of mutual connection probabilities of each of the keypoints in each of keypoint heatmap pairs).  

Regarding claim 13, the combination teaches wherein extracting a plurality of features on the actual object includes extracting at least four features (see Kim’301, Fig. 4, paragraph 65, when the 3D mesh model 30 is defined as a front plane 41, a rear plane 42, a left-side plane 43, a right-side plane 44 according to the four viewpoints, the 3D feature points may be determined in at least two adjacent planes of the front plane 41, the rear plane 42, the left-side plane 43, and the right-side plane 44).

Regarding claims 3 and 16, the combination teaches wherein estimating the 3D pose of the actual object (see Fig. 5 and 6, paragraph 106 and 107, a process of fitting the 3D mesh model to the 2D learning image by using the rotation conversion matrix and the parallel movement conversion matrix so as to match the 3D feature points with the selected 2D feature points may be performed; a pose estimation model learned to estimate a pose of a moving object may be built by using learning data processed based on a 3D mesh model, and thus, the pose of the moving object may be estimated based on one image including the moving object).
However, the combination does not expressly teach a 3D virtual model of the object using a perspective-n-point algorithm.
Grabner teaches that location fields explicitly present 3D shape and 3D pose information, because they encode dense correspondences between 2D pixel locations and 3D surface coordinates. In some cases, from these 2D-3D correspondences, the 3D pose of an object in an image can be geometrically recovered using a perspective-n-point (PnP) algorithm (see Fig. 6, paragraph 86); the convolutional backbone and the detection branches of the Location Field CNN can be initialized with weights trained for instance segmentation on the COCO dataset.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Grabner for providing from these 2D-3D correspondences, the 3D pose of an object in an image can be geometrically recovered using a perspective-n-point (PnP) algorithm, as a 3D virtual model of the object using a perspective-n-point algorithm. Therefore, combining the elements from prior arts according to known methods and technique, such as trained weight and 3D pose of an object in an image can be geometrically recovered using a .
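
As a minimal illustrative sketch, not taken from Grabner or any other cited reference, the quantity that a perspective-n-point (PnP) algorithm typically drives toward zero is the reprojection error between detected 2D feature points and the 3D model points projected under a candidate pose. The pinhole camera model, the function names, and all numeric values below are assumptions made only for illustration; the sketch evaluates a candidate pose and does not itself solve for one.

```python
import math


def project(point3d, rotation, translation, focal, center):
    """Apply a rigid transform (rotation, translation), then a pinhole projection."""
    x, y, z = (
        sum(rotation[i][j] * point3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (focal * x / z + center[0], focal * y / z + center[1])


def reprojection_error(model_points, image_points, rotation, translation, focal, center):
    """Mean pixel distance between detected 2D points and projected 3D model points."""
    total = 0.0
    for p3, p2 in zip(model_points, image_points):
        u, v = project(p3, rotation, translation, focal, center)
        total += math.hypot(u - p2[0], v - p2[1])
    return total / len(model_points)


# Hypothetical data: four model points and their detected 2D locations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
image_points = [(50.0, 50.0), (70.0, 50.0), (50.0, 70.0), (70.0, 70.0)]

# The pose that generated image_points versus a pose placed too far from the camera.
error_true = reprojection_error(model_points, image_points, identity, (0.0, 0.0, 5.0), 100.0, (50.0, 50.0))
error_wrong = reprojection_error(model_points, image_points, identity, (0.0, 0.0, 10.0), 100.0, (50.0, 50.0))
```

A PnP solver would search over the rotation and translation for the pose that minimizes this error given at least four 2D-3D correspondences.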


Allowable Subject Matter
Claims 6-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Response to Arguments
	Applicant's arguments filed 12/13/2021 have been fully considered but they are not persuasive. 
On page 9, lines 16-21, applicant argues that since Kim '301 does not teach generating heatmaps for each identified feature point as discussed, it is clearly not possible that Kim '301 could teach or suggest combining all of such heatmaps with a 2D image of an object whose pose is being determined to provide a feature point image, as claimed. Therefore, Applicant respectfully submits that Kim '301 does not provide the teaching necessary to make Applicant's claimed invention obvious; Applicant submits that Kim '321 fails to provide the teaching missing from Kim '301 to make Applicant's claimed invention obvious (see page 10, lines 14-15).
Examiner respectfully disagrees. Kim’321 indeed teaches that the body keypoint detector 110 may input each of the feature tensors into the keypoint heatmap & part affinity field extractor 112, to thereby instruct the keypoint heatmap & part affinity field extractor 112 to generate (i) each of keypoint heatmaps corresponding to each of the feature tensors (see Fig. 3, paragraph 58). The teaching, “the body keypoint detector 110 may input each of the feature tensors into the keypoint heatmap & part affinity field extractor 112, to thereby instruct the keypoint heatmap & part affinity field extractor 112 to generate (i) each of keypoint heatmaps corresponding to each of the feature tensors”, from Kim’321 teaches the limitation of “generating heatmaps for each identified feature point”.

On page 10, lines 1-7, applicant argues that Kim '321 does not teach or suggest obtaining an actual 2D image of an object whose pose is being determined, identifying features in that image using a neural network, generating a separate heatmap for each identified feature and then combining all of the heatmaps with that same image to provide a feature point image that is used to identify the 3D pose of the object. Therefore, Applicant submits that Kim '321 fails to provide the teaching missing from Kim '301 to make Applicant's claimed invention obvious. 
Examiner respectfully disagrees. First, “obtaining an actual 2D image of an object whose pose is being determined” is not recited in the claim. Kim’301 teaches estimating the pose of the target moving object included in one image captured by the camera by using the pose estimation model (see paragraph 9) and obtaining learning see paragraph 7), as obtaining a 2D image of the actual object using a 2D camera. Kim’301 further teaches that a process of learning a pose estimation model estimating a pose of a target moving object included in one real image by using the learning data may be performed by the processor module 140 or the feature point determination module 140_1. Here, a learning method may be machine learning. The machine learning may be, for example, deep learning which learns a multilayer neural network or a deep neural network, as identifying features in that image using a neural network. As mentioned above, Kim’321 teaches generating a separate heatmap for each identified feature and then combining all of the heatmaps with that same image. Kim’301 further teaches that the fitting module 140_3 may select 2D feature points corresponding to 3D coordinates of 3D feature points transferred from the feature point determination module 140_1 and may fit the 3D mesh model to the 2D learning image so as to match the 3D feature points with the selected 2D feature points (see Fig. 2, paragraph 73), to provide a feature point image that is used to identify the 3D pose of the object. Therefore, one of ordinary skill in the art combining Kim’301 and Kim’321 would arrive at “obtaining an actual 2D image of an object whose pose is being determined, identifying features in that image using a neural network, generating a separate heatmap for each identified feature and then combining all of the heatmaps with that same image to provide a feature point image that is used to identify the 3D pose of the object”.

On page 10, lines 14-15, applicant argues that Xiao fails to provide the teaching missing from Kim '301 to make Applicant's claimed invention 
Examiner respectfully disagrees. Xiao teaches acquiring a target detection result by use of the second neural network and based on the combined feature information, wherein the number of layers of the second neural network is larger than the number of layers of the first neural network, the first feature information is heatmap feature information, and the second feature information is picture feature information (see Fig. 1, paragraph 32); first feature information of the to-be-detected image is acquired by use of a first neural network that has been trained in advance. As described above, the first feature information is heatmap feature information. More specifically, the heatmap is used to indicate a probability for that each pixel dot belongs to a target (see Fig. 1, paragraph 62).
Xiao thus provides combined feature information in which the first feature information is heatmap feature information, the heatmap is used to indicate a probability for that each pixel dot belongs to a target, and the second feature information is picture feature information, which teaches combining the feature points from the heatmaps. Therefore, combining the elements from the prior art according to known methods and techniques, such as combining the first feature information that is heatmap feature information, would yield predictable results.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN JIA whose telephone number is (571)270-5536. The examiner can normally be reached from 9:00 am to 7:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/XIN JIA/
Primary Examiner, Art Unit 2667