DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Guleryuz, US PGPUB No. 2019/0180473 (hereinafter Guleryuz), in view of Hebbalaguppe et al., US PGPUB No. 2019/0107894 (hereinafter Hebbalaguppe). 
Re Claim 1: 
Guleryuz teaches a method for automatically generating labeled data of a hand, comprising: 
acquiring at least three images to be processed of the hand under different angles of view ( 
Guleryuz implicitly teaches the claim limitation. 
Guleryuz implicitly teaches capturing a collection of images of hand from different viewpoints/poses of the camera. 
Guleryuz teaches at Paragraph 0048 that the 3D skeleton model of the hand is used to determine a 3D key point 1020 that corresponds to the same location in the hand as the key point 1015 and at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8 and the method 900 shown in FIG. 9. Guleryuz teaches at FIG. 8, Step 805 and Step 810 capturing 2D images of hand in training set of poses and identifying key points in 2D images of hand and at Paragraph 0047 that images captured by the camera are projected onto the image plane 1005. Characteristics of the camera also determine a vanishing point 1010 that is an abstract point on the image plane 1005 where 2D projections of parallel lines in 3D space appear to converge. 
Guleryuz teaches at Paragraph 0020 that the learning phase includes generating one or more lookup tables (LUTs) 230 using training images of the hand 205 and the lengths of the phalanxes of the fingers and thumb are determined from the set of training images of the hand 205. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand and at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8); 
detecting key points on the at least three images to be processed respectively; screening the detected key points by using an association relation among the at least three images to be processed (Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0048 that the 3D skeleton model of the hand is used to determine a 3D key point 1020 that corresponds to the same location in the hand as the key point 1015 and at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8 and the method 900 shown in FIG. 9. Guleryuz teaches at FIG. 8, Step 805 and Step 810 capturing 2D images of hand in training set of poses and identifying key points in 2D images of hand and at Paragraph 0047 that images captured by the camera are projected onto the image plane 1005. Characteristics of the camera also determine a vanishing point 1010 that is an abstract point on the image plane 1005 where 2D projections of parallel lines in 3D space appear to converge. 
Guleryuz teaches at Paragraph 0020 that the learning phase includes generating one or more lookup tables (LUTs) 230 using training images of the hand 205 and the lengths of the phalanxes of the fingers and thumb are determined from the set of training images of the hand 205. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand), the association relation being that the at least three images to be processed are from the same frame of image of the hand under different angles of view (
Guleryuz implicitly teaches the claim limitation. Guleryuz teaches at Paragraph 0034 that one or more of the key points that are derived from the LUT 600 for one 3D pose are the same or similar to one or more of the key points that are derived from the LUT 600 for another 3D pose and a confidence score is derived for the dissimilar poses that can result from the same set of projected 2D coordinates. Guleryuz teaches at FIG. 7 the relationship of the 2D coordinates (circles 1, 2, 3, 4 and 5 in FIG. 6) that define the position of the skeleton model 705/710/715/720/725 of the finger in the finger pose plane. 
Guleryuz teaches at Paragraph 0047 that the images captured by the camera are projected onto the image plane 1005 and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand and at Paragraph 0039 that the processor determines lengths of phalanxes in the fingers and thumb of the hand based on the key points of the 2D images of the hand. 
Guleryuz teaches at Paragraph 0043 that the processor learns orientations of the palm triangle and the thumb triangle); 
reconstructing a three-dimensional space representation of the hand with regard to the key points screened on the same frame of image, in combination with a given finger bone length (
Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8. Guleryuz teaches at Paragraph 0052 that the processor identifies a first set of 3D key-points that are compliant with the 3D skeleton model of the hand and at Paragraph 0053 that the processor identifies second 3D key-points based on the first 3D key-points and a vanishing point associated with the image…the vanishing point is determined based on characteristics of a camera that acquired the 2D image….the second set of 3D key-points includes the camera-compliant key-point 1030. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand. 
Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025); 
projecting the key points on the three-dimensional representation of the hand onto the at least three images to be processed (Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand. 
Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025); and 
generating the labeled data of the hand on the images to be processed by using the projected key points on the at least three images to be processed (
Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0021 that the processor determines a 3D pose and location of the hand 205 using locations of the key-points to access 2D coordinates of the fingers and thumb from the LUTs 230 which stores the 2D coordinates of each finger and thumb as a function of a relative location of the fingertip and the palm knuckle and at Paragraph 0022 that the processor 225 then modifies the 3D locations of the key-points indicated by the skeleton model based on projections of the 3D locations of the key-points into an image plane along a line connecting the original noisy key-points to a vanishing point associated with the 2D image. 
Guleryuz teaches at Paragraph 0046 and block 903 of FIG. 9 that the processor generates a 3D skeleton model that represents the 3D pose of the hand and at Paragraph 0052 that the first set of 3D key points represents key points corresponding to tips of the fingers and thumb, joints of the fingers and thumb, palm knuckles of the fingers and thumb and a wrist location defined by the 3D skeleton model of the hand and at Paragraph 0055 that an updated 3D skeleton model is generated on the basis of the modified values of the noisy key points).

Hebbalaguppe teaches the claim limitation: acquiring at least three images to be processed of the hand under different angles of view (
Hebbalaguppe teaches at Paragraph 0032 that the media stream is captured by the RGB camera in the user’s FPV (first person view) and at Paragraph 0044 using a large-scale 3D hand pose dataset having a plurality of training sample RGB images….the camera location may be chosen randomly in spherical vicinity around the hand for each frame and at Paragraph 0040 that temporal information includes a plurality of key-points on the hand present in the user’s field of view (FoV) in the frames. The plurality of key-points includes 21 hand key-points comprising 4 key points per finger and one key-point close to the wrist of the user’s hand. The gesture recognition system detects the plurality of key-points and learns/estimates a plurality of network-implicit 3D articulation prior having the plurality of key points of sample user’s hands from sample RGB images using the deep learning network…RGB images such as images 130, 132, 134 are received at the gesture recognition system at 502. The gesture recognition system may include the hand pose estimation module 502 for estimating temporal information associated with the gesture); 
detecting key points on the at least three images to be processed respectively; screening the detected key points by using an association relation among the at least three images to be processed (
Hebbalaguppe teaches at Paragraph 0040 that temporal information includes a plurality of key-points on the hand present in the user’s field of view (FoV) in the frames. The plurality of key-points includes 21 hand key-points comprising 4 key points per finger and one key-point close to the wrist of the user’s hand. The gesture recognition system detects the plurality of key-points and learns/estimates a plurality of network-implicit 3D articulation prior having the plurality of key points of sample user’s hands from sample RGB images using the deep learning network…RGB images such as images 130, 132, 134 are received at the gesture recognition system at 502. The gesture recognition system may include the hand pose estimation module 502 for estimating temporal information associated with the gesture), the association relation being that the at least three images to be processed are from the same frame of image of the hand under different angles of view (Hebbalaguppe teaches at FIG. 7 and Paragraph 0054-0055 that the 21 key points detected by the hand pose detection module are shown as an overlay on the input images while testing the gesture recognition system. 
Hebbalaguppe teaches at Paragraph 0045 that the processor determines the 2D finger coordinates of the fingers and thumb based on the LUTs and relative locations of the tips of the fingers and the corresponding palm knuckle and that the first layer includes an LSTM layer…to learn long-term dependencies and patterns in the 3D coordinate sequence of 21 key-points detected on the user’s hand). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Hebbalaguppe’s teaching of capturing three or more 2D images of the hand as input for the gesture recognition of the hand into Guleryuz’s system of generating a 3D skeleton model of the hand based on the gesture recognition (learning orientations of finger pose planes) by identifying the 3D coordinates of the 3D key points of the 3D skeleton model. One of ordinary skill in the art would have been motivated to have provided the 2D images captured from the different viewpoints of the camera to have collected to the 2D key 

Re Claim 2: 
Claim 2 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the at least three images to be processed are taken by a camera set, and the method further comprises: calibrating the camera set and obtaining camera internal and external parameters of the camera set. 
Hebbalaguppe and Guleryuz further teach the claim limitation that the at least three images to be processed are taken by a camera set, and the method further comprises: calibrating the camera set and obtaining camera internal and external parameters of the camera set (Hebbalaguppe teaches at Paragraph 0032 that the media stream is captured by the RGB camera in the user’s FPV (first person view) and at Paragraph 0044 using a large-scale 3D hand pose dataset having a plurality of training sample RGB images….the camera location may be chosen randomly in spherical vicinity around the hand for each frame and at Paragraph 0040 that temporal information includes a plurality of key-points on the hand present in the user’s field of view (FoV) in the frames. The plurality of key-points includes 21 hand key-points comprising 4 key points per finger and one key-point close to the wrist of the user’s hand. The gesture recognition system detects the plurality of key-points and learns/estimates a plurality of network-implicit 3D articulation prior having the plurality of key points of sample user’s hands from sample RGB images using the deep learning network…RGB images such as images 130, 132, 134 are received at the gesture recognition system at 502. The gesture recognition system may include the hand pose estimation module 502 for estimating temporal information associated with the gesture. 
Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0048 that the 3D skeleton model of the hand is used to determine a 3D key point 1020 that corresponds to the same location in the hand as the key point 1015 and at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8 and the method 900 shown in FIG. 9. Guleryuz teaches at FIG. 8, Step 805 and Step 810 capturing 2D images of hand in training set of poses and identifying key points in 2D images of hand and at Paragraph 0047 that images captured by the camera are projected onto the image plane 1005. Characteristics of the camera also determine a vanishing point 1010 that is an abstract point on the image plane 1005 where 2D projections of parallel lines in 3D space appear to converge. 
Guleryuz teaches at Paragraph 0020 that the learning phase includes generating one or more lookup tables (LUTs) 230 using training images of the hand 205 and the lengths of the phalanxes of the fingers and thumb are determined from the set of training images of the hand 205. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand). 
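For illustration, the role of the internal and external camera parameters recited in claim 2 can be sketched as a pinhole projection. This is a minimal sketch under assumed names: project_point, rotation, translation, fx, fy, cx and cy are hypothetical and appear in neither Guleryuz nor Hebbalaguppe. It shows only how an external pose and an internal focal length and principal point would map a 3D key point to a pixel location, not how either reference performs the computation.

```python
def project_point(point_3d, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    rotation (3x3 nested lists) and translation (length 3) are the assumed
    external parameters; fx, fy, cx, cy are the assumed internal parameters.
    """
    # External parameters: world -> camera coordinates, X_cam = R * X + t.
    x_cam = [
        sum(rotation[r][c] * point_3d[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Internal parameters: perspective division, then scale and offset.
    u = fx * x_cam[0] / x_cam[2] + cx
    v = fy * x_cam[1] / x_cam[2] + cy
    return (u, v)


# Hypothetical camera at the world origin looking along +Z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((0.1, -0.05, 2.0), identity, (0.0, 0.0, 0.0),
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Applying the same function once per calibrated camera in the set would give one projected location per image for each 3D key point.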

Re Claim 3: 

Guleryuz further teaches the claim limitation that calculating the finger bone length by: acquiring at least two images of the hand in a frame of image under different angles of view (Guleryuz teaches at Paragraph 0048 that the 3D skeleton model of the hand is used to determine a 3D key point 1020 that corresponds to the same location in the hand as the key point 1015 and at Paragraph 0051 that the 3D skeleton model is generated according to embodiments of the method 800 shown in FIG. 8 and the method 900 shown in FIG. 9. Guleryuz teaches at FIG. 8, Step 805 and Step 810 capturing 2D images of hand in training set of poses and identifying key points in 2D images of hand and at Paragraph 0047 that images captured by the camera are projected onto the image plane 1005. Characteristics of the camera also determine a vanishing point 1010 that is an abstract point on the image plane 1005 where 2D projections of parallel lines in 3D space appear to converge. 
Guleryuz teaches at Paragraph 0020 that the learning phase includes generating one or more lookup tables (LUTs) 230 using training images of the hand 205 and the lengths of the phalanxes of the fingers and thumb are determined from the set of training images of the hand 205. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand); performing gesture recognition for the hand by using the at least two images under different angles of view (Guleryuz teaches at Paragraph 0020 that the learning phase includes generating one or more lookup tables (LUTs) 230 using training images of the hand 205 and the lengths of the phalanxes of the fingers and thumb are determined from the set of training images of the hand 205. Guleryuz teaches at Paragraph 0024 that values of the parameters that define the palm triangle 300 are learned using 2D images of the hand and at Paragraph 0037 that 2D images of a hand positioned in a training set of poses are captured and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand); performing detection of key points on each hand image in at least two images respectively in the case that the recognized gesture is a predefined simple gesture (Guleryuz teaches at FIG. 8, Step 805 and Step 810 capturing 2D images of hand in training set of poses and identifying key points in 2D images of hand and at Paragraph 0047 that images captured by the camera are projected onto the image plane 1005. 
Characteristics of the camera also determine a vanishing point 1010 that is an abstract point on the image plane 1005 where 2D projections of parallel lines in 3D space appear to converge); reconstructing a three-dimensional representation of the hand by using the detected key points (Applicant’s specification discloses at Paragraph 0049 of the instant application publication that “a frame of image is a collection of images from different angles of view”. 
Guleryuz implicitly teaches capturing a collection of 2D images of hand from different viewpoints/poses of the camera and the association relation of the key points among the captured 2D images. 
Guleryuz teaches at Paragraph 0052 that the processor identifies a first set of 3D key-points that are compliant with the 3D skeleton model of the hand and at Paragraph 0053 that the processor identifies second 3D key-points based on the first 3D key-points and a vanishing point associated with the image…the vanishing point is determined based on characteristics of a camera that acquired the 2D image….the second set of 3D key-points includes the camera-compliant key-point 1030. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand. 
Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025); and calculating the finger bone length of the hand according to the three-dimensional key points on the reconstructed three-dimensional representation of the hand (Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand). 
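The final limitation of claim 3, calculating the finger bone length from three-dimensional key points, can be illustrated as the distance between consecutive key points along a finger. The following is a minimal sketch under stated assumptions: the function name bone_lengths and the knuckle-to-fingertip ordering of the key points are hypothetical, and the sketch is not drawn from Guleryuz or from the instant specification.

```python
import math


def bone_lengths(finger_keypoints_3d):
    """Return the Euclidean length of each segment between consecutive
    3D key points of one finger (assumed ordered from knuckle to fingertip)."""
    return [
        math.dist(start, end)
        for start, end in zip(finger_keypoints_3d, finger_keypoints_3d[1:])
    ]


# Hypothetical reconstructed key points for one finger, in arbitrary units.
finger = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0), (0.0, 3.0, 6.0)]
lengths = bone_lengths(finger)
```

A finger with n key points yields n - 1 segment lengths, so the three key points above give two lengths.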
Re Claim 4: 
Claim 4 encompasses the same scope of invention as claim 3, except for the additional claim limitation that after the step of performing detection of key points on each hand image in at least three images respectively, the calculation method for the finger bone length further comprises: screening the detected key points by using the association relation among the at least three images in the same frame of image, wherein the steps of reconstructing the three-dimensional representation of the hand by using the detected key points comprises: reconstructing the three-dimensional representation of the hand by using the screened key points.
Guleryuz further teaches the claim limitation that after the step of performing detection of key points on each hand image in at least three images respectively, the calculation method for the finger bone length further comprises: screening the detected key points by using the association relation among the at least three images in the same frame of image (Guleryuz implicitly teaches the claim limitation. Guleryuz teaches at Paragraph 0034 that one or more of the key points that are derived from the LUT 600 for one 3D pose are the same or similar to one or more of the key points that are derived from the LUT 600 for another 3D pose and a confidence score is derived for the dissimilar poses that can result from the same set of projected 2D coordinates. Guleryuz teaches at FIG. 7 the relationship of the 2D coordinates (circles 1, 2, 3, 4 and 5 in FIG. 6) that define the position of the skeleton model 705/710/715/720/725 of the finger in the finger pose plane. 
Guleryuz teaches at Paragraph 0047 that the images captured by the camera are projected onto the image plane 1005 and at Paragraph 0038 that the processor identifies key-points in the 2D images of the hand and at Paragraph 0039 that the processor determines lengths of phalanxes in the fingers and thumb of the hand based on the key points of the 2D images of the hand. 
Guleryuz teaches at Paragraph 0043 that the processor learns orientations of the palm triangle and the thumb triangle), wherein the steps of reconstructing the three-dimensional representation of the hand by using the detected key points comprises: reconstructing the three-dimensional representation of the hand by using the screened key points (Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025. 
Guleryuz teaches at Paragraph 0052 that the processor identifies a first set of 3D key-points that are compliant with the 3D skeleton model of the hand and at Paragraph 0053 that the processor identifies second 3D key-points based on the first 3D key-points and a vanishing point associated with the image…the vanishing point is determined based on characteristics of a camera that acquired the 2D image….the second set of 3D key-points includes the camera-compliant key-point 1030. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand). 

Re Claim 5: 
Claim 5 encompasses the same scope of invention as claim 4, except for the additional claim limitation that the at least three images are taken by a camera set, and the step of reconstructing the three-dimensional representation of the hand by using the screened key points further comprises: reconstructing the three-dimensional representation of the hand by using the screened key points, in combination with the internal and external parameters of the camera set.
Guleryuz further teaches the claim limitation that the at least three images are taken by a camera set, and the step of reconstructing the three-dimensional representation of the hand by using the screened key points further comprises: reconstructing the three-dimensional representation of the hand by using the screened key points, in combination with the internal and external parameters of the camera set (Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025. 
Guleryuz teaches at Paragraph 0052 that the processor identifies a first set of 3D key-points that are compliant with the 3D skeleton model of the hand and at Paragraph 0053 that the processor identifies second 3D key-points based on the first 3D key-points and a vanishing point associated with the image…the vanishing point is determined based on characteristics of a camera that acquired the 2D image….the second set of 3D key-points includes the camera-compliant key-point 1030. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand). 
Re Claim 6: 
Claim 6 encompasses the same scope of invention as claim 1, except for the additional claim limitation that acquiring positions of a plurality of three-dimensional key points in the three-dimensional representation of the hand, the three-dimensional representation of the hand being reconstructed according to at least two two-dimensional images of the hand; generating an auxiliary geometric structure associated with each key point according to the category of each key point in the plurality of three-dimensional key points; generating for each auxiliary geometric structure a set of auxiliary points on the surface of each auxiliary geometric structure; projecting the auxiliary points onto the at least two two-dimensional images; and acquiring edge nodes at the topmost, bottommost, leftmost and rightmost in the projection of the auxiliary points on the at least two two-dimensional images, and generating the gesture bounding box based on the four nodes.
Guleryuz further teaches the claim limitation that acquiring positions of a plurality of three-dimensional key points in the three-dimensional representation of the hand, the three-dimensional representation of the hand being reconstructed according to at least two two-dimensional images of the hand (Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025); generating an auxiliary geometric structure associated with each key point according to the category of each key point in the plurality of three-dimensional key points (Guleryuz teaches at Paragraph 0015 that lengths of the phalanxes of the fingers and thumb are determined from a set of training images of the hand. The finger pose lookup tables are generated based on the lengths and anatomical constraints on ranges of motion of the joints that connect the phalanxes. The palm of the hand is represented as a palm triangle and a thumb triangle, which are defined by corresponding sets of vertices, and parameters that define the palm triangle and the thumb triangle are also determined from the set of training images. Guleryuz teaches at Paragraph 0016 that a 3D pose of the fingers is then determined by rotating the 2D coordinates based on the orientation of the palm triangle. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand); generating for each auxiliary geometric structure a set of auxiliary points on the surface of each auxiliary geometric structure (Guleryuz teaches at Paragraph 0015 that lengths of the phalanxes of the fingers and thumb are determined from a set of training images of the hand. The finger pose lookup tables are generated based on the lengths and anatomical constraints on ranges of motion of the joints that connect the phalanxes. The palm of the hand is represented as a palm triangle and a thumb triangle, which are defined by corresponding sets of vertices, and parameters that define the palm triangle and the thumb triangle are also determined from the set of training images. Guleryuz teaches at Paragraph 0016 that a 3D pose of the fingers is then determined by rotating the 2D coordinates based on the orientation of the palm triangle); projecting the auxiliary points onto the at least two two-dimensional images (Guleryuz teaches at Paragraph 0016 that the 3D locations of the key points (including the set of vertices associated with the palm triangle and the thumb triangle) indicated by the skeleton model are modified based on projections of the 3D locations of the key points into an image plane along a line connecting the original 2D key points to a vanishing point associated with the 2D image); and acquiring edge nodes at the topmost, bottommost, leftmost and rightmost in the projection of the auxiliary points on the at least two two-dimensional images, and generating the gesture bounding box based on the four nodes (Guleryuz teaches at FIG. 1 that the thumb triangle and the palm triangle form a bounding box based on the four nodes). 
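The last limitation of claim 6, acquiring the topmost, bottommost, leftmost and rightmost edge nodes of the projected auxiliary points and generating a bounding box from those four nodes, can be illustrated as follows. This is a minimal sketch under stated assumptions: the name gesture_bounding_box is hypothetical, the points are assumed to be already projected into one two-dimensional image, and the image y axis is assumed to grow downward. It is not drawn from Guleryuz.

```python
def gesture_bounding_box(projected_points):
    """Return (left, top, right, bottom) from the four extreme edge nodes
    of a non-empty list of projected 2D points (x, y)."""
    if not projected_points:
        raise ValueError("at least one projected point is required")
    # Select the four edge nodes; y is assumed to increase downward.
    leftmost = min(projected_points, key=lambda p: p[0])
    rightmost = max(projected_points, key=lambda p: p[0])
    topmost = min(projected_points, key=lambda p: p[1])
    bottommost = max(projected_points, key=lambda p: p[1])
    # The box is spanned by the x of the left/right nodes and the y of the
    # top/bottom nodes.
    return (leftmost[0], topmost[1], rightmost[0], bottommost[1])


# Hypothetical projected auxiliary points in pixel coordinates.
points = [(10, 40), (25, 5), (60, 30), (35, 70), (30, 30)]
box = gesture_bounding_box(points)
```

Under the claim language, the same computation would be repeated for each of the at least two two-dimensional images, giving one box per view.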
Re Claim 7: 
The claim 7 encompasses the same scope of invention as that of the claim 6 except additional claim limitation that the step of projecting the auxiliary points onto the at least two two-dimensional images further comprises: calculating projection positions of the auxiliary points on the at least two two-dimensional images in combination with the internal and external parameters of the camera set, wherein the at least two images are taken by the camera set.
Guleryuz further teaches the claim limitation that the step of projecting the auxiliary points onto the at least two two-dimensional images further comprises: calculating projection positions of the auxiliary points on the at least two two-dimensional images in combination with the internal and external parameters of the camera set, wherein the at least two images are taken by the camera set (Guleryuz teaches at Paragraph 0016 that the 3D locations of the key points (including the set of vertices associated with the palm triangle and the thumb triangle) indicated by the skeleton model are modified based on projections of the 3D locations of the key points into an image plane along a line connecting the original 2D key points to a vanishing point associated with the 2D image. Guleryuz teaches at Paragraph 0048-0049 that a 3D skeleton model of the hand is lifted from the 2D image on the basis of the noisy key-point 1015 extracted from the 2D image. The 3D skeleton model of the hand is used to determine a 3D key-point 1020 that corresponds to the same location in the hand as the key-point 1015. Guleryuz teaches at Paragraph 0049 that a modified 3D key-point 1030 is therefore determined by projecting the skeleton-compliant key-point 1020 onto the line 1025. 
Guleryuz teaches at Paragraph 0052 that the processor identifies a first set of 3D key-points that are compliant with the 3D skeleton model of the hand and at Paragraph 0053 that the processor identifies second 3D key-points based on the first 3D key-points and a vanishing point associated with the image ... the vanishing point is determined based on characteristics of a camera that acquired the 2D image ... the second set of 3D key-points includes the camera-compliant key-point 1030. 
Guleryuz teaches at Paragraph 0042 that the processor can compare lengths of the phalanxes of the fingers and thumb in the skeleton model to lengths of the corresponding phalanxes in the 2D image to account for perspective projection and de-project the 2D image of the hand). 
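For illustration only, the recited calculation of projection positions "in combination with the internal and external parameters of the camera set" can be sketched with a standard pinhole camera model. This sketch is not drawn from Guleryuz or from the application. The function name project_point, the parameter names (rotation, translation, fx, fy, cx, cy), and the absence of lens distortion are assumptions made for the example.

```python
from typing import Sequence, Tuple


def project_point(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a 3D point into pixel coordinates with a pinhole model.

    Hypothetical helper. External (extrinsic) parameters: a 3x3 rotation and a
    3-vector translation mapping world coordinates to camera coordinates.
    Internal (intrinsic) parameters: focal lengths fx, fy and principal point
    cx, cy. Lens distortion is ignored (assumption).
    """
    # Extrinsics: X_cam = R * X_world + t
    xc, yc, zc = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is not in front of the camera")

    # Intrinsics: perspective divide, then scale and shift to pixels.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return (u, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(project_point((0.5, -0.25, 2.0), identity, (0.0, 0.0, 0.0),
                        500.0, 500.0, 320.0, 240.0))
```

With a camera set of several cameras, each camera would have its own rotation, translation, and intrinsic values, and the same auxiliary point would be projected once per camera.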
Re Claim 8: 
The claim 8 further recites a computer program product that automatically generates labeled data of a hand, the product for causing one or more processors to execute the method according to claim 1. 
The claim 8 is in parallel with the claim 1 in the form of a computer program product claim. The claim 8 is subject to the same rationale of rejection as the claim 1. Additionally, Guleryuz further teaches the claim limitation of a computer program product that automatically 

Re Claim 9: 
The claim 9 is in parallel with the claim 1 in the form of an apparatus claim. The claim 9 is subject to the same rationale of rejection as the claim 1. 

The claim 9 further recites the claim limitation of a device for automatically generating labeled data of a hand, comprising: 
an acquisition device for acquiring at least three images to be processed under different angles of view for a hand; 
a detection device for detecting key points on the at least three images to be processed respectively; 
a screening device for screening the detected key points by using an association relation among the at least three images to be processed, the association relation being that the at least three images to be processed are from the same frame of image of the hand under different angles of view; 
a reconstruction device for reconstructing a three-dimensional space representation of the hand with regard to the key points screened on the same frame of image, in combination with a given finger bone length; 
a projection device for projecting the key points on the three-dimensional representation of the hand onto the at least three images to be processed; and 
a labeling device for generating the labeled data of the hand on the images to be processed by using the projected key points on the at least three images to be processed.
Guleryuz further teaches the claim limitation of a device for automatically generating labeled data of a hand (e.g., FIG. 2), comprising: 
an acquisition device (e.g., camera 215 of FIG. 2 for performing the operation 805 of FIG. 8); 
a detection device (e.g., processor 225 of FIG. 2 for performing the operation 810 of FIG. 8); 
a screening device (e.g., processor 225 of FIG. 2 for performing the operations of FIGS. 8-9 and FIG. 11); 
a reconstruction device for reconstructing a three-dimensional space representation of the hand with regard to the key points screened on the same frame of image, in combination with a given finger bone length (e.g., the processor 225 of FIG. 2 for performing the operation 1105 of FIG. 11 and the operations 815 and 820 of FIG. 8); 
a projection device (e.g., the processor 225 of FIG. 2 for performing the operation 905 and the operations of FIG. 10 in relation to the block 1110 and block 1115 of FIG. 11 and Paragraphs 0052-0053); and 
a labeling device (e.g., the processor 225 of FIG. 2 performing the operation of block 1125 of FIG. 11, in which an updated 3D skeleton model is generated based on the modified values of the noisy key points).

Re Claim 10: 

The claim 10 is in parallel with the claim 2 in the form of an apparatus claim. The claim 10 is subject to the same rationale of rejection as the claim 2. 
Re Claim 11: 
The claim 11 encompasses the same scope of invention as that of the claim 9 except additional claim limitation that a recognition device and a calculation device, wherein, the acquisition device further for acquiring at least two images of the hand in a frame of image under different angles of view; the recognition device for performing gesture recognition for the hand by using the at least two images under different angles of view; the detection device further for performing detection of key points on each hand image in at least two images respectively in the case that the recognized gesture is a predefined simple gesture; the reconstruction device further for reconstructing a three-dimensional representation of the hand by using the detected key points; and the calculation device for calculating the finger bone length of the hand according to the three-dimensional key points on the reconstructed three-dimensional representation of the hand.
The claim 11 is in parallel with the claim 3 in the form of an apparatus claim. The claim 11 is subject to the same rationale of rejection as the claim 3. 
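For illustration only, the recited calculation of "the finger bone length of the hand according to the three-dimensional key points" can be sketched as the Euclidean distance between consecutive joints of one finger. This sketch is not drawn from Guleryuz or from the application. The function name bone_lengths and the ordering of joints from the base of the finger to the tip are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def bone_lengths(joints: Sequence[Point3D]) -> List[float]:
    """Return the length of each bone along one finger.

    Hypothetical helper: `joints` is assumed to list the reconstructed 3D key
    points of a single finger in order from base to tip, so each consecutive
    pair of key points bounds one finger bone.
    """
    if len(joints) < 2:
        raise ValueError("at least two key points are required")
    return [math.dist(a, b) for a, b in zip(joints, joints[1:])]


if __name__ == "__main__":
    finger = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0)]
    print(bone_lengths(finger))
```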
Re Claim 12: 
The claim 12 encompasses the same scope of invention as that of the claim 11 except additional claim limitation that the at least two images are at least three images, and the device 
The claim 12 is in parallel with the claim 4 in the form of an apparatus claim. The claim 12 is subject to the same rationale of rejection as the claim 4. 
Re Claim 13: 
The claim 13 encompasses the same scope of invention as that of the claim 9 except additional claim limitation that an auxiliary geometric structure generation device, an auxiliary point generation device and a bounding box generation device, wherein, the acquisition device is further configured for acquiring positions of a plurality of three-dimensional key points in the three-dimensional representation of the hand, the three-dimensional representation of the hand is reconstructed according to at least two two-dimensional images of the hand; the auxiliary geometric structure generation device is configured for generating an auxiliary geometric structure associated with each key point according to the category of each key point in the plurality of three-dimensional key points; the auxiliary point generation device is configured for generating for each auxiliary geometric structure a set of auxiliary points on the surface of each auxiliary geometric structure; the projection device is further configured for projecting the auxiliary points onto the at least two two-dimensional images; and the bounding box generation device is configured for acquiring edge nodes at the topmost, bottommost, leftmost and rightmost in the projection of the auxiliary points on the at least two two-dimensional images, and generating the gesture bounding box based on the four nodes. 

Re Claim 14: 
The claim 14 is in parallel with the claim 1 in the form of an apparatus claim. The claim 14 is subject to the same rationale of rejection as the claim 1. 
The claim 14 recites a system for automatically generating labeled data of a hand, comprising: 
an image capture system comprising a camera set configured to acquire at least three images to be processed for the hand under different angles of view; and a labeling device configured to carry out the following operations: 
detecting key points on the at least three images to be processed respectively;
screening the detected key points by using an association relation among the at least three images to be processed, the association relation being that the at least three images to be processed are from the same frame of image of the hand under different angles of view;
reconstructing a three-dimensional space representation of the hand with regard to the key points screened on the same frame of image, in combination with a given finger bone length; 
projecting the key points on the three-dimensional representation of the hand onto the at least three images to be processed; and 
generating the labeled data of the hand on the images to be processed by using the projected key points on the at least three images to be processed.
Guleryuz further teaches the claim limitation of a system for automatically generating labeled data of a hand (e.g., FIG. 2), comprising: 
an image capture system (e.g., camera 215 of FIG. 2 for performing the operation 805 of FIG. 8); and a labeling device (e.g., the processor 225 of FIG. 2 performing the operation of block 1125 of FIG. 11, in which an updated 3D skeleton model is generated based on the modified values of the noisy key points).

Re Claim 15: 
The claim 15 encompasses the same scope of invention as that of the claim 14 except additional claim limitation that the operation of reconstructing the three-dimensional representation of the hand further comprises: reconstructing the three-dimensional space representation of the hand with regard to the screened key points on the same frame of image, in combination with the given finger bone length, and the camera internal and external parameters of the camera set. 
The claim 15 is in parallel with the claim 5 in the form of an apparatus claim. The claim 15 is subject to the same rationale of rejection as the claim 5. 
Re Claim 16: 
The claim 16 encompasses the same scope of invention as that of the claim 14 except additional claim limitation that a calculation device for the finger bone length, wherein, the image capture system is further configured to acquire at least two images of the hand in a frame of image under different angles of view; and the calculation device for the finger bone length is configured to carry out the following operations: performing gesture recognition for the hand by using the at least two images under different angles of view; performing detection of key points on each hand image in at least two images respectively in the case that the recognized gesture is a predefined simple gesture; reconstructing a three-dimensional representation of the hand by 
The claim 16 is in parallel with the claim 3 in the form of an apparatus claim. The claim 16 is subject to the same rationale of rejection as the claim 3. 
Re Claim 17: 
The claim 17 encompasses the same scope of invention as that of the claim 16 except additional claim limitation that the at least two images are at least three images, and the calculation device for the finger bone length is further configured to carry out the following operation after the step of performing detection of key points on each hand image in at least three images respectively: screening the detected key points by using an association relation between the at least two images in the same frame of image, wherein the step of reconstructing the three-dimensional representation of the hand by using the detected key points comprises: reconstructing the three-dimensional representation of the hand by using the screened key points. 
The claim 17 is in parallel with the claim 4 in the form of an apparatus claim. The claim 17 is subject to the same rationale of rejection as the claim 4. 
Re Claim 18: 
The claim 18 encompasses the same scope of invention as that of the claim 14 except additional claim limitation that a calculation device for a gesture bounding box, wherein, the image capture system is further configured to acquire at least two images of the hand in a frame of image under different angles of view; and the calculation device for the gesture bounding box is configured to carry out the following operations: acquiring positions of a plurality of three-dimensional key points in the three-dimensional representation of the hand, the three-dimensional representation of the hand is reconstructed according to at least two two-
The claim 18 is in parallel with the claim 6 in the form of an apparatus claim. The claim 18 is subject to the same rationale of rejection as the claim 6. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665.  The examiner can normally be reached on Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications 






/JIN CHENG WANG/Primary Examiner, Art Unit 2613