DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 1, 8 and 15 recite the limitations “generating a second set of joint location coordinates using the first set of joint location coordinates and identifying a three-dimensional hand pose based on the plurality of sets of joint location coordinates”. It appears that the second set of joint location coordinates is generated using the first set of joint location coordinates. However, it is not clear whether the second set of joint location coordinates is created by transforming the first set of joint location coordinates from one coordinate system to another, or if the second set of joint 
Further, the claims determine the hand pose based on the plurality of sets of joint location coordinates; however, it is not clear whether the first set, the second set, or both sets are used to determine the pose. Therefore, Examiner suggests amending the claims to explicitly define the features discussed above. 

Claims 6, 13 and 19 are also rejected similarly because the claims recite limitations that appear to convert a third set of joint location coordinates to the second set of joint location coordinates; however, it is not clear how the third set, which is generated later, could be converted to the second set, which has already been generated. Therefore, claims 6, 13 and 19 are rejected similarly. 

Dependent claims do not remedy the deficiencies introduced by the independent claims. Therefore, dependent claims are also rejected similarly. 

Examiner suggests amending the claims to explicitly define the features discussed above in order to render the claims definite. 

Lack of antecedent basis: Claim 8 recites the limitations “A system comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform operations comprising…”. Claim 8 recites “the apparatus”; however, no apparatus is previously recited. Therefore, there is lack 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-6, 8-10, 12-13, 15-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Pub No. 20180047175 A1) in view of Tanabiki et al. (US Pub No. 20130230211 A1). 

Regarding Claim 1,
		Wang discloses A method comprising: receiving, from a camera, a plurality of images of a hand; (Wang, [0005], discloses detecting, by a 3D depth sensor, actions of an operator to obtain data frames, converting the data frames into an image, segmenting an object similar to a human body in the image and a background environment, and obtaining depth-of-field data; extracting human skeleton information, recognizing different portions of the human body, and establishing a 3D coordinate of joints of the human body; recognizing rotation information of skeleton joints of the two hands of the human body, recognizing, by capturing changes of angles of the different skeleton joints, which hand of the human body is triggered; analyzing different action plurality of hand frame are obtained)

generating a plurality of sets of joint location coordinates by: for each given image in the plurality of images: (Wang, [0005], discloses detecting, by a 3D depth sensor, actions of an operator to obtain data frames, converting the data frames into an image, segmenting an object similar to a human body in the image and a background environment, and obtaining depth-of-field data; extracting human skeleton information, recognizing different portions of the human body, and establishing a 3D coordinate of joints of the human body; recognizing rotation information of skeleton joints of the two hands of the human body, recognizing, by capturing changes of angles of the different skeleton joints, which hand of the human body is triggered; analyzing different action features of the operator, using corresponding characters as control instructions, which are sent to a robot of a lower computer; coordinates of image are generated of body parts including hand, palm, wrist, shoulders, arm or leg)  

cropping, using one or more processors, a portion of the given image comprising the hand; (Wang, [0005], discloses detecting, by a 3D depth sensor, actions of an operator to obtain data frames, converting the data frames into an image, segmenting an object similar to a human body in the image and a background environment, and obtaining depth-of-field data; extracting human skeleton information, recognizing different portions of the human body, and establishing a 3D coordinate of joints of the human body; recognizing rotation information of skeleton joints of the two hands of the human body, coordinates of segmented (cropped) image are generated of body parts including hand, palm, wrist, shoulders, arm or leg) 

identifying, a first set of joint location coordinates in the cropped portion of the given image; (Wang, [0005], discloses detecting, by a 3D depth sensor, actions of an operator to obtain data frames, converting the data frames into an image, segmenting an object similar to a human body in the image and a background environment, and obtaining depth-of-field data; extracting human skeleton information, recognizing different portions of the human body, and establishing a 3D coordinate of joints of the human body; recognizing rotation information of skeleton joints of the two hands of the human body, recognizing, by capturing changes of angles of the different skeleton joints, which hand of the human body is triggered; analyzing different action features of the operator, using corresponding characters as control instructions, which are sent to a robot of a lower computer; coordinates of segmented (cropped) image are generated of body parts including hand, palm, wrist, shoulders, arm or leg) 

identifying a three-dimensional hand pose based on the plurality of sets of joint location coordinates. (Wang, [0005], discloses detecting, by a 3D depth sensor, actions of an operator to obtain data frames, converting the data frames into an image, segmenting an object similar to a human body in the image and a background environment, and three dimensional body or hand movement or pose is determined based on joint location coordinates determined from frames)

Wang does not explicitly disclose generating a second set of joint location coordinates using the first set of joint location coordinates, or a neural network.

		Tanabiki discloses generating a second set of joint location coordinates using the first set of joint location coordinates; (Tanabiki, [0095-0096], Fig. 11, discloses skeleton information from skeleton estimating section 251.  As shown in FIG. 11, skeleton information contains different pieces of information on the joint positions of the neck, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, hip, right hip, left hip, right knee, left knee, right heel, and left heel of the target person in the coordinate system of target coordinates. Fig. 12 is a conceptual diagram in which dots and straight lines are superimposed on a target image, each dot representing a joint position indicated by skeleton information from skeleton estimating section 251, each straight line representing a bone joining the joint positions.  Note that the conceptual diagram of FIG. 12 is optional output of pose estimating apparatus 200 shown to joint position of one part (cropped portion) is estimated, then joint positions of other parts of the body are also estimated, so converting coordinates from one set of joints to another would estimate the positions of other joints in the body image; the second set of joint positions is determined based on the first, initially estimated joint location coordinates, since the frames of body parts are referenced from one set of joint locations to the others).

neural network (Tanabiki, [0133], [0165], discloses detailed algorithm of the detectors and a method for making the detectors may employ known techniques.  For example, the detectors are made by learning the tendency of the features by a boosting method such as Real AdaBoost using many sample images (an image showing a head and a shoulder and an image not showing a head or shoulder for the head and shoulder detector).  The features may be of any type such as features of histogram of gradient (HoG), Sparse, or Haar.  The learning method may be of any type, such as support vector machine (SVM) or neural network, other than a boosting method; for a wrist, skeleton estimating section 251 extracts an intersection point of a side of the right forearm in the horizontal scale and a straight line passing through the center of gravity of the right forearm contained in part region information from part extracting section 240 and extending in parallel to a side of the right forearm in the vertical scale.  There are two intersection points.  Skeleton estimating section 251 detects a hand region and determines the intersection point closer to the hand region as the position of the right wrist of the target person.  For detection of the hand region, 
skeleton estimating section 251 may use a skin color extraction using the HSV 

position of the left wrist in a similar manner; a neural network is disclosed as a learning algorithm for classifying hand and other pose information)

		Both Wang and Tanabiki are directed to determining the pose of a body or a body part such as a hand. Therefore, it would have been obvious to one of ordinary skill in the art to modify Wang, which discloses segmenting a body part such as a hand from image data (cropping), identifying the positions of joint coordinates, and determining a hand pose using the identified coordinates, to use the neural network algorithms of Tanabiki for classifying the identified hand pose, as a neural network is one of the well-known object classification techniques in image processing, for improved classification of hand gestures (see Tanabiki, Abstract).

Regarding Claim 2, 
		Furthermore, the combination of Wang and Tanabiki further discloses wherein the plurality of images comprises a plurality of views of the hand.  (Tanabiki, [0073] Individual embodiments will be described below. in particular, relate to a pose estimating apparatus and pose estimating method for estimating the pose of a person in an image every frame.  Embodiment 3 relates to a pose estimating apparatus and pose estimating method using a plurality of frames in order to improve the accuracy of estimation of the pose of the person while reducing the calculation load required for the 
plurality of views of the hand are obtained to improve pose estimation). Additionally, the rationale and motivation to combine the references Wang and Tanabiki as applied in claim 1 apply to this claim. 

Regarding Claim 3,
Furthermore, the combination of Wang and Tanabiki further discloses 
prompting a user of a client device to initialize a hand position; (Tanabiki, [0073], discloses a pose estimating apparatus and pose estimating method for estimating the pose of a person in an image every frame.  Embodiment 3 relates to a pose estimating apparatus and pose estimating method using a plurality of frames in order to improve the accuracy of estimation of the pose of the person while reducing the calculation load required for the pose estimation; initial position of pose to be estimated is captured in image form)

receiving the initialized hand position; (Tanabiki, [0073], discloses a pose estimating apparatus and pose estimating method for estimating the pose of a person in an image every frame.  Embodiment 3 relates to a pose estimating apparatus and pose estimating method using a plurality of frames in order to improve the accuracy of estimation of the pose of the person while reducing the calculation load required for the pose estimation; initial position of pose to be estimated is captured in image form) and 

tracking the hand based on the initialized hand position. (Tanabiki, [0073], discloses a pose estimating apparatus and pose estimating method for estimating the pose of a position change of pose is tracked from the initial position of any body part including hand, shoulders, arms, etc.). Additionally, the rationale and motivation to combine the references Wang and Tanabiki as applied in claim 1 apply to this claim. 
  

Regarding Claim 5, 
		Furthermore, the combination of Wang and Tanabiki further discloses wherein the first set of joint location coordinates is measured based on pixel location.  (Wang, [0025], discloses that a data acquisition unit is configured to use a depth camera Kinect to acquire depth data of a scene and establish a three-dimensional coordinate system, where a Z coordinate denotes a depth value of each pixel point.  The data acquisition unit is configured as an input part of the present invention to use a depth camera capable of capturing depth information of a scene to capture a video sequence which carries depth information, including time of flight, structured light, three-dimensional image, and the like.  The depth information may include a depth value of each pixel point of a human body.  A 3D spatial region may be reconstructed over a range by using a depth image.  Even if there is a blocked part between two human bodies, a distance difference is generated in a depth image because of a longitudinal relationship of the human bodies, that is, grayscale values are layered.  Therefore, a threshold may be used to segment a blocked human body or different blocked parts of a same human coordinates at pixel points are obtained of skeletal joints). Additionally, the rationale and motivation to combine the references Wang and Tanabiki as applied in claim 1 apply to this claim. 

Regarding Claim 6,
Furthermore, the combination of Wang and Tanabiki further discloses 
converting the first set of joint location coordinates to a third set of joint location coordinates, wherein the third set of joint location coordinates is measured relative to an uncropped version of the given image; (Tanabiki, [0094], discloses skeleton estimating section 251 estimates the joint position of the target person, based on the part region information and likelihood maps from part extracting section 240.  A method for estimating a joint position will be described in detail later.  Skeleton estimating section 251 outputs skeleton information, i.e., information on the estimated positions of the joints to skeleton model description converting section 252.  If the joint position can be determined, the position of the bone joining the joints can also be determined.  Estimation of the joint position is the same as the estimation of the skeleton model; joint position of one part (cropped portion) is estimated, then joint position of other parts of body are also estimated)

and 
converting the third set of joint location coordinates to the second set of joint location coordinates. (Tanabiki,  [0095-0096], Fig. 11, discloses skeleton information from 
shoulder, right elbow, left elbow, right wrist, left wrist, hip, right hip, left hip, right knee, left knee, right heel, and left heel of the target person in the coordinate system of target coordinates. Fig. 12 is a conceptual diagram in which dots and straight lines are superimposed on a target image, each dot representing a joint position indicated by skeleton information from skeleton estimating section 251, each straight line representing a bone joining the joint positions.  Note that the conceptual diagram of FIG. 12 is optional output of pose estimating apparatus 200 shown to provide a supplementary explanation; joint position of one part (cropped portion) is estimated, then joint positions of other parts of the body are also estimated, so converting coordinates from one set of joints to another would estimate the positions of other joints in the body image). Additionally, the rationale and motivation to combine the references Wang and Tanabiki as applied in claim 1 apply to this claim. 
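
For illustration only, the two conversions recited in claim 6 can be read as a shift from crop-relative pixel coordinates to full-image pixel coordinates, followed by a mapping into a second coordinate system. The sketch below is not taken from Wang, Tanabiki, or the application. The function names, the crop offset, the pinhole intrinsics (fx, fy, cx, cy), and the per-joint depth in millimeters are all assumptions chosen only to make the reading concrete.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def crop_to_image(joints: List[Point2D], crop_origin: Point2D) -> List[Point2D]:
    """First set -> third set (hypothetical reading).

    Shifts crop-relative pixel coordinates by the crop's top-left corner so
    they are expressed relative to the uncropped image.
    """
    ox, oy = crop_origin
    return [(x + ox, y + oy) for x, y in joints]


def image_to_camera_mm(
    joints: List[Point2D],
    depths_mm: List[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Third set -> second set (hypothetical reading).

    Assumes a pinhole camera model and a known depth per joint, and returns
    camera-space coordinates in millimeters.
    """
    return [
        ((u - cx) * z / fx, (v - cy) * z / fy, z)
        for (u, v), z in zip(joints, depths_mm)
    ]


# Example with made-up numbers: one joint at (10, 20) inside a crop whose
# top-left corner sits at (100, 50) in the uncropped image.
third_set = crop_to_image([(10.0, 20.0)], (100.0, 50.0))
second_set = image_to_camera_mm(third_set, [500.0], 500.0, 500.0, 100.0, 50.0)
```

Under this reading the third set is an intermediate result that exists before the second set, which is the ordering the claims would need to state explicitly.
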
Claims 8-10 and 12-13 recite a system with elements corresponding to the method steps recited in Claims 1-3 and 5-6, respectively. Therefore, the recited elements of system Claims 8-10 and 12-13 are mapped to the proposed combination in the same manner as the corresponding steps of Claims 1-3 and 5-6, respectively. Additionally, the rationale and motivation to combine Wang and Tanabiki presented in the rejection of Claims 1-3 and 5-6 apply to these claims.

Claims 15-17 and 20 recite a computer readable storage medium with program instructions corresponding to the method steps recited in Claims 1-3 and 6. Therefore, 

		Furthermore, the combination of Wang and Tanabiki further discloses a system comprising a processor and a memory storing instructions (Tanabiki, [0070-0071], discloses pose estimating apparatus 110 is, for example, a computer system (a personal computer, workstation or the like) with a communication function.  This computer system, not shown, primarily includes an input device, a computer main frame, an output device, and a communication device; the input device is, for example, a keyboard or mouse.  The output device is, for example, a display or printer.  The communication device is, for example, a communication interface connectable to an IP network.  The computer main frame is mainly composed, for example, of a central processing unit (CPU) and a memory device.  The CPU has a control function and a calculation function.  The memory device includes, for example, a read only memory (ROM) storing programs or data and a random access memory (RAM) temporarily storing data.  The ROM may be a flash memory which is electrically rewritable).

Furthermore, the combination of Wang and Tanabiki further discloses A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions (Tanabiki, [0070-0071], discloses pose estimating apparatus 110 


Regarding Claim 18, 
		The combination of Wang and Tanabiki further discloses wherein the second set of joint location coordinates is measured using millimeters. (Wang, [0072], Fig. 2, discloses a diagram of a depth value of each pixel point of a depth image of a scene, acquired by a depth camera.  The depth image includes a two-dimensional (2D) pixel region of a captured scene.  Each pixel in the 2D pixel region may represent a depth value, for example, a length or a distance in centimeter, millimeter or the like which is from an object in a captured scene to a capturing device; pixel location coordinates are measured in millimeters). Additionally, the rationale and motivation to combine the references Wang and Tanabiki as applied in claim 1 apply to this claim. 

Claims 4, 7, 11, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang as modified by Tanabiki, and further in view of Wetzler et al. (US Pub No. 20170371403 A1). The teachings of Wang and Tanabiki have been discussed previously. 

Regarding Claim 4, 
The combination of Wang and Tanabiki does not explicitly disclose wherein the camera is a stereo camera.  
		Wetzler discloses wherein the camera is a stereo camera.  (Wetzler, [0106], discloses the system and method described above may be applied for recording egocentric motion in a virtual reality or augmented reality scenario.  In particular, these systems and methods may be used for training such a virtual and/or augmented reality system.  The true motion of a trainer's hands may be recorded from a first-person point-of-view by attaching the camera to the trainer's head such that the camera faces the trainer's hands.  Using a depth generating camera system, such as a structured light scanner or stereo camera reconstruction setup to obtain such automatically labeled data for use by a machine learning system may provide a powerful tool for use by Virtual Reality (VR) and AR systems where markerless detection and tracking of a user's hands may be critical for a useful user experience; a stereo camera is used for capturing hand images). 




Regarding Claim 7,
Furthermore, the combination of Wang and Tanabiki does not explicitly disclose generating a synthetic training dataset comprising stereo image pairs of virtual hands and corresponding ground truth labels, wherein the corresponding ground truth labels comprise joint locations.
		Wetzler discloses generating a synthetic training dataset comprising stereo image pairs of virtual hands and corresponding ground truth labels, wherein the corresponding ground truth labels comprise joint locations. (Wetzler, [0006], [0013], [0080], [0106], discloses techniques to acquire training data use visual markers such as painted gloves, stickers, reflectors or LEDs attached to different parts of the hand or a hand glove for observation by one or more cameras.  The positions and orientations of the hand and fingers may then be extracted from this visually tracked data.  However, visually captured data suffers from occluded markers due to the dimensions and articulation possibilities of the human hand, resulting in incomplete data; additionally, training data of user hand coordinates is obtained and used for training to recognize hand pose and gestures). 


Claims 11 and 14 recite a system with elements corresponding to the method steps recited in Claims 4 and 7, respectively. Therefore, the recited elements of the 

Claim 19 recites a computer readable storage medium with program instructions corresponding to the method steps recited in Claim 7. Therefore, the recited program instructions of computer readable storage medium Claim 19 are mapped to the proposed combination in the same manner as the corresponding steps of Claim 7. Additionally, the rationale and motivation to combine Wang, Tanabiki and Wetzler presented in the rejection of Claim 7 apply to this claim.


 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20210174519 A1
US 20180338710 A1
US 20180024641 A1
US 20180260039 A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL whose telephone number is (571)270-5872. The examiner can normally be reached M-F: 10am - 8pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Pinalben Patel/Examiner, Art Unit 2661