DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the RCE filed on 10/11/2022.
The amendment filed 9/21/2022 has been entered.
Claims 1-2 and 15-17 are amended. Claims 3-6 and 19-22 are canceled.
Claims 1-2, 7-18, and 23-28 are currently pending and have been examined. 
This action is made NON-FINAL.

Response to Arguments
The applicant’s argument on page 16 that the technique of Fitzgibbon would not work for a human hand without prior knowledge of the true lengths of the individual elements is unpersuasive. See at least the cited portions [0036] and [0042-0043] of Fitzgibbon, where the technique works for a human hand. The examiner relies on Iqbal to teach the use of previously determined true lengths of a plurality of segments of the digits of a human hand.
The applicant’s argument on page 16 that Fitzgibbon and Iqbal do not teach the applicant's two-step process is unpersuasive. Upon further consideration of the references in view of the applicant’s argument, the examiner has determined that Iqbal teaches the two-step process in full and therefore the Fitzgibbon reference is unnecessary (the Fitzgibbon reference is no longer cited in the 103 rejections). The examiner agrees that Fitzgibbon performs a P-n-P calculation without known finger segment lengths. The examiner understands a P-n-P calculation as a calculation for finding the relative pose between an object (hand) and a camera from a set of 3D points and their corresponding 2D projections on the focal plane. Iqbal teaches this sort of calculation without using the actual term P-n-P, and its calculation also involves known finger segment lengths (See at least fig. 1C:
[Image: media_image1.png (Iqbal, fig. 1C)]
Also see the rejections provided below.).
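For illustration only, the relationship between 3D points and their 2D projections that a P-n-P calculation works from can be sketched as a reprojection error: a P-n-P solver searches for the pose that drives this error toward zero. The sketch below is not taken from Fitzgibbon, Iqbal, or the application; the function names, the pinhole intrinsics (fx, fy, cx, cy), and the use of millimeters are assumptions made for the example.

```python
import math

def transform(R, t, p):
    """Apply a 3x3 rotation R (nested lists) and translation t to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixel coordinates."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)

def reprojection_error(R, t, object_points, image_points, fx, fy, cx, cy):
    """RMS pixel distance between observed 2D points and the projections of the
    3D model points under the candidate pose (R, t)."""
    total = 0.0
    for p_obj, (u, v) in zip(object_points, image_points):
        u_hat, v_hat = project(transform(R, t, p_obj), fx, fy, cx, cy)
        total += (u - u_hat) ** 2 + (v - v_hat) ** 2
    return math.sqrt(total / len(object_points))
```

When the 3D points are finger joints, the known segment lengths fix the distances between the model points, which is what removes the scale ambiguity of a single 2D image.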
The applicant’s argument on page 17 that Itkowitz’s use of fiducial markers physically attached to the hand differs from the claimed use of a fiducial marker grid is unpersuasive. Luo is relied upon for the grid of fiducial markers. The fiducial markers physically attached to the hand in Itkowitz are irrelevant. The examiner cited Col. 16, lines 4-15 of Itkowitz, which uses fiducial markers on a surgeon console for calibration purposes to more accurately locate key points on a hand. The tracking coordinate system is equivalent to a marker coordinate system as it relies upon the position and orientation of fiducial markers on the console based on the calibration process. Therefore, Itkowitz does teach computing transformations from a marker coordinate system to a screen coordinate system and using the transformations to compute coordinates of the key points in the marker coordinate system.
Upon further consideration of the Yusuke reference, the examiner has determined that Yusuke teaches the feature of analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece. Therefore, the Sager reference is unnecessary in the prior art rejections of claims 1, 2, 7-10, 12-14, 17, 18, 23-25, 27, and 28.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 7, 12, 13, 14, 17, 18, 27, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Boca (US 20150314442 A1) in view of Yusuke (translated JP2015044257A), Kofman (IDS: “Teleoperation of a robot manipulator using a vision-based human-robot interface”), JETTÉ (US 20190210217 A1), Luo (US 20150100910 A1), Itkowitz (US 9901402 B2), and Iqbal (US 10929654 B2).

Regarding Claim 1, 
Boca teaches
A method for programming a robot to perform an operation by human demonstration, said method comprising (“There is described below the use of hand gestures to teach a path to be followed by the industrial robot 12 in performing work on workpiece 14.” [0021]):
demonstrating the operation on a workpiece by a human hand (“the instructions to the robot 12 that will be assembled from the hand gestures from the one or two hands seen by the camera and as described herein the object being pointed to, that is the scene data to create the path and instructions to be followed by the robot 12 when the robot performs work on the workpiece 14. For example, one hand is used to teach a robot target and the other hand is used to generate a grab or drop instruction. It is up to the robot operator to associate a particular hand gesture with a particular instruction.” [0023]; Examiner Interpretation: Despite the human hand not physically touching the workpiece in the demonstration, the operation to be performed on the workpiece is demonstrated by the human hand.);
analyzing camera images of the hand demonstrating the operation on the workpiece, by a computer (“the image of the location pointing hand gesture of step 304 and the associated location on the object are captured by the camera 11 and sent to the computation device 13 for processing. At step 308, the computation device 13 calculates from the image the corresponding location and orientation of the robot tool in the robot scene.” See at least [0034]),
a move step where hand pose and workpiece pose are determined at a plurality of points defining a move path (“At step 310 the calculated location and orientation of the robot tool are sent to the computation device. Query 312 asks if more location points are needed to complete the robot path. Query 312 can be another gesture. If the answer is yes, the method 300 asks at query 314 if there is a need to reposition the camera. If the answer to query 314 is no, then the method 300 returns to step 304 where the operator makes the hand gesture associated with the next location point. While not shown in FIG. 3, if the answer to query 314 is yes, then the method 300 returns to step 302 where the camera is repositioned. If the answer to query 312 is no, then method 300 ends since no more robot path points have to be acquired.” [0039]; Examiner Interpretation: Hand pose also corresponds to workpiece pose (see [0024]).)
generating robot motion commands, based on the demonstration data … to cause the robot to perform the operation on the new workpiece (“creating robot instructions from the gestures by using the gesture context to the scene data from the same image or as additional data or extra processing to calculate/generate robot instructions (step 704 and optional step 706), storing the created instructions (step 708), asking if more created instructions are needed (step 710) and in step 712 sending the created instructions to the robot if no more created instructions are needed and performing in FIG. 7b all of the steps shown in FIG. 7a except the step 712 of sending the created instructions to the robot. The optional step 706 in these flowcharts of providing the scene 3D model to convert the gesture to a robot instruction step 704 is only needed if the scene will be subtracted from the image of the gesture.” [0052]; “In general a robot move instruction has information about the robot tool and coordinate system used for the robot target” [0057]);
and performing the operation on the new workpiece by the robot (“that is the scene data to create the path and instructions to be followed by the robot 12 when the robot performs work on the workpiece 14.” [0023]; “By work is meant those actions performed by a robot such as painting, grinding, polishing, deburring, welding etc. that make a physical change to the workpiece and those interactions that a robot has with a workpiece such as picking up the workpiece from one location and moving it to another location or inserting the workpiece into a specific location that does not physically change the workpiece.” [0003]).

Boca does not explicitly teach
receiving the images from a two dimensional camera,
to create demonstration data where the demonstration data defines a pick, move and place operation including a grasping step where hand pose and workpiece pose are determined when the hand grasps the workpiece, … and a place step where the workpiece pose is determined when the workpiece becomes stationary after the move step, 
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame, 
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece; … and the initial position and orientation of the new workpiece,

However, Yusuke teaches
receiving the images from a two dimensional camera (“The camera 15 is a monocular camera.” See page 2, line 18),
create demonstration data where the demonstration data defines a pick, move and place operation including a grasping step where hand pose and workpiece pose are determined when the hand grasps the workpiece, a move step …, and a place step where the workpiece pose is determined when the workpiece becomes stationary after the move step … and a gripper coordinate frame corresponding to the hand (“The control unit 16 assumes that the user's two fingers (hereinafter, the user's (human) finger is referred to as a “finger” and distinguished from the finger unit 14 or the finger members 14A and 14B of the robot 10) grips the workpiece. When recognized, the position of the finger in the work coordinate system is recognized (step S33). That is, the control unit 16 recognizes that the finger has gripped the workpiece from the finger position and the workpiece position, and converts the finger position at that time to the workpiece coordinate system. The control unit 16 stores the position in the workpiece coordinate system at this time in the storage unit 17 as gripping position information. Next, the user holding the workpiece with two fingers moves the workpiece to a desired location and releases the workpiece (releases the finger from the workpiece). The control unit 16 recognizes that the finger has moved away from the workpiece based on the image from the camera 15, recognizes the position and posture of the workpiece in the robot coordinate system at that time as the position and posture of the movement destination of the workpiece, is stored (step S34). With the above operation, the control unit 16 stores information on the position of the workpiece in the workpiece coordinate system and the position and posture of the workpiece in the robot coordinate system. As a result, preparation for instructing the work to be performed on the work and having the robot 10 perform the work on the work is completed. 
Note that the position and orientation of the movement destination of the workpiece may be recognized based on the position and orientation of the workpiece when it is recognized that the movement of the workpiece has stopped for a predetermined time or more based on the captured image. In the above embodiment, the gripping position of the workpiece is determined based on the gripping position when gripping the workpiece.” See at least page 3, line 48 to page 4, line 14.),
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece; … and the initial position and orientation of the new workpiece (“The control unit 16 identifies a workpiece in the direction indicated by the finger by collating a captured image obtained by capturing the workpiece with the image DB, and determines the robot coordinates of the workpiece and the workpiece from the captured image and the posture of the arm unit 12 at that time. … If the position and orientation of the workpiece can be recognized, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 to cause the finger unit 14 to grip the workpiece (step S42). At this time, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 so as to grip the workpiece at the stored workpiece gripping position. When the finger unit 14 grips the workpiece, the control unit 16 controls the arm unit 12 and the hand unit 13 so that the workpiece moves to the movement destination taught in advance and assumes the posture taught in advance.” See at least page 5, lines 6-20; Examiner Interpretation: The grasp and place instructions are taught in advance on a workpiece. The new workpiece is the workpiece in which the robot recognizes and performs the taught actions.),
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca to further include the teachings of Yusuke to quickly and easily teach robots workpiece operations (See at least “problem to be solved” on page 1.).
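For illustration only, the place-step criterion quoted from Yusuke above (the workpiece pose is taken once its movement "has stopped for a predetermined time or more") can be sketched as follows. This is not Yusuke's implementation; the function name, the sample format, and the hold_time and tolerance values are hypothetical, since the reference does not state specific thresholds.

```python
import math

def find_place_pose(samples, hold_time=1.0, tolerance=2.0):
    """Return the first position at which the workpiece has stayed within
    `tolerance` (assumed mm) for at least `hold_time` (assumed seconds),
    or None if the workpiece never settles.

    samples: list of (timestamp_seconds, (x, y, z)) tuples in time order.
    """
    anchor = 0  # index of the sample where the current stationary run began
    for i in range(1, len(samples)):
        t_i, p_i = samples[i]
        t_a, p_a = samples[anchor]
        if math.dist(p_i, p_a) > tolerance:
            anchor = i  # workpiece moved: restart the stationary run here
        elif t_i - t_a >= hold_time:
            return p_a  # stationary long enough: treat as the place position
    return None
```

The same test, applied to workpiece orientation as well as position, would give the full place pose; orientation is omitted here to keep the sketch short.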

Yusuke also does not explicitly teach
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame, 
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper

However, Kofman teaches
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame (“The orientation of the hand of the operator is used to control the orientation of the robot-manipulator end-effector and is computed from the 3-D coordinates of the centroids of the three hand markers as shown in Fig. 3. Firstly, the midpoint of the line segment joining the thumb and index-finger marker centroids, T and I, respectively, is defined as M (Fig. 3(a)). A coordinate system X0Y0Z0 with origin at wrist W is then defined by a translation of the local-site global reference coordinate system XYZ to the wrist [Fig. 3(b)]. Through yaw, pitch, and roll rotations, explained below, the final axes X3Y3Z3 to be used to determine the tool axes of the robot-end-effector are obtained with X3 collinear with WM, WTI coplanar with X3Y3, and T lying in the first quadrant of X3Y3, as shown in Fig. 3(b). The yaw-pitch-roll tool rotation angles are determined directly from the hand rotation angles of WM and TI as follows: yaw rotation α of coordinate system X0Y0Z0 about Z0 to X1Y1Z1, pitch rotation β of X1Y1Z1 about Y1 to X2Y2Z2, shown in Fig. 3(c) using -β for clarity, and roll rotation γ of X2Y2Z2 about X2 to X3Y3Z3, as shown in Fig. 3(d).” See at least Pg. 4, Col. 2, lines 1-20; Fig. 3 shows the hand coordinate frame which corresponds to the robot coordinate frame.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca and Yusuke to further include the teachings of Kofman regarding corresponding coordinate frames, to remotely control a robot based on position and orientation of a human operator’s hand in a demonstration of the operation without the restraints of sensors and wires on the human hand. See at least the introduction on Pgs. 1-2.
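For illustration only, the hand coordinate frame described in the Kofman passage (X3 along the wrist-to-midpoint line WM, the points W, T, and I coplanar with X3Y3, and T in the first quadrant of X3Y3) can be built directly from the three marker positions. The helper names below are hypothetical, and the sketch constructs the final axes by cross products rather than by the yaw, pitch, and roll sequence the reference describes.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def hand_frame(W, T, I):
    """Return unit axes (x, y, z) of a hand frame with origin at the wrist W.

    x points from W to M, the midpoint of thumb T and index finger I.
    z is normal to the plane containing W, T and I.
    y completes the right-handed frame, so T has a non-negative y coordinate.
    """
    M = tuple((t + i) / 2 for t, i in zip(T, I))
    x = normalize(sub(M, W))
    z = normalize(cross(x, sub(T, W)))
    y = cross(z, x)
    return x, y, z
```

A gripper coordinate frame corresponding to the hand frame could then reuse the same three axes, with the origin placed at the gripper's tool center point.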

Kofman also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper

However, JETTÉ teaches
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper (“In this specific embodiment, the vacuum cup is made of a flexible, resilient material, and the relative distance between the robot and the workpiece held by the robot can vary based on this flexibility and operating conditions. Such variations in the relative distance between a given robot and the workpiece it holds was a source of positioning uncertainty in the reference frame of the robots. This gripper type was found to provide satisfactory gripping capability in the embodiment shown in FIG. 1, but it will be understood that other gripper types can be used in other embodiments. Moreover, more than one gripper, possibly of different gripper types, can be used as the end effector per robot if desired. For instance, a clamp gripper can be used in addition to a vacuum cup for a given robot, or for all robots, for instance. The gripper type or types can vary from one robot to another within a given workpiece holding system embodiment. Indeed, the exact type of gripper can be selected from the following general categories: impactive—e.g. jaws, clamps or claws which physically grasp by direct impact upon the object; ingressive—pins, needles or hackles which physically penetrate the surface of the object (e.g. an aperture or bore of the workpiece); astrictive—forces applied to the objects surface (e.g. by vacuum, magneto- or electroadhesion); and contigutive—requiring direct contact for adhesion to take place (e.g. surface tension or freezing).” [0043-0044])
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, and Kofman to further include the teachings of JETTÉ so that gripper type can vary as needed for different application requirements (see at least [0043-0044]).

JETTÉ also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;

However, Luo further teaches
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, … calculating the true lengths of the segments of the digits of the human hand (“FIG. 14 depicts how the simplified exemplary user's hand or hands may be photographed by the device's camera or other camera, and this image information may be used to refine the default parameters of the biomechanical and/or anatomical model of the user's hand, in accordance with one embodiment of the present invention. In acquiring such images, often it is useful to have the system provide a standardized background, such as a series of distance markings, grid, graph paper, and the like (1400) in order to better calibrate the image of the hand and correct for image distortions. This standardized background may additionally include various color, shades of gray, and resolution test targets as well. The background may be conveniently provided by, for example, electronically providing one or more background image sheets (e.g. a jpeg, png, pdf or other image file) for printing on the user's printer.” [0204]; “the user may put each hand on background (1400), and take a photo of the hand(s) (1402) with either the computerized device's camera or other camera. This image may then be analyzed, preferably by an image analysis program. The background image will help correct for any image distortions caused by different camera angles, and the like. The user hand image analysis may be done onboard the user's handheld computerized device, but it need not be. In an alternative embodiment, the user may upload one or more images of the hand taken by any imaging device to an external image analyzer, such as a remote internet server. 
In either event, the image analyzer will analyze the user's skin or hand outline appearance (1404), deduce the most probable lengths one or more bones of the user's hand, such as the user's various finger and thumb bones, and send this data or other data to correct the default biomechanical and/or anatomical model of the user's hand(s) back to the user's computerized device, such as for example during calibration step 906 referenced in FIG. 9 above.” [0205] Examiner Interpretation: The standardized background, such as a series of distance markings, grid, graph paper is equivalent to a fiducial marker grid.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, and JETTÉ to further include the teachings of Luo to improve the accuracy of hand gesture identification and tracking by using a personalized anatomical hand model in place of a default hand model (“In some embodiments, to improve accuracy (that is to replace standard human hand biomechanical and/or anatomical model default parameters with actual user calibration parameters), it will be useful to acquire an image of the user's hands, and to employ various image processing and analysis techniques to analyze this image of the user's one or more hands to better estimate the relative length of the various bones of the user's hands. Indeed, in the event that the user has lost one or more fingers, the system may then use this information to make corresponding changes in its biomechanical and/or anatomical model of the human hand. In other words, the model may include calibration information associated with an image of at least a portion of the hand of the user.” [0203]. Also see at least [0010-0018]).

Luo also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, 

However, Itkowitz teaches
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration. Some fiducial markers attached to known locations on the surgeon's console can be used to determine the extrinsic parameters (rotation and translation) of multiple cameras with respect to the surgeon's console. This process can be done automatically. Active fiducial markers can be used for the calibration fiducial markers since such markers are only turned on during a calibration process and before the procedure. During the procedure, the calibration fiducial markers are turned off to avoid confusion with the fiducial markers used to localize the surgeon's hands.” Col. 16, lines 4-15; Examiner Interpretation: The determined rotation and translation of a camera relative to the surgeon's console, which has fiducial markers at known locations, is the transformation from a marker coordinate system to a screen coordinate system.),
using the transformations to compute coordinates of the key points in the marker coordinate system (“FIG. 7 is an illustration of sensor 212 mounted on forefinger 292B with a location 713 in tracking coordinate system 750, and a sensor 211 mounted on thumb 292A with a location 711 in tracking coordinate system 750.” Col. 17, lines 28-31; Examiner Interpretation: The tracking coordinate system is the same as the marker coordinate system.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, JETTÉ, and Luo to further include the calibration teachings of Itkowitz to accurately recreate the three-dimensional hand poses to allow for remote operation of a robot (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration.” Col. 16, lines 4-5; “the use of hand gesture poses and hand gesture trajectories to control system 100 makes it is unnecessary for the surgeon to take the surgeon's eyes off the patient and/or viewing screen” See at least Col. 11, lines 26-47.).
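For illustration only, the sizing steps mapped above to Luo and Itkowitz (a hand photographed on a marker grid, a transformation between marker and screen coordinates, and segment lengths computed from key points) can be sketched as follows. The sketch assumes a roughly fronto-parallel camera, so that an affine map fitted to three markers is adequate; a full treatment would typically use a homography. All names and the grid spacing are hypothetical and come from neither reference.

```python
import math

def pixel_to_marker_transform(p00, p10, p01, spacing_mm):
    """Build a function mapping pixel coordinates to marker-grid coordinates (mm).

    p00, p10, p01: pixel positions of the markers at grid cells (0,0), (1,0), (0,1).
    The marker-to-screen map is pixel = p00 + u*(p10 - p00) + v*(p01 - p00);
    the returned function inverts it and scales by the grid spacing.
    """
    ax, ay = p10[0] - p00[0], p10[1] - p00[1]
    bx, by = p01[0] - p00[0], p01[1] - p00[1]
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("markers are collinear; the transform is not invertible")

    def to_marker(pixel):
        dx, dy = pixel[0] - p00[0], pixel[1] - p00[1]
        u = (dx * by - bx * dy) / det
        v = (ax * dy - dx * ay) / det
        return (u * spacing_mm, v * spacing_mm)

    return to_marker

def segment_lengths(keypoints_px, to_marker):
    """Lengths (mm) of the segments between consecutive key points of one digit."""
    pts = [to_marker(p) for p in keypoints_px]
    return [math.dist(a, b) for a, b in zip(pts, pts[1:])]
```

In use, keypoints_px would hold the joint locations that a key-point detector reports for one finger in the sizing image, ordered from knuckle to fingertip.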

Itkowitz also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, 

However, Iqbal teaches
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images … processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image (“the 2.5D keypoint locations are generated using a neural network model, such as the neural network model 210 or 212 shown in FIG. 2A or 2C, respectively. The 2.5D keypoint locations ($\hat{P}_k^{2.5D}$) include three components for each keypoint, namely, pixel coordinates ($x_k$, $y_k$) and scale normalized depth values ($\hat{Z}_k^r$), where k is the keypoint and r is a root or reference keypoint. In an embodiment, the depth values are determined relative to a reference keypoint. For example, in an embodiment where the 3D pose reconstruction unit 110 is configured for estimating the pose of a hand, the reference keypoint is located on the palm of the hand. The reference keypoint can be chosen to be any keypoint as a function of the object's skeleton model.” See at least col. 3, lines 28-41; Examiner Interpretation: The reference keypoint is equivalent to the hand coordinate frame.), 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments (See fig. 1C (provided below)

[Figure: Iqbal, FIG. 1C (media_image1.png)]
 
“Given an RGB image I of a hand, the goal is to estimate the 2D and 3D positions of K keypoints of the hand. The 2D hand pose is defined as $p=\{p_k\}_{k \in K}$ and the 3D pose is defined as $P=\{P_k\}_{k \in K}$ where $p_k=(x_k, y_k) \in \mathbb{R}^2$ represents the 2D pixel coordinates of the keypoint k in image I and $P_k=(X_k, Y_k, Z_k) \in \mathbb{R}^3$ denotes the location of the keypoint in the 3D camera coordinate frame measured in millimeters. The Z-axis corresponds to the optical axis. Given the intrinsic camera parameters K, the relationship between the 3D location $P_k$ and corresponding 2D projection $p_k$ can be written as follows under a perspective projection: 
[Equation image: media_image2.png]
where $k \in 1, \ldots, K$, $Z_{root}$ is the depth of the root keypoint, and $Z_k^r = Z_k - Z_{root}$ corresponds to the depth of the $k^{th}$ keypoint relative to the root keypoint. In an embodiment, the palm of the hand is used as the root keypoint.” See at least col. 4, lines 24-47; “The 3D pose reconstruction unit 110 calculates the normalized 3D pose based on the 2.5D keypoint locations and intrinsic parameters of the monocular camera corresponding to the captured image of the object. In an embodiment, the 3D pose reconstruction unit 110 provides an exact solution to obtain the absolute 3D pose up to a scale factor. The full absolute 3D pose can be recovered by knowing the measured length of the component of the object's structural skeletal model or using optimization under assumption of prior component length distribution.” See at least col. 3, lines 50-59; Examiner Interpretation: The perspective projection and scaling step are Point-n-Perspective calculations.),
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, JETTÉ, Luo, and Itkowitz to further include the teachings of Iqbal to estimate a 3D pose of a human hand with the use of a single 2D camera for the use of human-computer interaction while reducing the impact of the hand’s appearance variation, complex poses, and self-occlusions (“Estimating a 3D pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is useful for human-computer interaction. Hand pose can be represented by a fixed set of points in 3D space, usually joints, called landmarks or keypoints. Estimating the 3D pose accurately is a difficult task due to the large amounts of appearance variation, self-occlusions, and complexity of articulated hand poses. 3D hand pose estimation escalates the difficulties even further because a depth of each of the hand keypoints also has to be estimated. Conventional techniques for determining locations of the landmarks of a hand in 3D space include one or more of multi-view camera systems, depth sensors, and color markers/gloves. Each of the conventional techniques requires a constrained environment and/or specialized equipment. Furthermore, environmental conditions such as sunlight, occlusions, and complexity of non-rigid hand poses present challenges to landmark detection and determination. There is a need for addressing these issues and/or other issues associated with the prior art.” See at least Col. 1, lines 21-40.)
Even though Iqbal does not teach processing a sizing image with a neural network, it would have been obvious to apply a neural network to identify key points on the human hand in a sizing image because a “scale and translation invariant 2.5D pose representation can be easily obtained using the neural network models” (See at least Col. 13, lines 46-48) and neural networks can accurately and quickly solve complex problems in the field of image processing (See the machine learning section in cols. 24-25).
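For illustration only, the perspective projection and scale-recovery step quoted from Iqbal above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Iqbal's implementation: it assumes a standard pinhole camera with focal lengths fx, fy and principal point (cx, cy), and the function names back_project, reconstruct_pose, and rescale_pose are hypothetical. The scale is recovered from a single segment of known length, which is a simplification of the quoted "measured length of the component" language.

```python
import math

def back_project(pixel, depth, fx, fy, cx, cy):
    """Invert an assumed pinhole projection for one keypoint at a given depth."""
    x, y = pixel
    return ((x - cx) * depth / fx, (y - cy) * depth / fy, depth)

def reconstruct_pose(keypoints_25d, z_root, fx, fy, cx, cy):
    """keypoints_25d holds (x_px, y_px, z_rel), with z_rel the depth relative to the root."""
    return [back_project((x, y), z_root + z_rel, fx, fy, cx, cy)
            for x, y, z_rel in keypoints_25d]

def rescale_pose(pose, idx_a, idx_b, true_length):
    """Scale an up-to-scale pose so that segment a-b has its measured length.

    Scaling every point about the camera origin leaves the 2D projections unchanged.
    """
    s = true_length / math.dist(pose[idx_a], pose[idx_b])
    return [(s * X, s * Y, s * Z) for X, Y, Z in pose]

# Hypothetical intrinsics and two keypoints (root plus one neighbour).
fx = 600.0
fy = 600.0
cx = 320.0
cy = 240.0
keypoints = [(320.0, 240.0, 0.0), (380.0, 240.0, 0.0)]

pose = reconstruct_pose(keypoints, 1.0, fx, fy, cx, cy)  # up-to-scale pose
scaled = rescale_pose(pose, 0, 1, 40.0)                  # segment known to be 40 mm
```

Because the rescaling is applied about the camera origin, the rescaled keypoints still project to the same pixel coordinates, which is why a single known length suffices to fix the scale in this sketch.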

Regarding Claim 2,
Modified Boca teaches
	The method according to Claim 1
Boca further teaches
	wherein demonstrating the operation on the workpiece by the human hand and performing the operation on the new workpiece by the robot are both performed in a robotic work cell  (“There is described below the use of hand gestures to teach a path to be followed by the industrial robot 12 in performing work on workpiece 14. As shown in FIG. 2a, an operator 16 uses hand gestures to point to a location in the robot workspace.” See at least [0021-0022] and figs. 2a and 2b.
 
[Figures: Boca, FIGS. 2a and 2b (media_image3.png, media_image4.png)]
; Examiner Interpretation: The demonstration is performed in the robotic work cell when hand gestures are pointed at a location within the robotic workspace. Fig. 2a shows the hand within the workspace of the robot.).

Regarding Claim 12,
Modified Boca teaches
	The method according to Claim 1
Boca further teaches
wherein generating robot motion commands includes generating commands, by a robot controller having a processor and memory (“At step 514, the identified gesture is stored in the memory of the computation device 13 or in the absence of such a device in the memory of the robot controller 15.” [0050]; “creating robot instructions from the gestures by using the gesture context to the scene data from the same image or as additional data or extra processing to calculate/generate robot instructions (step 704 and optional step 706), storing the created instructions (step 708), asking if more created instructions are needed (step 710) and in step 712 sending the created instructions to the robot if no more created instructions are needed and performing in FIG. 7b all of the steps shown in FIG. 7a except the step 712 of sending the created instructions to the robot. The optional step 706 in these flowcharts of providing the scene 3D model to convert the gesture to a robot instruction step 704 is only needed if the scene will be subtracted from the image of the gesture.” [0052]; “In general a robot move instruction has information about the robot tool and coordinate system used for the robot target” [0057]),
to cause a robot gripper to move to a grasping position and orientation based on … and position and orientation of the gripper relative to the workpiece contained in the demonstration data (“the hand and finger location and orientation can be used to calculate the corresponding location and orientation of the robot tool in the robot scene.” [0038]; “location, orientation and associated action can be sent to the robot individually or all at once at the end of the teaching process; … with the image of the scene the part can be recognized and then the processing of the gesture has to be in relationship to the part; the robot targets can be defined relative to a part coordinate system” [0061-0064]; “there will be other points along the path between the start and stop points at which the robot will perform work such as follow a path, pick up an object, drop an object and a unique gesture will be associated with each of these intermediate points.” [0026]).

Boca does not explicitly teach
based on the initial position and orientation of the new workpiece
However, Yusuke teaches
“The control unit 16 identifies a workpiece in the direction indicated by the finger by collating a captured image obtained by capturing the workpiece with the image DB, and determines the robot coordinates of the workpiece and the workpiece from the captured image and the posture of the arm unit 12 at that time. … If the position and orientation of the workpiece can be recognized, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 to cause the finger unit 14 to grip the workpiece (step S42). At this time, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 so as to grip the workpiece at the stored workpiece gripping position. When the finger unit 14 grips the workpiece, the control unit 16 controls the arm unit 12 and the hand unit 13 so that the workpiece moves to the movement destination taught in advance and assumes the posture taught in advance.” See at least page 5, lines 6-20; Examiner Interpretation: The grasp and place instructions are taught in advance on a workpiece. The new workpiece is the workpiece that the robot recognizes and on which it performs the taught actions.
	 It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Yusuke to improve flexibility of the robot to handle workpieces in an unexpected pose (“Furthermore, in the present embodiment, when teaching the gripping work, the gripping position in the work system coordinates is stored, so when actually instructing the work, the position and posture of the work to be worked are: If the position and posture of the workpiece can be recognized even if the position and posture are different from those taught, the same workpiece position as that taught can be gripped by the finger members 14A and 14B.” See at least page 5, lines 34-36).

Regarding Claim 13,
Modified Boca teaches
	The method according to Claim 12
Boca further teaches
wherein generating robot motion commands further includes generating commands causing the robot gripper to move the … workpiece from the grasping position to other positions
contained in the demonstration data (“there will be other points along the path between the start and stop points at which the robot will perform work such as follow a path, pick up an object, drop an object and a unique gesture will be associated with each of these intermediate points.” [0026]).

Boca does not explicitly teach
the new workpiece
However, Yusuke teaches
“The control unit 16 identifies a workpiece in the direction indicated by the finger by collating a captured image obtained by capturing the workpiece with the image DB, and determines the robot coordinates of the workpiece and the workpiece from the captured image and the posture of the arm unit 12 at that time. … If the position and orientation of the workpiece can be recognized, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 to cause the finger unit 14 to grip the workpiece (step S42). At this time, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 so as to grip the workpiece at the stored workpiece gripping position. When the finger unit 14 grips the workpiece, the control unit 16 controls the arm unit 12 and the hand unit 13 so that the workpiece moves to the movement destination taught in advance and assumes the posture taught in advance.” See at least page 5, lines 6-20; Examiner Interpretation: The grasp and place instructions are taught in advance on a workpiece. The new workpiece is the workpiece that the robot recognizes and on which it performs the taught actions.
	 It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Yusuke to improve flexibility of the robot to handle workpieces in an unexpected pose (“Furthermore, in the present embodiment, when teaching the gripping work, the gripping position in the work system coordinates is stored, so when actually instructing the work, the position and posture of the work to be worked are: If the position and posture of the workpiece can be recognized even if the position and posture are different from those taught, the same workpiece position as that taught can be gripped by the finger members 14A and 14B.” See at least page 5, lines 34-36).

Regarding Claim 17,
Boca teaches
A system for programming a robot to perform an operation by human demonstration, said system comprising (“There is described below the use of hand gestures to teach a path to be followed by the industrial robot 12 in performing work on workpiece 14.” [0021]):
a … camera (“The camera, which is a 3D vision sensor 11, is attached to the robot” [0021]);
an industrial robot (“FIG. 1 shows a block diagram for a robot system with an industrial robot which is used to perform work on a workpiece.” [0008]);
and a robot controller having a processor and memory, said controller being in communication with the robot and receiving images from the camera, said controller being configured to perform steps including (Fig. 1 shows that the robot controller 15 communicates with the robot 12 and the vision sensor 11 communicates with the computation device 13; “The image is used by computation device 13 to calculate the corresponding location and orientation (robot target) on the part/scene of interest. The robot target is sent to the robot controller 15 or the computation device 13.” [0022]; Examiner Interpretation: The robot controller and computation device of Boca are together interpreted as a robot controller.):
analyzing camera images of the hand demonstrating the operation on the workpiece (“the image of the location pointing hand gesture of step 304 and the associated location on the object are captured by the camera 11 and sent to the computation device 13 for processing. At step 308, the computation device 13 calculates from the image the corresponding location and orientation of the robot tool in the robot scene.” See at least [0034]; Examiner Interpretation: The location of the robot tool corresponding to the taught locations is the demonstration data. Despite the human hand not physically touching the workpiece in the demonstration, the operation to be performed on the workpiece is demonstrated by the human hand),
a move step where hand pose and workpiece pose are determined at a plurality of points defining a move path (“At step 310 the calculated location and orientation of the robot tool are sent to the computation device. Query 312 asks if more location points are needed to complete the robot path. Query 312 can be another gesture. If the answer is yes, the method 300 asks at query 314 if there is a need to reposition the camera. If the answer to query 314 is no, then the method 300 returns to step 304 where the operator makes the hand gesture associated with the next location point. While not shown in FIG. 3, if the answer to query 314 is yes, then the method 300 returns to step 302 where the camera is repositioned. If the answer to query 312 is no, then method 300 ends since no more robot path points have to be acquired.” [0039]; Examiner Interpretation: Hand pose also corresponds to workpiece pose (see [0024]).)
generating robot motion commands, based on the demonstration data … to cause the robot to perform the operation on the new workpiece (“creating robot instructions from the gestures by using the gesture context to the scene data from the same image or as additional data or extra processing to calculate/generate robot instructions (step 704 and optional step 706), storing the created instructions (step 708), asking if more created instructions are needed (step 710) and in step 712 sending the created instructions to the robot if no more created instructions are needed and performing in FIG. 7b all of the steps shown in FIG. 7a except the step 712 of sending the created instructions to the robot. The optional step 706 in these flowcharts of providing the scene 3D model to convert the gesture to a robot instruction step 704 is only needed if the scene will be subtracted from the image of the gesture.” [0052]; “In general a robot move instruction has information about the robot tool and coordinate system used for the robot target” [0057]);
and performing the operation on the new workpiece by the robot (“that is the scene data to create the path and instructions to be followed by the robot 12 when the robot performs work on the workpiece 14.” [0023]; “By work is meant those actions performed by a robot such as painting, grinding, polishing, deburring, welding etc. that make a physical change to the workpiece and those interactions that a robot has with a workpiece such as picking up the workpiece from one location and moving it to another location or inserting the workpiece into a specific location that does not physically change the workpiece.” [0003]).

Boca does not explicitly teach
a two-dimensional (2D) camera
to create demonstration data, where the demonstration data defines a pick, move and place operation including a grasping step where hand pose and workpiece pose are determined when the hand grasps the workpiece, … and a place step where the workpiece pose is determined when the workpiece becomes stationary after the move step, 
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame, 
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece; … and the initial position and orientation of the new workpiece,

However, Yusuke teaches
a two-dimensional (2D) camera (“The camera 15 is a monocular camera.” See at least page 2, line 18)
create demonstration data where the demonstration data defines a pick, move and place operation including a grasping step where hand pose and workpiece pose are determined when the hand grasps the workpiece, a move step …, and a place step where the workpiece pose is determined when the workpiece becomes stationary after the move step … and a gripper coordinate frame corresponding to the hand (“The control unit 16 assumes that the user's two fingers (hereinafter, the user's (human) finger is referred to as a “finger” and distinguished from the finger unit 14 or the finger members 14A and 14B of the robot 10) grips the workpiece. When recognized, the position of the finger in the work coordinate system is recognized (step S33). That is, the control unit 16 recognizes that the finger has gripped the workpiece from the finger position and the workpiece position, and converts the finger position at that time to the workpiece coordinate system. The control unit 16 stores the position in the workpiece coordinate system at this time in the storage unit 17 as gripping position information. Next, the user holding the workpiece with two fingers moves the workpiece to a desired location and releases the workpiece (releases the finger from the workpiece). The control unit 16 recognizes that the finger has moved away from the workpiece based on the image from the camera 15, recognizes the position and posture of the workpiece in the robot coordinate system at that time as the position and posture of the movement destination of the workpiece, is stored (step S34). With the above operation, the control unit 16 stores information on the position of the workpiece in the workpiece coordinate system and the position and posture of the workpiece in the robot coordinate system. As a result, preparation for instructing the work to be performed on the work and having the robot 10 perform the work on the work is completed. 
Note that the position and orientation of the movement destination of the workpiece may be recognized based on the position and orientation of the workpiece when it is recognized that the movement of the workpiece has stopped for a predetermined time or more based on the captured image. In the above embodiment, the gripping position of the workpiece is determined based on the gripping position when gripping the workpiece.” See at least page 3, line 48 to page 4, line 14.),
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece; … and the initial position and orientation of the new workpiece (“The control unit 16 identifies a workpiece in the direction indicated by the finger by collating a captured image obtained by capturing the workpiece with the image DB, and determines the robot coordinates of the workpiece and the workpiece from the captured image and the posture of the arm unit 12 at that time. … If the position and orientation of the workpiece can be recognized, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 to cause the finger unit 14 to grip the workpiece (step S42). At this time, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 so as to grip the workpiece at the stored workpiece gripping position. When the finger unit 14 grips the workpiece, the control unit 16 controls the arm unit 12 and the hand unit 13 so that the workpiece moves to the movement destination taught in advance and assumes the posture taught in advance.” See at least page 5, lines 6-20; Examiner Interpretation: The grasp and place instructions are taught in advance on a workpiece. The new workpiece is the workpiece that the robot recognizes and on which it performs the taught actions.),
	 It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca to further include the teachings of Yusuke to quickly and easily teach robots workpiece operations (See at least “problem to be solved” on page 1.).
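For illustration only, the Yusuke passages quoted above (storing the gripping position in the workpiece coordinate system during teaching, then gripping the same position on a workpiece recognized at a different pose) can be sketched in Python as follows. This is a minimal planar sketch under stated assumptions, not Yusuke's implementation: poses are simplified to (x, y, theta) in the robot coordinate system, and the function names to_workpiece_frame and to_robot_frame are hypothetical.

```python
import math

def to_workpiece_frame(point, pose):
    """Express a robot-frame point in the frame of a workpiece at pose (x, y, theta)."""
    px, py = point
    wx, wy, theta = pose
    dx, dy = px - wx, py - wy
    c, s = math.cos(theta), math.sin(theta)
    # Apply the inverse rotation to the offset from the workpiece origin.
    return (c * dx + s * dy, -s * dx + c * dy)

def to_robot_frame(point, pose):
    """Express a workpiece-frame point in the robot frame for a workpiece at pose."""
    lx, ly = point
    wx, wy, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return (wx + c * lx - s * ly, wy + s * lx + c * ly)

# Teaching: finger position observed when the hand grips the workpiece.
taught_workpiece_pose = (100.0, 50.0, 0.0)
finger_at_grasp = (110.0, 50.0)
grip_in_workpiece = to_workpiece_frame(finger_at_grasp, taught_workpiece_pose)

# Playback: the same workpiece type is recognized at a different pose.
new_workpiece_pose = (200.0, 80.0, math.pi / 2)
grip_target = to_robot_frame(grip_in_workpiece, new_workpiece_pose)
```

Storing the grip point in the workpiece frame is what lets the same taught grasp be reused when the workpiece appears at a different position and orientation.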

Yusuke also does not explicitly teach
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame, 
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;

However, Kofman teaches
where the demonstration data includes a hand coordinate frame and a gripper coordinate frame corresponding to the hand coordinate frame (“The orientation of the hand of the operator is used to control the orientation of the robot-manipulator end-effector and is computed from the 3-D coordinates of the centroids of the three hand markers as shown in Fig. 3. Firstly, the midpoint of the line segment joining the thumb and index-finger marker centroids, T and I, respectively, is defined as M (Fig. 3(a)). A coordinate system XoYoZo with origin at wrist W is then defined by a translation of the local-site global reference coordinate system XYZ to the wrist [Fig. 3(b)]. Through yaw, pitch, and roll rotations, explained below, the final axes X3Y3Z3 to be used to determine the tool axes of the robot-end-effector are obtained with X3 collinear with WM, WTI coplanar with X3Y3, and T lying in the first quadrant of X3Y3, as shown in Fig. 3(b). The yaw-pitch-roll tool rotation angles are determined directly from the hand rotation angles of WM and TI as follows: yaw rotation α of coordinate system XoYoZo about Zo to X1Y1Z1, pitch rotation β of X1Y1Z1 about Y1 to X2Y2Z2, shown in Fig. 3(c) using −β for clarity, and roll rotation γ of X2Y2Z2 about X2 to X3Y3Z3, as shown in Fig. 3(d).” See at least Pg. 4, Col. 2, lines 1-20; Fig. 3 shows the hand coordinate frame which corresponds to the robot coordinate frame.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca and Yusuke to further include the teachings of Kofman regarding corresponding coordinate frames, to remotely control a robot based on position and orientation of a human operator’s hand in a demonstration of the operation without the restraints of sensors and wires on the human hand. See at least the introduction on Pgs. 1-2.
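For illustration only, the hand coordinate frame described in the quoted Kofman passage (origin at wrist W, X3 collinear with WM, and W, T, I coplanar with X3Y3 with T in the first quadrant) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Kofman's implementation: it builds the final axes directly by orthogonalization instead of computing the yaw, pitch, and roll angles, it assumes the thumb marker lies in front of the wrist along X3, and the name hand_frame and the helper functions are hypothetical.

```python
import math

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(p / n for p in a)

def hand_frame(wrist, thumb, index):
    """Return (origin, (x_axis, y_axis, z_axis)) from three 3-D marker centroids."""
    mid = tuple((t + i) / 2 for t, i in zip(thumb, index))  # midpoint M of T and I
    x_axis = unit(sub(mid, wrist))                          # X3 collinear with WM
    wt = sub(thumb, wrist)
    # Remove the X3 component of WT so that T has a positive Y3 coordinate.
    y_axis = unit(sub(wt, tuple(dot(wt, x_axis) * c for c in x_axis)))
    z_axis = cross(x_axis, y_axis)                          # right-handed frame
    return wrist, (x_axis, y_axis, z_axis)

# Hypothetical marker centroids in millimeters.
origin, axes = hand_frame((0.0, 0.0, 0.0), (10.0, 2.0, 0.0), (10.0, -2.0, 0.0))
```

A gripper coordinate frame corresponding to the hand frame could then be obtained by assigning these axes to the tool axes of the end-effector, which is the correspondence the passage describes.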

Kofman also does not explicitly teach
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;

However, JETTÉ teaches
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper (“In this specific embodiment, the vacuum cup is made of a flexible, resilient material, and the relative distance between the robot and the workpiece held by the robot can vary based on this flexibility and operating conditions. Such variations in the relative distance between a given robot and the workpiece it holds was a source of positioning uncertainty in the reference frame of the robots. This gripper type was found to provide satisfactory gripping capability in the embodiment shown in FIG. 1, but it will be understood that other gripper types can be used in other embodiments. Moreover, more than one gripper, possibly of different gripper types, can be used as the end effector per robot if desired. For instance, a clamp gripper can be used in addition to a vacuum cup for a given robot, or for all robots, for instance. The gripper type or types can vary from one robot to another within a given workpiece holding system embodiment. Indeed, the exact type of gripper can be selected from the following general categories: impactive—e.g. jaws, clamps or claws which physically grasp by direct impact upon the object; ingressive—pins, needles or hackles which physically penetrate the surface of the object (e.g. an aperture or bore of the workpiece); astrictive—forces applied to the objects surface (e.g. by vacuum, magneto- or electroadhesion); and contigutive—requiring direct contact for adhesion to take place (e.g. surface tension or freezing).” [0043-0044])
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, and Kofman to further include the teachings of JETTÉ so that gripper type can vary as needed for different application requirements (see at least [0043-0044]).

JETTÉ also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;

However, Luo further teaches
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, … calculating the true lengths of the segments of the digits of the human hand (“FIG. 14 depicts how the simplified exemplary user's hand or hands may be photographed by the device's camera or other camera, and this image information may be used to refine the default parameters of the biomechanical and/or anatomical model of the user's hand, in accordance with one embodiment of the present invention. In acquiring such images, often it is useful to have the system provide a standardized background, such as a series of distance markings, grid, graph paper, and the like (1400) in order to better calibrate the image of the hand and correct for image distortions. This standardized background may additionally include various color, shades of gray, and resolution test targets as well. The background may be conveniently provided by, for example, electronically providing one or more background image sheets (e.g. a jpeg, png, pdf or other image file) for printing on the user's printer.” [0204]; “the user may put each hand on background (1400), and take a photo of the hand(s) (1402) with either the computerized device's camera or other camera. This image may then be analyzed, preferably by an image analysis program. The background image will help correct for any image distortions caused by different camera angles, and the like. The user hand image analysis may be done onboard the user's handheld computerized device, but it need not be. In an alternative embodiment, the user may upload one or more images of the hand taken by any imaging device to an external image analyzer, such as a remote internet server. 
In either event, the image analyzer will analyze the user's skin or hand outline appearance (1404), deduce the most probable lengths one or more bones of the user's hand, such as the user's various finger and thumb bones, and send this data or other data to correct the default biomechanical and/or anatomical model of the user's hand(s) back to the user's computerized device, such as for example during calibration step 906 referenced in FIG. 9 above.” [0205]; Examiner Interpretation: The standardized background, such as a series of distance markings, a grid, or graph paper, is equivalent to a fiducial marker grid.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, and JETTÉ to further include the teachings of Luo to improve the accuracy of hand gesture identification and tracking by using a personalized anatomical hand model in place of a default hand model (“In some embodiments, to improve accuracy (that is to replace standard human hand biomechanical and/or anatomical model default parameters with actual user calibration parameters), it will be useful to acquire an image of the user's hands, and to employ various image processing and analysis techniques to analyze this image of the user's one or more hands to better estimate the relative length of the various bones of the user's hands. Indeed, in the event that the user has lost one or more fingers, the system may then use this information to make corresponding changes in its biomechanical and/or anatomical model of the human hand. In other words, the model may include calibration information associated with an image of at least a portion of the hand of the user.” [0203]. Also see at least [0010-0018]).

Luo also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, 

However, Itkowitz teaches
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration. Some fiducial markers attached to known locations on the surgeon's console can be used to determine the extrinsic parameters (rotation and translation) of multiple cameras with respect to the surgeon's console. This process can be done automatically. Active fiducial markers can be used for the calibration fiducial markers since such markers are only turned on during a calibration process and before the procedure. During the procedure, the calibration fiducial markers are turned off to avoid confusion with the fiducial markers used to localize the surgeon's hands.” Col. 16, lines 4-15; Examiner Interpretation: The determined rotation and translation of a camera with respect to the surgeon's console, which has fiducial markers at known locations, is the transformation from a marker coordinate system to a screen coordinate system.)
using the transformations to compute coordinates of the key points in the marker coordinate system (“FIG. 7 is an illustration of sensor 212 mounted on forefinger 292B with a location 713 in tracking coordinate system 750, and a sensor 211 mounted on thumb 292A with a location 711 in tracking coordinate system 750.” Col. 17, lines 28-31; Examiner Interpretation: The tracking coordinate system is the same as the marker coordinate system.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, JETTÉ, and Luo to further include the calibration teachings of Itkowitz to accurately recreate the three-dimensional hand poses to allow for remote operation of a robot (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration.” Col. 16, lines 4-5; “the use of hand gesture poses and hand gesture trajectories to control system 100 makes it is unnecessary for the surgeon to take the surgeon's eyes off the patient and/or viewing screen” See at least Col. 11, lines 26-47.).
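
For illustration only, the marker-to-screen transformation and its use to express key points in the marker coordinate system can be sketched as follows. This is a minimal sketch and not the method of Luo or Itkowitz: it assumes a fronto-parallel sizing image, so that a 2D similarity transform (scale, rotation, translation) suffices, whereas an arbitrary camera angle would call for a full perspective transformation. The function names and sample coordinates are hypothetical.

```python
import math

def fit_marker_to_screen(marker_pts, screen_pts):
    """Fit z_screen = a * z_marker + b (2D similarity) from two markers.

    Points are treated as complex numbers x + iy; this is an assumed
    simplification, valid only for a fronto-parallel sizing image.
    """
    m1, m2 = (complex(*p) for p in marker_pts)
    s1, s2 = (complex(*p) for p in screen_pts)
    a = (s2 - s1) / (m2 - m1)   # combined scale and rotation
    b = s1 - a * m1             # translation
    return a, b

def screen_to_marker(a, b, screen_pt):
    """Invert the transformation: pixel coordinates -> marker coordinates."""
    z = (complex(*screen_pt) - b) / a
    return (z.real, z.imag)

def segment_length(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical data: two grid markers 100 mm apart, seen 200 px apart.
a, b = fit_marker_to_screen([(0, 0), (100, 0)], [(50, 40), (250, 40)])
knuckle = screen_to_marker(a, b, (150, 140))   # key point, in mm
tip = screen_to_marker(a, b, (150, 200))       # key point, in mm
length_mm = segment_length(knuckle, tip)
```

With these sample values the two key points are 60 px apart on screen and the fitted scale is 2 px per mm, so the segment length in the marker coordinate system is 30 mm.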

Itkowitz also does not explicitly teach
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, 

However, Iqbal teaches
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images … processing the (“the 2.5D keypoint locations are generated using a neural network model, such as the neural network model 210 or 212 shown in FIG. 2A or 2C, respectively. The 2.5D keypoint locations ({circumflex over (P)}.sub.k.sup.2.5D) include three components for each keypoint, namely, pixel coordinates (x.sub.k, y.sub.k) and scale normalized depth values ({circumflex over (Z)}.sub.k.sup.r), where k is the keypoint and r is a root or reference keypoint. In an embodiment, the depth values are determined relative to a reference keypoint. For example, in an embodiment where the 3D pose reconstruction unit 110 is configured for estimating the pose of a hand, the reference keypoint is located on the palm of the hand. The reference keypoint can be chosen to be any keypoint as a function of the object's skeleton model.” See at least col. 3, lines 28-41; Examiner Interpretation: The reference keypoint is equivalent to the hand coordinate frame.), 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments (See fig. 1C (provided below)

[Figure: Iqbal, fig. 1C]
“Given an RGB image I of a hand, the goal is to estimate the 2D and 3D positions of K keypoints of the hand. The 2D hand pose is defined as p={p.sub.k}.sub.k∈K and the 3D pose is defined as P={P.sub.k}.sub.k∈K where p.sub.k=(x.sub.k, y.sub.k)∈ℝ.sup.2 represents the 2D pixel coordinates of the keypoint k in image I and P.sub.k=(X.sub.k, Y.sub.k, Z.sub.k)∈ℝ.sup.3 denotes the location of the keypoint in the 3D camera coordinate frame measured in millimeters. The Z-axis corresponds to the optical axis. Given the intrinsic camera parameters K, the relationship between the 3D location P.sub.k and corresponding 2D projection p.sub.k can be written as follows under a perspective projection: 
[Equation (image not reproduced): perspective projection relating P.sub.k and p.sub.k through the intrinsic camera parameters K]
where k∈1, . . . K, Z.sub.root is the depth of the root keypoint, and Z.sub.k.sup.r=Z.sub.k−Z.sub.root corresponds to the depth of the k.sup.th keypoint relative to the root keypoint. In an embodiment, the palm of the hand is used as the root keypoint.” See at least col. 4, lines 24-47; “The 3D pose reconstruction unit 110 calculates the normalized 3D pose based on the 2.5D keypoint locations and intrinsic parameters of the monocular camera corresponding to the captured image of the object. In an embodiment, the 3D pose reconstruction unit 110 provides an exact solution to obtain the absolute 3D pose up to a scale factor. The full absolute 3D pose can be recovered by knowing the measured length of the component of the object's structural skeletal model or using optimization under assumption of prior component length distribution.” See at least Col. 3, lines 50-59; Examiner Interpretation: The perspective projection and scaling step are point-n-perspective calculations.),
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Yusuke, Kofman, JETTÉ, Luo, and Itkowitz to further include the teachings of Iqbal to estimate a 3D pose of a human hand with the use of a single 2D camera for the use of human-computer interaction while reducing the impact of the hand’s appearance variation, complex poses, and self-occlusions (“Estimating a 3D pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is useful for human-computer interaction. Hand pose can be represented by a fixed set of points in 3D space, usually joints, called landmarks or keypoints. Estimating the 3D pose accurately is a difficult task due to the large amounts of appearance variation, self-occlusions, and complexity of articulated hand poses. 3D hand pose estimation escalates the difficulties even further because a depth of each of the hand keypoints also has to be estimated. Conventional techniques for determining locations of the landmarks of a hand in 3D space include one or more of multi-view camera systems, depth sensors, and color markers/gloves. Each of the conventional techniques requires a constrained environment and/or specialized equipment. Furthermore, environmental conditions such as sunlight, occlusions, and complexity of non-rigid hand poses present challenges to landmark detection and determination. There is a need for addressing these issues and/or other issues associated with the prior art.” See at least Col. 1, lines 21-40.)
Even though Iqbal doesn’t teach processing a sizing image with a neural network, it would have been obvious to apply a neural network to identify key points on the human hand in a sizing image because a “scale and translation invariant 2.5D pose representation can be easily obtained using the neural network models” (See at least Col. 13, lines 46-48) and neural networks can accurately and quickly solve complex problems in the field of image processing (See the machine learning section in cols. 24-25).
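
The role of the known segment length in the passage quoted from Iqbal (Col. 3, lines 50-59) can be illustrated with a minimal sketch. It assumes an ideal pinhole camera and a 3D pose that is already known up to a scale factor; the function names, intrinsic parameters, and sample values are hypothetical and are not taken from Iqbal. The sketch shows that uniformly scaling a pose leaves its 2D projection unchanged, which is why a measured segment length is needed to fix the absolute pose.

```python
import math

def project(point, fx, fy, cx, cy):
    """Pinhole perspective projection of a 3D camera-frame point to pixels."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)

def rescale_pose(pose, segment, true_length_mm):
    """Scale a pose so that the chosen segment has its measured length."""
    i, j = segment
    scale = true_length_mm / math.dist(pose[i], pose[j])
    return [tuple(scale * c for c in p) for p in pose]

# Hypothetical pose, known only up to scale (arbitrary units).
pose = [(0.0, 0.0, 10.0), (0.0, 1.0, 10.0), (1.0, 1.0, 11.0)]
absolute = rescale_pose(pose, segment=(0, 1), true_length_mm=40.0)

# Hypothetical intrinsic parameters (focal lengths and principal point, in px).
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
before = [project(p, **intrinsics) for p in pose]
after = [project(p, **intrinsics) for p in absolute]
```

Here `before` and `after` contain the same pixel coordinates, while only `absolute` has the segment between keypoints 0 and 1 at its measured 40 mm.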

Regarding Claims 7 and 18,
Modified Boca teaches
The method according to Claim 1
The system according to Claim 17
Boca does not explicitly teach
wherein analyzing camera images of the hand demonstrating the operation includes identifying locations of a plurality of points on the hand, including a tip, a base knuckle and a second knuckle of each of a thumb and a forefinger.
However, Iqbal teaches
wherein analyzing camera images of the hand demonstrating the operation includes identifying locations of a plurality of points on the hand, including a tip, a base knuckle and a second knuckle of each of a thumb and a forefinger (“the 2.5D keypoint locations are generated using a neural network model, such as the neural network model 210 or 212 shown in FIG. 2A or 2C, respectively. The 2.5D keypoint locations ({circumflex over (P)}.sub.k.sup.2.5D) include three components for each keypoint, namely, pixel coordinates (x.sub.k, y.sub.k) and scale normalized depth values ({circumflex over (Z)}.sub.k.sup.r), where k is the keypoint and r is a root or reference keypoint.” See col. 3, lines 28-34; See the keypoints in fig. 1B (provided below)

[Figure: Iqbal, fig. 1B]
)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Iqbal to estimate a 3D pose of a human hand with the use of a single 2D camera for the use of human-computer interaction while reducing the impact of the hand’s appearance variation, complex poses, and self-occlusions (“Estimating a 3D pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is useful for human-computer interaction. Hand pose can be represented by a fixed set of points in 3D space, usually joints, called landmarks or keypoints. Estimating the 3D pose accurately is a difficult task due to the large amounts of appearance variation, self-occlusions, and complexity of articulated hand poses. 3D hand pose estimation escalates the difficulties even further because a depth of each of the hand keypoints also has to be estimated. Conventional techniques for determining locations of the landmarks of a hand in 3D space include one or more of multi-view camera systems, depth sensors, and color markers/gloves. Each of the conventional techniques requires a constrained environment and/or specialized equipment. Furthermore, environmental conditions such as sunlight, occlusions, and complexity of non-rigid hand poses present challenges to landmark detection and determination. There is a need for addressing these issues and/or other issues associated with the prior art.” See at least Col. 1, lines 21-40.).

Regarding Claims 14 and 28,
Modified Boca teaches
	The method according to Claim 12
	The system according to Claim 17
Boca does not explicitly teach
	wherein the robot gripper is a finger-type gripper or a surface gripper using suction or magnetic force.
However, JETTÉ teaches
(“The gripper type or types can vary from one robot to another within a given workpiece holding system embodiment. Indeed, the exact type of gripper can be selected from the following general categories: impactive—e.g. jaws, clamps or claws which physically grasp by direct impact upon the object; ingressive—pins, needles or hackles which physically penetrate the surface of the object (e.g. an aperture or bore of the workpiece); astrictive—forces applied to the objects surface (e.g. by vacuum, magneto- or electroadhesion); and contigutive—requiring direct contact for adhesion to take place (e.g. surface tension or freezing).” [0043-0044])
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of JETTÉ so that gripper type can vary as needed for different application requirements (see at least [0043-0044]).

Regarding Claim 27,
Modified Boca teaches
	The system according to Claim 17
Boca further teaches
	wherein generating robot motion commands includes generating commands to cause a robot gripper to move to a grasping position and orientation based on the initial position and orientation of the … workpiece and position and orientation of the gripper relative to the workpiece contained in the demonstration data (“the hand and finger location and orientation can be used to calculate the corresponding location and orientation of the robot tool in the robot scene.” [0038]; “location, orientation and associated action can be sent to the robot individually or all at once at the end of the teaching process; … with the image of the scene the part can be recognized and then the processing of the gesture has to be in relationship to the part; the robot targets can be defined relative to a part coordinate system” [0061-0064]; “there will be other points along the path between the start and stop points at which the robot will perform work such as follow a path, pick up an object, drop an object and a unique gesture will be associated with each of these intermediate points.” [0026]),
and generating commands to cause the robot gripper to move the … workpiece from the grasping position to other positions contained in the demonstration data (“there will be other points along the path between the start and stop points at which the robot will perform work such as follow a path, pick up an object, drop an object and a unique gesture will be associated with each of these intermediate points.” [0026]).

Boca does not explicitly teach
the new workpiece
However, Yusuke teaches
“The control unit 16 identifies a workpiece in the direction indicated by the finger by collating a captured image obtained by capturing the workpiece with the image DB, and determines the robot coordinates of the workpiece and the workpiece from the captured image and the posture of the arm unit 12 at that time. … If the position and orientation of the workpiece can be recognized, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 to cause the finger unit 14 to grip the workpiece (step S42). At this time, the control unit 16 controls the arm unit 12, the hand unit 13, and the finger unit 14 so as to grip the workpiece at the stored workpiece gripping position. When the finger unit 14 grips the workpiece, the control unit 16 controls the arm unit 12 and the hand unit 13 so that the workpiece moves to the movement destination taught in advance and assumes the posture taught in advance.” See at least page 5, lines 6-20; Examiner Interpretation: The grasp and place instructions are taught in advance on a workpiece. The new workpiece is the workpiece that the robot recognizes and on which it performs the taught actions.
	 It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Yusuke to improve flexibility of the robot to handle workpieces in an unexpected pose (“Furthermore, in the present embodiment, when teaching the gripping work, the gripping position in the work system coordinates is stored, so when actually instructing the work, the position and posture of the work to be worked are: If the position and posture of the workpiece can be recognized even if the position and posture are different from those taught, the same workpiece position as that taught can be gripped by the finger members 14A and 14B.” See at least page 5, lines 34-36).

Claims 11 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Boca (US 20150314442 A1) in view of Yusuke (translated JP2015044257A), Kofman (IDS: “Teleoperation of a robot manipulator using a vision-based human-robot interface”), JETTÉ (US 20190210217 A1), Luo (US 20150100910 A1), Itkowitz (US 9901402 B2), Iqbal (US 10929654 B2), and Sager (US 5040056 A).

Regarding Claim 11,
Modified Boca teaches
The method according to Claim 1
Boca does not explicitly teach
wherein the new workpiece, before the operation by the robot, rides on a conveyor, and the initial position of the new workpiece is a function of a conveyor position index.
However, Sager teaches
“This invention provides a method and apparatus which uses a vision-equipped robotic system to locate, identify and determine the orientation of objects, and to pick them up and transfer them to a moving or stationary destination. A video camera periodically records images of objects located on a moving conveyor belt. The images are identified and their position and orientation is recorded in a moving conveyor belt coordinate system. The information is transmitted to a motion control device associated with a first robot. The motion control device coordinates the robot with the moving belt coordinate system and instructs the robot's arm to pick up certain objects” See at least Col. 1, lines 50-62; Examiner Interpretation: The use of the moving conveyor belt coordinate system to identify location and orientation is interpreted to be the same as using a function of a conveyor position index.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Sager to increase the flexibility of the robot to pick up randomly positioned objects (“known methods and apparatus are generally not effective with randomly positioned and randomly oriented objects. This is typically the case with objects that are deposited onto a conveyor belt, such as flat components that are asymmetrical about at least one axis. For these parts, the system must locate them on the moving conveyor belt and also determine their orientation. This requires a relatively sophisticated vision system.” See at least Col. 1, lines 17-24).

Regarding Claim 26,
Modified Boca teaches
The system according to Claim 17
Boca does not explicitly teach
wherein the new workpiece, before the operation by the robot, rides on a conveyor, and the initial position of the new workpiece is a function of a conveyor position index.
However, Sager teaches
“This invention provides a method and apparatus which uses a vision-equipped robotic system to locate, identify and determine the orientation of objects, and to pick them up and transfer them to a moving or stationary destination. A video camera periodically records images of objects located on a moving conveyor belt. The images are identified and their position and orientation is recorded in a moving conveyor belt coordinate system. The information is transmitted to a motion control device associated with a first robot. The motion control device coordinates the robot with the moving belt coordinate system and instructs the robot's arm to pick up certain objects” See at least Col. 1, lines 50-62; Examiner Interpretation: The use of the moving conveyor belt coordinate system to identify location and orientation is interpreted to be the same as using a function of a conveyor position index.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Sager to increase the flexibility of the robot to pick up randomly positioned objects (“known methods and apparatus are generally not effective with randomly positioned and randomly oriented objects. This is typically the case with objects that are deposited onto a conveyor belt, such as flat components that are asymmetrical about at least one axis. For these parts, the system must locate them on the moving conveyor belt and also determine their orientation. This requires a relatively sophisticated vision system.” See at least Col. 1, lines 17-24).
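
As a hedged illustration of what “a function of a conveyor position index” could look like, the sketch below advances a workpiece position recorded at detection time by the belt travel since then. It assumes a straight belt and an encoder-style index with a fixed travel per count; the function name, parameters, and numbers are hypothetical and do not come from Sager or from the claims.

```python
def workpiece_position(detected_xy, index_at_detection, index_now,
                       mm_per_count, belt_direction=(1.0, 0.0)):
    """Return the workpiece position after the belt has advanced.

    belt_direction is an assumed unit vector along the direction of travel.
    """
    travel = (index_now - index_at_detection) * mm_per_count
    return (detected_xy[0] + travel * belt_direction[0],
            detected_xy[1] + travel * belt_direction[1])

# Hypothetical values: detected at (100, 50) mm when the index read 1000.
position = workpiece_position((100.0, 50.0), 1000, 1500, mm_per_count=0.25)
```

In this example the belt advances 500 counts at 0.25 mm per count, so the workpiece has moved 125 mm along the belt from where it was detected.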

Claims 8-9 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Boca (US 20150314442 A1) in view of Yusuke (translated JP2015044257A), Kofman (IDS: “Teleoperation of a robot manipulator using a vision-based human-robot interface”), JETTÉ (US 20190210217 A1), Luo (US 20150100910 A1), Itkowitz (US 9901402 B2), Iqbal (US 10929654 B2), and Yunde (NPL: “Hand Action Perception for Robot Programming”).

Regarding Claims 8 and 23,
Modified Boca teaches
The method according to Claim 7
The system according to Claim 18
Boca does not explicitly teach
wherein the demonstration data includes, at the grasping step of the operation, position and orientation of the hand coordinate frame, the gripper coordinate frame corresponding to the hand coordinate frame,
and a workpiece coordinate frame.
However, Kofman teaches
wherein the demonstration data includes, at the grasping step of the operation, position and orientation of the hand coordinate frame, the gripper coordinate frame corresponding to the hand coordinate frame (“The orientation of the hand of the operator is used to control the orientation of the robot-manipulator end-effector and is computed from the 3-D coordinates of the centroids of the three hand markers as shown in Fig. 3. Firstly, the midpoint of the line segment joining the thumb and index-finger marker centroids, T and I, respectively, is defined as M (Fig. 3(a)). A coordinate system XoYoZo with origin at wrist W is then defined by a translation of the local-site global reference coordinate system XYZ to the wrist [Fig. 3(b)]. Through yaw, pitch, and roll rotations, explained below, the final axes X3Y3Z3 to be used to determine the tool axes of the robot-end-effector are obtained with X3 collinear with WM, WTI coplanar with X3Y3, and T lying in the first quadrant of X3Y3, as shown in Fig. 3(b). The yaw-pitch-roll tool rotation angles are determined directly from the hand rotation angles of WM and TI as follows: yaw rotation α of coordinate system XoYoZo about Zo to X1Y1Z1, pitch rotation β of X1Y1Z1 about Y1 to X2Y2Z2, shown in Fig. 3(c) using -β for clarity, and roll rotation γ of X2Y2Z2 about X2 to X3Y3Z3, as shown in Fig. 3(d).” See at least Pg. 4, Col. 2, lines 1-20; Fig. 3 shows the hand coordinate frame which corresponds to the robot coordinate frame; Fig. 1 shows a human demonstrating a grasp corresponding to the robot which is actually grasping an object.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Kofman regarding corresponding coordinate frames, to remotely control a robot based on position and orientation of a human operator’s hand in a demonstration of the operation without the restraints of sensors and wires on the human hand. See at least the introduction on Pgs. 1-2.

Kofman also does not explicitly teach
and a workpiece coordinate frame.
However, Yunde teaches
A workpiece coordinate frame (The hand and the objects in the workspace are tracked, and each is given a coordinate frame; transformations between these frames and the world frame are performed to track their positions and orientations. See at least Pg. 3, Col. 2 and Fig. 4.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca and Kofman to further include the teachings of Yunde to teach robot operations by a human demonstration manipulating an object so that the same object manipulation can be performed by the robot (“we describe a more general framework of hand action perception using depth image sequences. We aim to build a robot system which should learn not only the initial poses and final destinations of the objects and the order of assembly, but also should perceive how objects move and how they are manipulated. Thus, tracking and understanding both object motion and human hand action will be crucial for a robot to learn a general assembly task. Our system has two multibaseline stereo vision systems with the same configuration located respectively in a human world and a robot world. The human instructor must simply demonstrate the task in front of the vision system in the human world, no dataglove or special markings are necessary. The recorded image sequences are used to recover a depth image sequence for model-based human hand and object tracking to form perceptual data streams. The data streams are segmented and precisely interpreted to create a task sequence of the description of the human hand action and object motion for generating the robot control sequence or reporting what is going on in the workspace. This paper follows the assembly plan from observation (APO) paradigm proposed by Ikeuchi and Suehiro[7].” See at least introduction on Pg. 1.).

Regarding Claims 9 and 24,
Modified Boca teaches
The method according to Claim 8
The system according to Claim 23
Boca further teaches
for the plurality of points in the move step of the operation (“At step 310 the calculated location and orientation of the robot tool are sent to the computation device. Query 312 asks if more location points are needed to complete the robot path. Query 312 can be another gesture. If the answer is yes, the method 300 asks at query 314 if there is a need to reposition the camera. If the answer to query 314 is no, then the method 300 returns to step 304 where the operator makes the hand gesture associated with the next location point. While not shown in FIG. 3, if the answer to query 314 is yes, then the method 300 returns to step 302 where the camera is repositioned. If the answer to query 312 is no, then method 300 ends since no more robot path points have to be acquired.” [0039]),

Boca does not explicitly teach
wherein the demonstration data further includes positions of the hand coordinate frame and the workpiece coordinate frame
and position and orientation of the workpiece coordinate frame for the place step of the operation.
However, Yunde teaches
wherein the demonstration data further includes positions of the hand coordinate frame and the workpiece coordinate frame (The hand and the objects in the workspace are tracked, and each is given a coordinate frame; transformations between these frames and the world frame are performed to track their positions and orientations. See at least Pg. 3, Col. 2 and Fig. 4.)
for the plurality of points in the move step of the operation (“The robot system should learn not only the initial poses and destinations of the objects and the orders of assembly, but also should perceive how objects move for replicating the task. For this purpose, the system has to perceive the trajectories of all objects with orientations at each point in the sequence for forming perceptual data streams. This is sufficient for simple tasks and structured environment. In general, the system has to track not only objects but also the human hand in able to robustly and completely understand a given task for the purpose of efficiently replicating the task, Otherwise, an APO approach will encounter difficulties, especially with partial occlusion case and motions such as screw-turning a bolt. After tracking, the perceptual data streams of each object and each part of the hand are created for interpretation” See at least Pg. 2, Col. 2.),
and position and orientation of the workpiece coordinate frame for the place step of the operation (“The frame kem at which the hand stops manipulating the object occurs when the hand is departing from the manipulated object.” See at least Pg. 5, Col. 2, lines 12-15.; Examiner Interpretation: The frame kem is the workpiece coordinate frame which has a known position and orientation.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Yunde to teach robot operations by a human demonstration manipulating an object so that the same object manipulation can be performed by the robot (“we describe a more general framework of hand action perception using depth image sequences. We aim to build a robot system which should learn not only the initial poses and final destinations of the objects and the order of assembly, but also should perceive how objects move and how they are manipulated. Thus, tracking and understanding both object motion and human hand action will be crucial for a robot to learn a general assembly task. Our system has two multibaseline stereo vision systems with the same configuration located respectively in a human world and a robot world. The human instructor must simply demonstrate the task in front of the vision system in the human world, no dataglove or special markings are necessary. The recorded image sequences are used to recover a depth image sequence for model-based human hand and object tracking to form perceptual data streams. The data streams are segmented and precisely interpreted to create a task sequence of the description of the human hand action and object motion for generating the robot control sequence or reporting what is going on in the workspace. This paper follows the assembly plan from observation (APO) paradigm proposed by Ikeuchi and Suehiro[7].” See at least introduction on Pg. 1.). 

Claims 10 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Boca (US 20150314442 A1) in view of Yusuke (translated JP2015044257A), Kofman (IDS: “Teleoperation of a robot manipulator using a vision-based human-robot interface”), JETTÉ (US 20190210217 A1), Luo (US 20150100910 A1), Itkowitz (US 9901402 B2), Iqbal (US 10929654 B2), Yunde (NPL: “Hand Action Perception for Robot Programming”), and Pham (NPL: “A proposal of extracting of motion primitives by analyzing tracked data of hand motion from human demonstration”).

Regarding Claims 10 and 25,
Modified Boca teaches
The method according to Claim 8
The system according to Claim 23
Boca does not explicitly teach
wherein the hand coordinate frame has an origin at a point midway between the base knuckles of the thumb and forefinger,
a Z axis passing through a point midway between the tips of the thumb and forefinger,
and a Y axis normal to a plane containing the thumb and forefinger.
However, Pham teaches
wherein the hand coordinate frame has an origin at a point midway between the base knuckles of the thumb and forefinger (Fig. 4 shows the origin of the hand frame being located at (B), between the base of the thumb and forefinger; “We define three important points including: the center point of red color area on the part between the index finger and the thumb, denoted by point B” See Pg. 2, Col. 2, lines 27-30),
and a Y axis normal to a plane containing the thumb and forefinger (Fig. 4 shows an axis perpendicular to the plane containing the thumb and forefinger with the orange axis pointing towards the top left of the page that corresponds to the red axis pointing upwards on the hand in the first image.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Pham to allow for robot programming by demonstration with the use of an inexpensive sensor that can use hand motion data to help the robot execute actions in new situations where the gripper orientation may need to be different than a previous situation to successfully complete the task. See at least the section “Overview of method” on Pg. 2.

Pham also does not explicitly teach
a Z axis passing through a point midway between the tips of the thumb and forefinger,
However, Kofman teaches
a Z axis passing through a point midway between the tips of the thumb and forefinger (Fig. 3 shows the hand coordinate frame with the X axis passing through the midway point (M) between the thumb (T) and index finger (I) fingertips. See at least Pg. 4, Col. 2 and Fig. 2(b) and Fig. 3.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Kofman to remotely control a robot based on position and orientation of a human operator’s hand in a demonstration of the operation without the restraints of sensors and wires on the human hand. See at least the introduction on Pgs. 1-2.
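For illustration only, the hand coordinate frame recited in Claims 10 and 25 can be sketched as follows. This is a minimal Python sketch and is not code from Pham, Kofman, or the application. The function name `hand_frame`, the helper names, and the sign chosen for the Y axis are assumptions, and the plane containing the thumb and forefinger is approximated here by the plane spanned by the Z axis and the line joining the two base knuckles.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def midpoint(a, b):
    return [(a[i] + b[i]) / 2.0 for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("degenerate key points")
    return [c / n for c in v]

def hand_frame(thumb_base, thumb_tip, finger_base, finger_tip):
    """Return (origin, x_axis, y_axis, z_axis) from four 3D key points."""
    # Origin: midway between the base knuckles of the thumb and forefinger.
    origin = midpoint(thumb_base, finger_base)
    # Z axis: from the origin through the point midway between the two tips.
    z_axis = unit(sub(midpoint(thumb_tip, finger_tip), origin))
    # Y axis: normal to the (approximated) thumb/forefinger plane.
    # The sign convention is an assumption of this sketch.
    y_axis = unit(cross(z_axis, sub(finger_base, thumb_base)))
    # X axis completes a right-handed frame.
    x_axis = cross(y_axis, z_axis)
    return origin, x_axis, y_axis, z_axis
```

With the thumb and forefinger lying in the X-Z plane, the sketch returns a Y axis perpendicular to that plane, which is the geometric relationship the claim language describes.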

Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Boca (US 20150314442 A1) in view of Kofman (IDS: “Teleoperation of a robot manipulator using a vision-based human-robot interface”), JETTÉ (US 20190210217 A1), Sager (US 5040056 A), Luo (US 20150100910 A1), Itkowitz (US 9901402 B2), and Iqbal (US 10929654 B2).

Regarding Claim 15,
Boca teaches
A method for programming a robot to perform an operation by human demonstration, said method comprising (“There is described below the use of hand gestures to teach a path to be followed by the industrial robot 12 in performing work on workpiece 14.” [0021]):
demonstrating the operation on a workpiece by a human hand (“the instructions to the robot 12 that will be assembled from the hand gestures from the one or two hands seen by the camera and as described herein the object being pointed to, that is the scene data to create the path and instructions to be followed by the robot 12 when the robot performs work on the workpiece 14. For example, one hand is used to teach a robot target and the other hand is used to generate a grab or drop instruction. It is up to the robot operator to associate a particular hand gesture with a particular instruction.” [0023]; Examiner Interpretation: Despite the human hand not physically touching the workpiece in the demonstration, the operation to be performed on the workpiece is demonstrated by the human hand.);
analyzing camera images of the hand demonstrating the operation on the workpiece to create demonstration data (“the image of the location pointing hand gesture of step 304 and the associated location on the object are captured by the camera 11 and sent to the computation device 13 for processing. At step 308, the computation device 13 calculates from the image the corresponding location and orientation of the robot tool in the robot scene.” See at least [0034]; Examiner Interpretation: The location of the robot tool corresponding to the taught locations is the demonstration data.),
determining a position and orientation of the gripper (“the hand and finger location and orientation can be used to calculate the corresponding location and orientation of the robot tool in the robot scene.” [0038]; “location, orientation and associated action can be sent to the robot individually or all at once at the end of the teaching process; … with the image of the scene the part can be recognized and then the processing of the gesture has to be in relationship to the part; the robot targets can be defined relative to a part coordinate system” [0061-0064]);
generating robot motion commands, by a robot controller having a processor and memory, based on the demonstration data … to cause the robot to perform the operation on the new workpiece (“At step 514, the identified gesture is stored in the memory of the computation device 13 or in the absence of such a device in the memory of the robot controller 15.” [0050]; “creating robot instructions from the gestures by using the gesture context to the scene data from the same image or as additional data or extra processing to calculate/generate robot instructions (step 704 and optional step 706), storing the created instructions (step 708), asking if more created instructions are needed (step 710) and in step 712 sending the created instructions to the robot if no more created instructions are needed and performing in FIG. 7b all of the steps shown in FIG. 7a except the step 712 of sending the created instructions to the robot. The optional step 706 in these flowcharts of providing the scene 3D model to convert the gesture to a robot instruction step 704 is only needed if the scene will be subtracted from the image of the gesture.” [0052]; “In general a robot move instruction has information about the robot tool and coordinate system used for the robot target” [0057]),
and the position and orientation of the gripper relative to the workpiece contained in the demonstration data (“the hand and finger location and orientation can be used to calculate the corresponding location and orientation of the robot tool in the robot scene.” [0038]; “location, orientation and associated action can be sent to the robot individually or all at once at the end of the teaching process; … with the image of the scene the part can be recognized and then the processing of the gesture has to be in relationship to the part; the robot targets can be defined relative to a part coordinate system” [0061-0064]);
and performing the operation on the new workpiece by the robot (“that is the scene data to create the path and instructions to be followed by the robot 12 when the robot performs work on the workpiece 14.” [0023]; “By work is meant those actions performed by a robot such as painting, grinding, polishing, deburring, welding etc. that make a physical change to the workpiece and those interactions that a robot has with a workpiece such as picking up the workpiece from one location and moving it to another location or inserting the workpiece into a specific location that does not physically change the workpiece.” [0003]).

Boca does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
and compute a hand coordinate frame and a corresponding gripper coordinate frame, 
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper,
where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece, … and the initial position and orientation of the new workpiece,
including adjusting the initial position and orientation of the new workpiece based on a conveyor position index;
including motion commands causing the gripper to move to a grasping position and orientation based on the initial position and orientation of the new workpiece

However, Kofman teaches
compute a hand coordinate frame and a corresponding gripper coordinate frame (Fig. 3 shows the hand coordinate frame. [Image: media_image6.png, Kofman Fig. 3] This coordinate frame corresponds to the robot coordinate frame shown in fig. 5(c). [Image: media_image7.png, Kofman fig. 5(c)]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca to further include the teachings of Kofman to remotely control a robot based on position and orientation of a human operator’s hand in a demonstration of the operation without the restraints of sensors and wires on the human hand. See at least the introduction on Pgs. 1-2.

Kofman also does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper,
where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece, … and the initial position and orientation of the new workpiece,
including adjusting the initial position and orientation of the new workpiece based on a conveyor position index;
including motion commands causing the gripper to move to a grasping position and orientation based on the initial position and orientation of the new workpiece

However, JETTÉ teaches
where the gripper coordinate frame represents a gripper type selected from a group including a finger-type gripper and a vacuum-type gripper (“In this specific embodiment, the vacuum cup is made of a flexible, resilient material, and the relative distance between the robot and the workpiece held by the robot can vary based on this flexibility and operating conditions. Such variations in the relative distance between a given robot and the workpiece it holds was a source of positioning uncertainty in the reference frame of the robots. This gripper type was found to provide satisfactory gripping capability in the embodiment shown in FIG. 1, but it will be understood that other gripper types can be used in other embodiments. Moreover, more than one gripper, possibly of different gripper types, can be used as the end effector per robot if desired. For instance, a clamp gripper can be used in addition to a vacuum cup for a given robot, or for all robots, for instance. The gripper type or types can vary from one robot to another within a given workpiece holding system embodiment. Indeed, the exact type of gripper can be selected from the following general categories: impactive—e.g. jaws, clamps or claws which physically grasp by direct impact upon the object; ingressive—pins, needles or hackles which physically penetrate the surface of the object (e.g. an aperture or bore of the workpiece); astrictive—forces applied to the objects surface (e.g. by vacuum, magneto- or electroadhesion); and contigutive—requiring direct contact for adhesion to take place (e.g. surface tension or freezing).” [0043-0044])
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca and Kofman to further include the teachings of JETTÉ so that gripper type can vary as needed for different application requirements (see at least [0043-0044]).

JETTÉ also does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece, … and the initial position and orientation of the new workpiece,
including adjusting the initial position and orientation of the new workpiece based on a conveyor position index;
including motion commands causing the gripper to move to a grasping position and orientation based on the initial position and orientation of the new workpiece

However, Sager teaches
analyzing camera images of a new workpiece to determine an initial position and orientation of the new workpiece and generating robot motion commands based on the initial position and orientation of the new workpiece (“This invention provides a method and apparatus which uses a vision-equipped robotic system to locate, identify and determine the orientation of objects, and to pick them up and transfer them to a moving or stationary destination.” Col. 1, lines 50-54; Examiner Interpretation: The determined locations and orientations of the objects are initial locations and orientations because they are in that position before being picked up. They are new workpieces since they are different from the demonstrated workpiece.).
adjusting the initial position and orientation of the new workpiece based on a conveyor position index (“A video camera periodically records images of objects located on a moving conveyor belt. The images are identified and their position and orientation is recorded in a moving conveyor belt coordinate system. The information is transmitted to a motion control device associated with a first robot. The motion control device coordinates the robot with the moving belt coordinate system and instructs the robot's arm to pick up certain objects” See at least Col. 1, lines 50-62; “In determining whether an object has moved past the pick-up window or has not yet moved into the pick-up window, the motion controller considers the time it would take the robot arm to move from its current position to the location of the object and the distance the object would travel on the belt during that time. In other words, an object that is in the pick-up window when the robot is prepared to pick it up, may move out of the pick-up window by the time the robot can reach it. The motion controller considers that movement and will not attempt to pick it up.” Col. 7, lines 1-11; Examiner Interpretation: The moving conveyor belt coordinate system is the conveyor position index and adjusting the initial position and orientation is done by accounting for the movement of the objects).
Motion commands causing the gripper to move to a grasping position and orientation based on the initial position and orientation of the new workpiece (“The motion controller will go ahead and direct the robot to pick-up that object after accounting for its movement during the time it takes for the robot to reach it.” See at least Col. 7, lines 14-17; “the motion controller considers the time it would take the robot arm to move from its current position to the location of the object” Col. 7, lines 2-5; “The motion controller considers the object orientation in picking up the object” Col. 7, lines 60-61).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Kofman, and JETTÉ to further include the teachings of Sager to increase the flexibility of the robot to pick up randomly positioned objects (“known methods and apparatus are generally not effective with randomly positioned and randomly oriented objects. This is typically the case with objects that are deposited onto a conveyor belt, such as flat components that are asymmetrical about at least one axis. For these parts, the system must locate them on the moving conveyor belt and also determine their orientation. This requires a relatively sophisticated vision system.” See at least Col. 1, lines 17-24).
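For illustration only, the adjustment of an initial workpiece pose based on conveyor travel can be sketched as follows. This is a minimal Python sketch and is not code from Sager or the application. The names `Pose2D` and `adjust_for_conveyor`, the encoder-count representation of the position index, and the parameter `mm_per_count` are assumptions, and the belt is assumed to translate the workpiece in a straight line without rotating it.

```python
from dataclasses import dataclass

@dataclass
class Pose2D:
    x_mm: float
    y_mm: float
    theta_rad: float

def adjust_for_conveyor(pose_at_capture, index_at_capture, index_now,
                        mm_per_count, belt_direction=(1.0, 0.0)):
    """Shift a pose measured at image capture by the belt travel since then.

    belt_direction is a unit vector in the robot base frame (assumed).
    """
    travel_mm = (index_now - index_at_capture) * mm_per_count
    return Pose2D(
        pose_at_capture.x_mm + travel_mm * belt_direction[0],
        pose_at_capture.y_mm + travel_mm * belt_direction[1],
        pose_at_capture.theta_rad,  # straight-line travel leaves orientation unchanged
    )
```

For example, a workpiece imaged at x = 100 mm with the index at 1000 counts would be expected at x = 200 mm once the index reads 1400 counts at 0.25 mm per count.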

Sager also does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, and calculating the true lengths of the segments of the digits of the human hand;

However, Luo further teaches
and the true lengths of the plurality of segments of the digits of the human hand were previously determined using a hand size image analysis step including providing a sizing image of the human hand on a fiducial marker grid, … calculating the true lengths of the segments of the digits of the human hand (“FIG. 14 depicts how the simplified exemplary user's hand or hands may be photographed by the device's camera or other camera, and this image information may be used to refine the default parameters of the biomechanical and/or anatomical model of the user's hand, in accordance with one embodiment of the present invention. In acquiring such images, often it is useful to have the system provide a standardized background, such as a series of distance markings, grid, graph paper, and the like (1400) in order to better calibrate the image of the hand and correct for image distortions. This standardized background may additionally include various color, shades of gray, and resolution test targets as well. The background may be conveniently provided by, for example, electronically providing one or more background image sheets (e.g. a jpeg, png, pdf or other image file) for printing on the user's printer.” [0204]; “the user may put each hand on background (1400), and take a photo of the hand(s) (1402) with either the computerized device's camera or other camera. This image may then be analyzed, preferably by an image analysis program. The background image will help correct for any image distortions caused by different camera angles, and the like. The user hand image analysis may be done onboard the user's handheld computerized device, but it need not be. In an alternative embodiment, the user may upload one or more images of the hand taken by any imaging device to an external image analyzer, such as a remote internet server. 
In either event, the image analyzer will analyze the user's skin or hand outline appearance (1404), deduce the most probable lengths one or more bones of the user's hand, such as the user's various finger and thumb bones, and send this data or other data to correct the default biomechanical and/or anatomical model of the user's hand(s) back to the user's computerized device, such as for example during calibration step 906 referenced in FIG. 9 above.” [0205]; Examiner Interpretation: The standardized background, such as a series of distance markings, grid, or graph paper, is equivalent to a fiducial marker grid.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Kofman, JETTÉ, and Sager to further include the teachings of Luo to improve the accuracy of hand gesture identification and tracking by using a personalized anatomical hand model in place of a default hand model (“In some embodiments, to improve accuracy (that is to replace standard human hand biomechanical and/or anatomical model default parameters with actual user calibration parameters), it will be useful to acquire an image of the user's hands, and to employ various image processing and analysis techniques to analyze this image of the user's one or more hands to better estimate the relative length of the various bones of the user's hands. Indeed, in the event that the user has lost one or more fingers, the system may then use this information to make corresponding changes in its biomechanical and/or anatomical model of the human hand. In other words, the model may include calibration information associated with an image of at least a portion of the hand of the user.” [0203]. Also see at least [0010-0018]).

Luo also does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, using the transformations to compute coordinates of the key points in the marker coordinate system, 

However, Itkowitz teaches
analyzing the sizing image to compute transformations from a marker coordinate system to a screen coordinate system (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration. Some fiducial markers attached to known locations on the surgeon's console can be used to determine the extrinsic parameters (rotation and translation) of multiple cameras with respect to the surgeon's console. This process can be done automatically. Active fiducial markers can be used for the calibration fiducial markers since such markers are only turned on during a calibration process and before the procedure. During the procedure, the calibration fiducial markers are turned off to avoid confusion with the fiducial markers used to localize the surgeon's hands.” Col. 16, lines 4-15; Examiner Interpretation: The determined rotation and translation of a camera from the surgeons console with fiducial markers at known locations is the transformation from a marker coordinate system to a screen coordinate system.)
using the transformations to compute coordinates of the key points in the marker coordinate system (“FIG. 7 is an illustration of sensor 212 mounted on forefinger 292B with a location 713 in tracking coordinate system 750, and a sensor 211 mounted on thumb 292A with a location 711 in tracking coordinate system 750.” Col. 17, lines 28-31; Examiner Interpretation: The tracking coordinate system is the same as the marker coordinate system.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Kofman, JETTÉ, Sager, and Luo to further include the calibration teachings of Itkowitz to accurately recreate the three-dimensional hand poses to allow for remote operation of a robot (“The three-dimensional reconstruction accuracy relies heavily on the accuracy of camera calibration.” Col. 16, lines 4-5; “the use of hand gesture poses and hand gesture trajectories to control system 100 makes it is unnecessary for the surgeon to take the surgeon's eyes off the patient and/or viewing screen” See at least Col. 11, lines 26-47.).
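For illustration only, the sizing step recited in the claim (relating marker coordinates to screen coordinates and then measuring digit segments in marker coordinates) can be sketched as follows. This is a minimal Python sketch and is not code from Luo, Itkowitz, or the application. All function names are assumptions. The sketch assumes the hand lies flat on the marker grid so that a planar homography applies, uses exactly four markers, and fits the screen-to-marker direction directly, which is the inverse of the marker-to-screen transformation named in the claim.

```python
import math

def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("marker layout is degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(src, dst):
    """Fit H (with h33 = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(h, pt):
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def segment_lengths_mm(h_screen_to_marker, keypoints_px):
    """Map consecutive key points into marker coordinates and measure them."""
    pts = [apply_homography(h_screen_to_marker, p) for p in keypoints_px]
    return [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]

# Hypothetical data: four markers at known millimeter positions and the
# pixel locations at which they were detected in the sizing image.
markers_mm = [(0, 0), (100, 0), (100, 100), (0, 100)]
markers_px = [(50, 30), (250, 30), (250, 230), (50, 230)]
H = fit_homography(markers_px, markers_mm)
lengths = segment_lengths_mm(H, [(150, 130), (150, 190), (150, 210)])
```

In this hypothetical layout one millimeter spans two pixels, so the three key points map to marker coordinates 30 mm and 10 mm apart, and those distances are the recovered segment lengths.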

Itkowitz also does not explicitly teach
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand, 
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images, 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments, 
processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image, 

However, Iqbal teaches
including analyzing image pixel data of the hand, from a two-dimensional (2D) camera, to identify tip, base knuckle and second knuckle points on a thumb and forefinger of the hand (“Estimating a three-dimensional (3D) pose of an object from a 2D image can be implemented using deep learning. A hand pose is represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement in the 2D image and a third coordinate represents the depth of every point with respect to the camera. A monocular camera may be used to capture an image of the 3D pose, but does not capture depth information.” See at least col. 1, lines 44-51; “the 2.5D keypoint locations are generated using a neural network model, such as the neural network model 210 or 212 shown in FIG. 2A or 2C, respectively. The 2.5D keypoint locations ($\hat{P}_k^{2.5D}$) include three components for each keypoint, namely, pixel coordinates ($x_k$, $y_k$) and scale normalized depth values ($\hat{Z}_k^r$), where k is the keypoint and r is a root or reference keypoint.” See col. 3, lines 28-34; See the keypoints in fig. 1B (provided below) [Image: media_image5.png, Iqbal fig. 1B]), 
and where the hand coordinate frame is computed from the camera images by processing the camera images in a neural network convolution layer to identify key points on the human hand in the camera images … processing the sizing image in a neural network convolution layer to identify key points on the human hand in the sizing image (“the 2.5D keypoint locations are generated using a neural network model, such as the neural network model 210 or 212 shown in FIG. 2A or 2C, respectively. The 2.5D keypoint locations ($\hat{P}_k^{2.5D}$) include three components for each keypoint, namely, pixel coordinates ($x_k$, $y_k$) and scale normalized depth values ($\hat{Z}_k^r$), where k is the keypoint and r is a root or reference keypoint. In an embodiment, the depth values are determined relative to a reference keypoint. For example, in an embodiment where the 3D pose reconstruction unit 110 is configured for estimating the pose of a hand, the reference keypoint is located on the palm of the hand. The reference keypoint can be chosen to be any keypoint as a function of the object's skeleton model.” See at least col. 3, lines 28-41; Examiner Interpretation: The reference keypoint is equivalent to the hand coordinate frame.), 
performing a Point-n-Perspective calculation using the key points on the human hand in the camera images and previously determined true lengths of a plurality of segments of digits of the human hand, and calculating a three-dimensional pose of the plurality of segments (See fig. 1C (provided below)

[Image: media_image1.png (Iqbal fig. 1C)]
 
“Given an RGB image I of a hand, the goal is to estimate the 2D and 3D positions of K keypoints of the hand. The 2D hand pose is defined as $p = \{p_k\}_{k \in K}$ and the 3D pose is defined as $P = \{P_k\}_{k \in K}$, where $p_k = (x_k, y_k) \in \mathbb{R}^2$ represents the 2D pixel coordinates of the keypoint k in image I and $P_k = (X_k, Y_k, Z_k) \in \mathbb{R}^3$ denotes the location of the keypoint in the 3D camera coordinate frame measured in millimeters. The Z-axis corresponds to the optical axis. Given the intrinsic camera parameters K, the relationship between the 3D location $P_k$ and corresponding 2D projection $p_k$ can be written as follows under a perspective projection: 
[Equation image: media_image2.png]
where $k \in 1, \ldots, K$, $Z_{root}$ is the depth of the root keypoint, and $Z_k^r = Z_k - Z_{root}$ corresponds to the depth of the $k^{th}$ keypoint relative to the root keypoint. In an embodiment, the palm of the hand is used as the root keypoint.” See at least col. 4, lines 24-47; “The 3D pose reconstruction unit 110 calculates the normalized 3D pose based on the 2.5D keypoint locations and intrinsic parameters of the monocular camera corresponding to the captured image of the object. In an embodiment, the 3D pose reconstruction unit 110 provides an exact solution to obtain the absolute 3D pose up to a scale factor. The full absolute 3D pose can be recovered by knowing the measured length of the component of the object's structural skeletal model or using optimization under assumption of prior component length distribution.” See at least Col. 3, lines 50-59; Examiner Interpretation: The perspective projection and scaling step are point-n-perspective calculations.),
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Boca, Kofman, JETTÉ, Sager, Luo, and Itkowitz to further include the teachings of Iqbal to estimate a 3D pose of a human hand with the use of a single 2D camera for the use of human-computer interaction while reducing the impact of the hand’s appearance variation, complex poses, and self-occlusions (“Estimating a 3D pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is useful for human-computer interaction. Hand pose can be represented by a fixed set of points in 3D space, usually joints, called landmarks or keypoints. Estimating the 3D pose accurately is a difficult task due to the large amounts of appearance variation, self-occlusions, and complexity of articulated hand poses. 3D hand pose estimation escalates the difficulties even further because a depth of each of the hand keypoints also has to be estimated. Conventional techniques for determining locations of the landmarks of a hand in 3D space include one or more of multi-view camera systems, depth sensors, and color markers/gloves. Each of the conventional techniques requires a constrained environment and/or specialized equipment. Furthermore, environmental conditions such as sunlight, occlusions, and complexity of non-rigid hand poses present challenges to landmark detection and determination. There is a need for addressing these issues and/or other issues associated with the prior art.” See at least Col. 1, lines 21-40.).
Even though Iqbal doesn’t teach processing a sizing image with a neural network, it would have been obvious to apply a neural network to identify key points on the human hand in a sizing image because a “scale and translation invariant 2.5D pose representation can be easily obtained using the neural network models” (See at least Col. 13, lines 46-48) and neural networks can accurately and quickly solve complex problems in the field of image processing (See the machine learning section in cols. 24-25).
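
For illustration only, the following minimal Python sketch shows why a pose recovered from a single 2D image is determined only "up to a scale factor" and how one previously measured segment length can fix that scale, as described in the Iqbal passage quoted above (Col. 3, lines 50-59). It assumes the standard pinhole relation Z·(x, y, 1)^T = K·(X, Y, Z)^T with intrinsics fx, fy, cx, cy; the function names and the intrinsic values are hypothetical and are not taken from Iqbal or from the claims, and the sketch is not presented as Iqbal's actual reconstruction procedure.

```python
import math


def project(point_3d, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) to a pixel (x, y)."""
    X, Y, Z = point_3d
    return (fx * X / Z + cx, fy * Y / Z + cy)


def back_project(pixel, depth, fx, fy, cx, cy):
    """Inverse of project() when the depth Z of the point is known."""
    x, y = pixel
    return ((x - cx) * depth / fx, (y - cy) * depth / fy, depth)


def rescale_pose(pose, segment, true_length_mm):
    """Scale a pose known only up to scale so that one segment,
    given as a pair of keypoint indices, matches its measured length."""
    a, b = segment
    scale = true_length_mm / math.dist(pose[a], pose[b])
    return [tuple(scale * c for c in p) for p in pose]


if __name__ == "__main__":
    fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0  # hypothetical intrinsics
    # Hypothetical keypoints in millimeters (camera frame).
    true_pose = [(0.0, 0.0, 400.0), (30.0, 10.0, 420.0), (50.0, 40.0, 410.0)]
    # The same pose shrunk by half projects to the same pixels.
    half_pose = [tuple(0.5 * c for c in p) for p in true_pose]
    for p, q in zip(true_pose, half_pose):
        print(project(p, fx, fy, cx, cy), project(q, fx, fy, cx, cy))
    # A measured length for segment (0, 1) removes the ambiguity.
    measured = math.dist(true_pose[0], true_pose[1])
    print(rescale_pose(half_pose, (0, 1), measured))
```

Because uniformly scaling every 3D point leaves its pixel coordinates unchanged, the image alone cannot distinguish a small near hand from a large far one; a single known segment length is sufficient to select the correct scale in this simplified setting.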

Regarding Claim 16,
Modified Boca teaches
The method according to Claim 15
Boca further teaches
	wherein demonstrating the operation on the workpiece by the human hand and performing the operation on the new workpiece by the robot are both performed in a robotic work cell, and the (“There is described below the use of hand gestures to teach a path to be followed by the industrial robot 12 in performing work on workpiece 14. As shown in FIG. 2a, an operator 16 uses hand gestures to point to a location in the robot workspace. The camera, which is a 3D vision sensor 11, is attached to the robot and takes an image of the hand gesture and the relationship of the operator's hand 16a to the workpiece 14. It should be appreciated that the workpiece 14 while shown in FIG. 2a may not be in the view seen by the camera. The workpiece image may have been taken at a different time and as is described below the image of the hand without the workpiece and the workpiece without the hand need to be referenced to a common coordinate system. FIG. 2b shows one example of the image and the relationship of the operator's hand 16a to the workpiece 14.” See at least [0021-0022] and figs. 2a and 2b; Examiner Interpretation: The demonstration is performed in the robotic work cell when hand gestures are pointed at a location within the robotic workspace. Fig. 2a shows the hand within the workspace of the robot. Although the human hand does not physically touch the workpiece in the demonstration, the operation to be performed on the workpiece is demonstrated by the human hand.).

Boca does not explicitly teach
the 2D camera provides the camera images of the hand
and the camera images of the new workpiece
However, Sager teaches
and the camera images of the new workpiece (“This invention provides a method and apparatus which uses a vision-equipped robotic system to locate, identify and determine the orientation of objects, and to pick them up and transfer them to a moving or stationary destination. A video camera periodically records images of objects located on a moving conveyor belt.” Col. 1, lines 50-56; Examiner Interpretation: The determined locations and orientations of the objects are initial locations and orientations because they are in that position before being picked up. They are new workpieces since they are different from the demonstrated workpiece.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca to further include the teachings of Sager to increase the flexibility of the robot to pick up randomly positioned objects (“known methods and apparatus are generally not effective with randomly positioned and randomly oriented objects. This is typically the case with objects that are deposited onto a conveyor belt, such as flat components that are asymmetrical about at least one axis. For these parts, the system must locate them on the moving conveyor belt and also determine their orientation. This requires a relatively sophisticated vision system.” See at least Col. 1, lines 17-24).

Sager also does not explicitly teach
the 2D camera provides the camera images of the hand
However, Iqbal teaches
“Estimating a three-dimensional (3D) pose of an object from a 2D image can be implemented using deep learning. A hand pose is represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement in the 2D image and a third coordinate represents the depth of every point with respect to the camera. A monocular camera may be used to capture an image of the 3D pose, but does not capture depth information.” See at least col. 1, lines 44-51.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Boca and Sager to further include the teachings of Iqbal to estimate a 3D pose of a human hand with the use of a single 2D camera for the use of human-computer interaction (“Estimating a 3D pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is useful for human-computer interaction. Hand pose can be represented by a fixed set of points in 3D space, usually joints, called landmarks or keypoints. Estimating the 3D pose accurately is a difficult task due to the large amounts of appearance variation, self-occlusions, and complexity of articulated hand poses. 3D hand pose estimation escalates the difficulties even further because a depth of each of the hand keypoints also has to be estimated. Conventional techniques for determining locations of the landmarks of a hand in 3D space include one or more of multi-view camera systems, depth sensors, and color markers/gloves. Each of the conventional techniques requires a constrained environment and/or specialized equipment. Furthermore, environmental conditions such as sunlight, occlusions, and complexity of non-rigid hand poses present challenges to landmark detection and determination. There is a need for addressing these issues and/or other issues associated with the prior art.” See at least Col. 1, lines 21-40.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sills (US 20200401232 A1) is pertinent because it discusses control of a robot to perform tasks with a workpiece by replicating motions of a human hand. The human hand is modeled with finger segment lengths.
Kim (US 20210142568 A1) is pertinent because it discusses solving a P-n-P problem with a human hand involving hand skeleton joint measurements.
Wöhlke (NPL: “Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization”) is pertinent because it discusses hand pose estimation with 2D or 3D CNNs and fixed bone lengths.
Bai (NPL: “A Robust Method For Hands Gesture Recognition From Egocentric Depth Sensor”) is pertinent because it discusses hand pose estimation with CNNs and pre-set bone lengths.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Karston G Evans whose telephone number is (571)272-8480. The examiner can normally be reached Mon-Fri 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abby Lin can be reached on (571)270-3976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/K.G.E./Examiner, Art Unit 3664                 
/ABBY Y LIN/Supervisory Patent Examiner, Art Unit 3664