DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
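Claims 1, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ge et al. (“Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs” CVPR, 2016.)(Hereinafter referred to as Ge) in view of Edwards et al. (“Graph Based Convolutional Neural Network”, 2016.)(Hereinafter referred to as Edwards).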
Regarding claim 1, Ge teaches a method (Articulated hand pose estimation plays an important role in human-computer interaction. Despite the recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of embedded high dimensional and non-linear regression problem. Different from the existing discriminative methods that regress for the hand pose with a single depth image, we propose to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress for 2D heat-maps which estimate the joint positions on each plane. See Abstract)
comprising: obtaining a first plurality of images that include respective representations of a hand (See figure 1, projections, x-y; y-z; z-x are the plurality of images that include respective representations of a hand); 
training a first machine learning technique based on a first feature of the first plurality of images (Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4); 
training a second machine learning technique based on a second feature of the first plurality of images separately from the first machine learning technique (Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4), but is silent to 
training the first and second machine learning techniques together with a graph convolutional neural network (CNN) based on the first plurality of images.
Edwards teaches utilizing a graph convolutional neural network to provide gradient calculations on the input data and spectral filters which allows for the deep learning of an irregular spatial domain problem. (One issue with CNNs is that the convolution of a filter across the spatial domain is non-trivial when considering domains in which there is no regular structure. One solution is to utilize the multiplication in the spectral graph domain to perform convolution in the spatial domain, obtaining the feature maps via graph signal processing techniques. The graph based CNN follows a similar architecture to standard CNNs; with randomly initialized spectral multiplier based convolution learnt in the spectral domain of the graph signal and graph coarsening based pooling layers, see Figure 1 for a pipeline. Training is compromised of a feed-forward pass through the network to obtain outputs, with loss propagated backwards through the network to update the randomly initialized weights. See section 2 methods, first paragraph)( We also provide gradient calculations on the input data and spectral filters, which allow for the deep learning of an irregular spatial domain problem. Signal filters take the form of spectral multipliers, applying convolution in the graph spectral domain. See abstract).
Ge and Edwards both teach the use of convolutional neural networks, and Edwards teaches that a graph convolutional neural network allows deep learning on an irregular spatial domain. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge with the graph convolutional neural network of Edwards so that the system can perform deep learning on an irregular spatial domain problem.
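For illustration only, the multi-view projection relied upon from Ge (projecting the depth data onto the x-y, y-z, and z-x planes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Ge's implementation: the function name `project_to_planes`, the normalization of points to the unit cube, and the choice to store the nearest distance to each plane in every pixel are hypothetical choices made for the example. The default grid size of 18 simply reuses the heat-map size quoted above.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Image = List[List[float]]

# Each view keeps two axes and drops the third (0 = x, 1 = y, 2 = z).
VIEWS: Dict[str, Tuple[int, int, int]] = {
    "x-y": (0, 1, 2),
    "y-z": (1, 2, 0),
    "z-x": (2, 0, 1),
}


def project_to_planes(points: Sequence[Point], size: int = 18) -> Dict[str, Image]:
    """Project 3D points onto three orthogonal planes.

    Assumption (hypothetical): every coordinate lies in [0, 1]. Each pixel
    stores the smallest coordinate along the dropped axis among the points
    that fall into it, or 1.0 (far side of the cube) when the pixel is empty.
    """
    images = {name: [[1.0] * size for _ in range(size)] for name in VIEWS}
    for p in points:
        for name, (u_axis, v_axis, d_axis) in VIEWS.items():
            # Map a coordinate in [0, 1] to a pixel index in [0, size - 1].
            col = min(int(p[u_axis] * size), size - 1)
            row = min(int(p[v_axis] * size), size - 1)
            img = images[name]
            img[row][col] = min(img[row][col], p[d_axis])
    return images
```

Each of the three returned images would then be the input to one per-view network of the kind described in figure 4 of Ge.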


Regarding claim 15, Ge teaches a system (Articulated hand pose estimation plays an important role in human-computer interaction. Despite the recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of embedded high dimensional and non-linear regression problem. Different from the existing discriminative methods that regress for the hand pose with a single depth image, we propose to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress for 2D heat-maps which estimate the joint positions on each plane. See Abstract) comprising:
a processor configured to perform operations (Articulated hand pose estimation plays an important role in human-computer interaction. Despite the recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of embedded high dimensional and non-linear regression problem. Different from the existing discriminative methods that regress for the hand pose with a single depth image, we propose to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress for 2D heat-maps which estimate the joint positions on each plane See Abstract) comprising: 
obtaining a first plurality of images that include respective representations of a hand (See figure 1, projections, x-y; y-z; z-x are the plurality of images that include respective representations of a hand); 
training a first machine learning technique based on a first feature of the first plurality of images  (Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4); 
training a second machine learning technique based on a second feature of the first plurality of images separately from the first machine learning technique (Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4), but is silent to 
training the first and second machine learning techniques together with a graph convolutional neural network (CNN) based on the first plurality of images.
Edwards teaches utilizing a graph convolutional neural network to provide gradient calculations on the input data and spectral filters which allows for the deep learning of an irregular spatial domain problem. (One issue with CNNs is that the convolution of a filter across the spatial domain is non-trivial when considering domains in which there is no regular structure. One solution is to utilize the multiplication in the spectral graph domain to perform convolution in the spatial domain, obtaining the feature maps via graph signal processing techniques. The graph based CNN follows a similar architecture to standard CNNs; with randomly initialized spectral multiplier based convolution learnt in the spectral domain of the graph signal and graph coarsening based pooling layers, see Figure 1 for a pipeline. Training is compromised of a feed-forward pass through the network to obtain outputs, with loss propagated backwards through the network to update the randomly initialized weights. See section 2 methods, first paragraph)( We also provide gradient calculations on the input data and spectral filters, which allow for the deep learning of an irregular spatial domain problem. Signal filters take the form of spectral multipliers, applying convolution in the graph spectral domain. See abstract).
Ge and Edwards both teach the use of convolutional neural networks, and Edwards teaches that a graph convolutional neural network allows deep learning on an irregular spatial domain. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge with the graph convolutional neural network of Edwards so that the system can perform deep learning on an irregular spatial domain problem.
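For illustration only, the way a per-view heat-map of the kind quoted from Ge (21 heat-maps of 18x18 pixels) estimates a joint position on a plane can be sketched as follows. This is a simplified sketch: taking the single highest-valued pixel of each heat-map is an assumption made for the example, and the names `heatmap_peak` and `joints_from_heatmaps` are hypothetical. It does not reproduce how Ge combines the three views into a 3D pose.

```python
from typing import List, Sequence, Tuple

HeatMap = Sequence[Sequence[float]]


def heatmap_peak(heatmap: HeatMap) -> Tuple[int, int]:
    """Return the (row, col) of the largest value in one heat-map."""
    best_row, best_col = 0, 0
    best_val = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_row, best_col, best_val = r, c, val
    return best_row, best_col


def joints_from_heatmaps(heatmaps: Sequence[HeatMap]) -> List[Tuple[int, int]]:
    """Estimate one 2D joint position per heat-map (one heat-map per joint)."""
    return [heatmap_peak(h) for h in heatmaps]
```

With 21 heat-maps per view, `joints_from_heatmaps` returns 21 pixel positions on that view's projection plane.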



Regarding claim 18, Ge teaches a non-transitory machine-readable storage medium that includes instructions that, when executed by one or more processors of a machine, cause the machine to perform operations (Articulated hand pose estimation plays an important role in human-computer interaction. Despite the recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of embedded high dimensional and non-linear regression problem. Different from the existing discriminative methods that regress for the hand pose with a single depth image, we propose to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress for 2D heat-maps which estimate the joint positions on each plane. See Abstract)
comprising: 
obtaining a first plurality of images that include respective representations of a hand (See figure 1, projections, x-y; y-z; z-x are the plurality of images that include respective representations of a hand);
training a first machine learning technique based on a first feature of the first plurality of images (Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4); 
training a second machine learning technique based on a second feature of the first plurality of images separately from the first machine learning technique(Convolutional Network architecture for each view. The network contains convolutional layers and fully connected layers. In convolutional layers, there are three banks for multi-resolution inputs. The network generates 21 heat-maps with the size of 18x18 pixels. All of the three views have the same network architecture and the same architectural parameters. See figure 4), but is silent to 
training the first and second machine learning techniques together with a graph convolutional neural network (CNN) based on the first plurality of images.
Edwards teaches utilizing a graph convolutional neural network to provide gradient calculations on the input data and spectral filters which allows for the deep learning of an irregular spatial domain problem. (One issue with CNNs is that the convolution of a filter across the spatial domain is non-trivial when considering domains in which there is no regular structure. One solution is to utilize the multiplication in the spectral graph domain to perform convolution in the spatial domain, obtaining the feature maps via graph signal processing techniques. The graph based CNN follows a similar architecture to standard CNNs; with randomly initialized spectral multiplier based convolution learnt in the spectral domain of the graph signal and graph coarsening based pooling layers, see Figure 1 for a pipeline. Training is compromised of a feed-forward pass through the network to obtain outputs, with loss propagated backwards through the network to update the randomly initialized weights. See section 2 methods, first paragraph)( We also provide gradient calculations on the input data and spectral filters, which allow for the deep learning of an irregular spatial domain problem. Signal filters take the form of spectral multipliers, applying convolution in the graph spectral domain. See abstract).
Ge and Edwards both teach the use of convolutional neural networks, and Edwards teaches that a graph convolutional neural network allows deep learning on an irregular spatial domain. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge with the graph convolutional neural network of Edwards so that the system can perform deep learning on an irregular spatial domain problem.
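For illustration only, the spectral-multiplier filtering quoted from Edwards (convolution applied as a multiplication in the graph spectral domain) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Edwards' implementation: the function name `spectral_filter` is hypothetical, the Laplacian eigenvectors are assumed to be precomputed and supplied by the caller, and the multipliers are fixed numbers here, whereas in a trained graph CNN they would be randomly initialized and learned through backpropagation. The example graph is a 3-node path graph, whose Laplacian has eigenvalues 0, 1, and 3.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def spectral_filter(
    eigvecs: Matrix, multipliers: Sequence[float], signal: Sequence[float]
) -> List[float]:
    """Compute U * diag(multipliers) * U^T * signal.

    eigvecs[i][k] is entry i of the k-th Laplacian eigenvector (column k of U).
    """
    n = len(signal)
    # Graph Fourier transform: project the signal onto each eigenvector.
    coeffs = [sum(eigvecs[i][k] * signal[i] for i in range(n)) for k in range(n)]
    # Convolution is an element-wise multiplication in the spectral domain.
    filtered = [multipliers[k] * coeffs[k] for k in range(n)]
    # Inverse transform back to the graph vertices.
    return [sum(eigvecs[i][k] * filtered[k] for k in range(n)) for i in range(n)]


# Orthonormal Laplacian eigenvectors of a 3-node path graph, one per column.
S3, S2, S6 = math.sqrt(3), math.sqrt(2), math.sqrt(6)
PATH3_EIGVECS = [
    [1 / S3, 1 / S2, 1 / S6],
    [1 / S3, 0.0, -2 / S6],
    [1 / S3, -1 / S2, 1 / S6],
]
```

Setting every multiplier to 1 leaves the signal unchanged, while keeping only the multiplier for eigenvalue 0 replaces every vertex value with the mean of the signal.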

Claims 7, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ge et al. (“Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs” CVPR, 2016.)(Hereinafter referred to as Ge) in view of Edwards et al. (“Graph Based Convolutional Neural Network”, 2016.)(Hereinafter referred to as Edwards) and further in view of Newell et al. (“Stacked Hourglass Networks for Human Pose Estimation”, 2016.)(Hereinafter referred to as Newell).

Regarding claim 7, Ge in view of Edwards teaches the method of claim 1, but is silent to wherein the first machine learning technique comprises a stacked hourglass network, and wherein the second machine learning technique comprises a residual network.
Newell teaches a stacked hourglass network in which different portions correspond to residual modules (Our network for pose estimation consists of multiple stacked hourglass modules which allow for repeated bottom-up, top-down inference. See fig. 1, caption) (An illustration of a single “hourglass” module. Each box in the figure corresponds to a residual module as seen in Fig. 4. The number of features is consistent across the whole hourglass. See fig. 3, caption).
Ge in view of Edwards, and Newell, each teach machine learning techniques, and Newell teaches that the hourglass design with integrated residual modules lets the system determine a final pose estimate from a coherent understanding of the full body (The design of the hourglass is motivated by the need to capture information at every scale. While local evidence is essential for identifying features like faces and hands, a final pose estimate requires a coherent understanding of the full body. See section 3.1 Hourglass design, first paragraph). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge in view of Edwards with the hourglass structure of Newell so that each of the machine learning techniques has a full understanding of the body and yields a more accurate pose of the individual.

Regarding claim 17, Ge in view of Edwards teaches the system of claim 15, but is silent to wherein the first machine learning technique comprises a stacked hourglass network, and wherein the second machine learning technique comprises a residual network.
Newell teaches a stacked hourglass network in which different portions correspond to residual modules (Our network for pose estimation consists of multiple stacked hourglass modules which allow for repeated bottom-up, top-down inference. See fig. 1, caption) (An illustration of a single “hourglass” module. Each box in the figure corresponds to a residual module as seen in Fig. 4. The number of features is consistent across the whole hourglass. See fig. 3, caption).
Ge in view of Edwards, and Newell, each teach machine learning techniques, and Newell teaches that the hourglass design with integrated residual modules lets the system determine a final pose estimate from a coherent understanding of the full body (The design of the hourglass is motivated by the need to capture information at every scale. While local evidence is essential for identifying features like faces and hands, a final pose estimate requires a coherent understanding of the full body. See section 3.1 Hourglass design, first paragraph). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge in view of Edwards with the hourglass structure of Newell so that each of the machine learning techniques has a full understanding of the body and yields a more accurate pose of the individual.


Regarding claim 20, Ge in view of Edwards teaches the non-transitory machine-readable storage medium of claim 18, but is silent to wherein the first machine learning technique comprises a stacked hourglass network, and wherein the second machine learning technique comprises a residual network.
Newell teaches a stacked hourglass network in which different portions correspond to residual modules (Our network for pose estimation consists of multiple stacked hourglass modules which allow for repeated bottom-up, top-down inference. See fig. 1, caption) (An illustration of a single “hourglass” module. Each box in the figure corresponds to a residual module as seen in Fig. 4. The number of features is consistent across the whole hourglass. See fig. 3, caption).
Ge in view of Edwards, and Newell, each teach machine learning techniques, and Newell teaches that the hourglass design with integrated residual modules lets the system determine a final pose estimate from a coherent understanding of the full body (The design of the hourglass is motivated by the need to capture information at every scale. While local evidence is essential for identifying features like faces and hands, a final pose estimate requires a coherent understanding of the full body. See section 3.1 Hourglass design, first paragraph). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the system of Ge in view of Edwards with the hourglass structure of Newell so that each of the machine learning techniques has a full understanding of the body and yields a more accurate pose of the individual.

Allowable Subject Matter
Claims 2-6, 8-14, 16 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:  The prior art of record alone or in combination is silent to the limitations “and generating a pseudo-ground truth mesh of each of the real-world depictions of the hand using the graph CNN that has been trained.” of .
Claim 3 contains allowable subject matter because it depends on a claim containing allowable subject matter.

The prior art of record alone or in combination is silent to the limitations “modeling a pose of the hand depicted in the monocular image by adjusting skeletal joint positions of a three-dimensional (3D) hand mesh using the graph CNN, the graph CNN estimating 3D coordinates of vertices in the 3D hand mesh. ” of claim 4 when read in light of the rest of the limitations in claim 4 and the claims to which claim 4 depends and thus claim 4 contains allowable subject matter.


The prior art of record alone or in combination is silent to the limitations “linearly regressing the joint positions using a linear graph CNN; and generating, for display, the 3D hand mesh adjusted to model the pose of the hand depicted in the monocular image. ” of claim 5 when read in light of the rest of the limitations in claim 5 and the claims to which claim 5 depends and thus claim 5 contains allowable subject matter.

The prior art of record alone or in combination is silent to the limitations “and encoding the 2D heat map and the image feature map using the second machine learning technique to generate a feature vector. ” of claim 6 when read in light of the rest of the limitations in claim 6 and the claims to which claim 6 depends and thus claim 6 contains allowable subject matter.



The prior art of record alone or in combination is silent to the limitations “ a shape of the hand in the monocular image by adjusting blend shape values of a 3D hand mesh representing surface features of the hand depicted in the monocular image using the graph CNN. ” of claim 8 when read in light of the rest of the limitations in claim 8 and the claims to which claim 8 depends and thus claim 8 contains allowable subject matter.


The prior art of record alone or in combination is silent to the limitations “ further comprising generating an image of the first plurality of images by: generating a 3D hand model by combining a plurality of hand joints with a plurality of surface textures; and combining the generated hand model with a background image. ” of claim 9 when read in light of the rest of the limitations in claim 9 and the claims to which claim 9 depends and thus claim 9 contains allowable subject matter.
Claims 10 and 11 contain allowable subject matter because they depend on a claim containing allowable subject matter.

The prior art of record alone or in combination is silent to the limitations “ further comprising training the first machine learning technique based on a heat map loss function and training the second machine learning technique based on a 3D pose loss function, and wherein training the first and second machine learning techniques together with the graph CNN comprises training the first and second machine learning techniques together based on the heat map loss function, the 3D pose loss function, and a mesh loss function. ” of claim 12 when read in light of the rest of the limitations in claim 12 and the claims to which claim 12 depends and thus claim 12 contains allowable subject matter.


The prior art of record alone or in combination is silent to the limitations “further comprising: receiving a second plurality of images that include real-world depictions of a hand and reference 3D depth maps of the real-world depictions of the hand captured using a depth camera; generating a pseudo-ground truth mesh of the real-world depictions of the hand using the graph CNN; and training the first and second machine learning techniques and the graph CNN based on the generated pseudo-ground truth mesh, the real-world depictions of the hand, and the reference 3D depth maps of the real-world depictions of the hand. ” of claim 13 when read in light of the rest of the limitations in claim 13 and the claims to which claim 13 depends and thus claim 13 contains allowable subject matter.


The prior art of record alone or in combination is silent to the limitations “continuously changing an appearance of a 3D hand mesh by continuously capturing new monocular images of the hand in different positions, wherein the appearance of the 3D hand mesh changes to resemble the different positions of the hand as the hand changes from one position to another position. ” of claim 14 when read in light of the rest of the limitations in claim 14 and the claims to which claim 14 depends and thus claim 14 contains allowable subject matter.
 

The prior art of record alone or in combination is silent to the limitations “generating a pseudo-ground truth mesh of each of the real-world depictions of the hand using the graph CNN that has been trained. ” of claim 16 when read in light of the rest of the limitations in claim 16 and the claims to which claim 16 depends and thus claim 16 contains allowable subject matter.


The prior art of record alone or in combination is silent to the limitations “wherein the operations further comprise: obtaining a second plurality of images that include real-world depictions of a hand and reference three-dimensional (3D) depth maps; and generating a pseudo-ground truth mesh of each of the real-world depictions of the hand using the graph CNN that has been trained.” of claim 19 when read in light of the rest of the limitations in claim 19 and the claims to which claim 19 depends and thus claim 19 contains allowable subject matter.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS R WILSON whose telephone number is (571)272-0936. The examiner can normally be reached M-F 7:30-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like 





/NICHOLAS R WILSON/Primary Examiner, Art Unit 2611