DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 12 and 17-20 are directed to a “computer-readable storage medium”. However, the claims are not limited to nontransitory embodiments, and the specification does not provide a definition limiting the meaning of this term to only nontransitory embodiments. The claims therefore can be reasonably interpreted as encompassing transitory signal embodiments, which are nonstatutory (In re Nuijten, 500 F.3d 1346, 84 USPQ2d 1495 (Fed. Cir. 2007)). If the specification includes written description support, this rejection can be overcome by including the term “nontransitory” in the claims (see USPTO Official Gazette notice 1351 OG 212).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-6 and 10-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wu et al. (CN111696140A; the Google Patents machine translation and the original publication filed by Applicant as a foreign reference are relied on in the following art rejection), hereinafter “Wu”.
Regarding claim 1, Wu teaches:
A hand tracking method (See the Abstract.), comprising: 
acquiring frame image information of videos of multiple tracking cameras respectively, and determining a current frame image corresponding to each tracking camera respectively according to the frame image information (See pages 4-5: “if a plurality of head tracking cameras are used for position tracking, each frame of the transmitted hand detection model and the skeleton point identification model is image data,”. Also see page 6: “even if two monocular cameras are installed”.); 
executing at least one of a detection module, a tracking module and a skeleton point identification module according to the current frame image, tracking information of a last frame image of the current frame image and a preset module execution sequence (See page 3: “Preferably, in the process of starting the hand detection model and the tracking module according to the detected number of hands in the last frame of image, if the number of the detection is 0 or 1, starting the hand detection model and the tracking module; if the number of the detection is 2, only the tracking module is started.”), to acquire tracking information of a hand location corresponding to the to-be-detected frame image and two-dimensional coordinates of a preset quantity of skeleton points corresponding to the to-be-detected frame image (See page 3: “Preferably, in the process of identifying the bone point of the region of interest of the current frame in Trackhand, the region of interest comprises position coordinates of the hand in the image and a region size corresponding to the hand; the number of the bone points is 21.”); 
determining three-dimensional coordinates of the preset quantity of skeleton points according to the two-dimensional coordinates and pre-acquired tracking data of a head location corresponding to the hand location (See page 3: “Preferably, in the process of determining the three-dimensional bone coordinates of the bone points after the smoothing filtering processing by combining the data of the Trackhead, reading the data of the Trackhead middle header, and acquiring a transfer matrix T and a rotation matrix R of the current frame relative to the previous frame header; and determining the three-dimensional coordinates of the skeleton points according to preset calibration parameters of the tracking camera, the transfer matrix T and the rotation matrix R.”); 
carrying out smoothing filter processing on the three-dimensional coordinates of the skeleton points and historical three-dimensional coordinates of a same hand location of the last frame image, to acquire stable skeleton points of a processed hand location (See page 4: “s130: acquiring an interested area on the current frame image from the Trackhand, carrying out bone point identification on the interested area of the current frame in the Trackhand through the bone point identification model, and carrying out smooth filtering processing on the identified bone point according to historical data in the Trackhand;”); and 
fusing, rendering and displaying the stable skeleton points and the tracking data of the head location successively, to complete tracking and display of the hand location (See page 4: “and transmitting the three-dimensional gesture information to a game engine, rendering the three-dimensional gesture information, and then transmitting the three-dimensional gesture information back to the VR virtual head in real time for display processing to finish gesture tracking.”).
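For illustration only, the smoothing step recited above can be sketched as a blend of each skeleton point's current three-dimensional coordinates with its historical coordinates from the last frame. Neither the claim nor the quoted passage of Wu names a particular filter, so the exponential weighting, the function name smooth_points, and the parameter alpha below are assumptions.

```python
def smooth_points(current, previous, alpha=0.5):
    """Blend current 3D skeleton points with the last frame's points.

    current, previous: lists of (x, y, z) tuples for the same hand.
    alpha: hypothetical weight given to the current frame (0 < alpha <= 1).
    """
    if len(current) != len(previous):
        raise ValueError("point lists must have the same length")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for cur, prev in zip(current, previous):
        # Weighted average per coordinate; a larger alpha follows the new frame more closely.
        smoothed.append(tuple(alpha * c + (1.0 - alpha) * p for c, p in zip(cur, prev)))
    return smoothed


# Hypothetical single skeleton point in two consecutive frames.
stable = smooth_points([(1.0, 2.0, 3.0)], [(0.0, 0.0, 1.0)], alpha=0.5)
```

A smaller alpha gives steadier points at the cost of more lag behind fast hand motion.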

Regarding claim 2, Wu teaches:
The hand tracking method according to claim 1, wherein executing at least one of the detection module, the tracking module and the skeleton point identification module according to the current frame image and the detection result of the last frame image of the current frame image comprises: acquiring a quantity of hands detected in the last frame image according to the tracking information of the last frame image; wherein, in a case where the quantity of the hands is less than 2, executing each of the detection module, the tracking module and the skeleton point identification module; and otherwise, in a case where the quantity of the hands is 2, executing each of the tracking module and the skeleton point identification module (See page 3: “Preferably, in the process of starting the hand detection model and the tracking module according to the detected number of hands in the last frame of image, if the number of the detection is 0 or 1, starting the hand detection model and the tracking module; if the number of the detection is 2, only the tracking module is started.”).
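For illustration only, the module-selection condition recited in claim 2 can be sketched as a small dispatch function. The sketch follows the claim language (less than 2 hands versus exactly 2 hands); the function name modules_to_run and the module name strings are hypothetical and appear in neither the claims nor Wu.

```python
def modules_to_run(hands_in_last_frame):
    """Return the modules to execute for the current frame, in order.

    hands_in_last_frame: quantity of hands found in the last frame's
    tracking information (0, 1, or 2).
    """
    if hands_in_last_frame not in (0, 1, 2):
        raise ValueError("expected 0, 1, or 2 hands")
    if hands_in_last_frame < 2:
        # A hand may still be missing, so detection runs again.
        return ["detection", "tracking", "skeleton_point_identification"]
    # Both hands are already tracked, so detection is skipped.
    return ["tracking", "skeleton_point_identification"]


for count in (0, 1, 2):
    print(count, modules_to_run(count))
```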

Regarding claim 3, Wu teaches:
The hand tracking method according to claim 2, wherein the detection module is configured to detect and position a hand in the current frame image via a pre-trained hand detection model, acquire a hand location and an Region Of Interest (ROI) corresponding to the hand location, and send the hand location and the ROI corresponding to the hand location to the tracking module and the skeleton point identification module (See page 4: “s110: training a hand detection model and a skeleton point identification model to enable the hand detection model to automatically lock a hand area of an image as an interesting area and enable the skeleton point identification model to automatically identify skeleton points in the interesting area;”.); the tracking module is configured to track a predicted ROI of a next frame image of the current frame image according to the ROI corresponding to the hand location and an optical flow tracking algorithm, and store tracking information corresponding to the prediction ROI to a hand tracking queue so as to update tracking information of the hand location (See page 5: “meanwhile, the region of interest of the next frame is estimated according to the region of interest of the current frame based on an optical flow tracking algorithm, so as to provide a reference for performing skeletal point identification on the next frame.”); the skeleton point identification module is configured to acquire the ROI corresponding to the hand location from the hand tracking queue, and carrying out identification of the preset quantity of skeleton points on the acquired ROI via a pre-trained skeleton point identification model (See page 5: “As shown in fig. 1, in the single-objective three-dimensional gesture tracking method provided by the present invention, in step S130, an area of interest of a hand on a current frame image is obtained from Trackhand, skeleton point recognition of the hand is performed on the area of interest of image data through a skeleton point recognition model, and then smooth filtering processing is performed on each skeleton point by comparing with historical data of each skeleton point, so that the possibility that recognition of a certain skeleton point in a certain frame is not stable is avoided, and the hand skeleton point recognition accuracy and stability are improved;”).

Regarding claim 4, Wu teaches:
The hand tracking method according to claim 1, wherein the tracking data of the head location comprise location data of the head location and tracking data of pose data; wherein the location data and the pose data are determined by video data collected by a tracking camera arranged at a head and a pose estimation algorithm of the head (See page 4: “s140: and counting data of the head part in each frame image about the position and the posture, storing the data of the head part into a queue Trackhead of the tracking module in real time, and determining three-dimensional bone coordinates of the bone points subjected to smoothing filtering processing by combining the data of the head part in the Trackhead so as to finish gesture tracking.”).

Regarding claim 5, Wu teaches:
The hand tracking method according to claim 1, wherein determining the three-dimensional coordinates of the preset quantity of the skeleton points comprises: determining any skeleton point in the skeleton points of the current frame as a target skeleton point, and acquiring three-dimensional coordinates of the target skeleton point; determining three-dimensional coordinates of all skeleton points according to the three-dimensional coordinates of the target skeleton point; wherein the three-dimensional coordinates of the target skeleton point are determined by using a following formula: P2 = R * P1 + T
wherein P2 represents the three-dimensional coordinates of the target skeleton point, P1 represents historical three-dimensional coordinates of the target skeleton point of the last frame image, R represents a rotation matrix of the head location of the current frame in the tracking data of the head location relative to the head location of the last frame image, and T represents a transfer matrix of the head location of the current frame in the tracking data of the head location relative to the head location of the last frame image (See the bottom half of page 3.).
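For illustration only, the relation P2 = R * P1 + T recited in claim 5 can be worked through numerically. The rotation angle, translation, and point values below are hypothetical, and the helper name transform_point is an assumption; only the formula itself comes from the claim.

```python
import math


def transform_point(R, T, p1):
    """Apply P2 = R * P1 + T for a 3x3 matrix R and 3-vectors T and P1."""
    return [sum(R[i][j] * p1[j] for j in range(3)) + T[i] for i in range(3)]


# Hypothetical head motion between frames: 90-degree rotation about the
# z-axis followed by a 0.1 shift along x.
theta = math.pi / 2
R = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
T = [0.1, 0.0, 0.0]

P1 = [1.0, 0.0, 0.5]           # historical coordinates of the target skeleton point
P2 = transform_point(R, T, P1)  # approximately [0.1, 1.0, 0.5]
```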

Regarding claim 6, Wu teaches:
The hand tracking method according to claim 5, wherein the three-dimensional coordinates of the target skeleton point are calculated by using a following formula:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
represents acquired calibration parameters of the tracking cameras of the videos; wherein fx and fy represent pixel focal lengths, cx and cy represent coordinate locations of optical axes of the tracking cameras in the current frame image; R represents a rotation matrix of the head location of the current frame in the tracking data of the head location relative to the head location of the last frame image; and T represents a transfer matrix of the head location of the current frame in the tracking data of the head location relative to the head location of the last frame image (See the original publication cited by Applicant, pages 8-9.).
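For illustration only, the role of the calibration parameters fx, fy, cx and cy can be shown with the standard pinhole camera model. The claimed formula is contained in an image that is not reproduced in this text, so the sketch below is not a reconstruction of that formula; the function names and numeric values are hypothetical.

```python
def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def back_project(u, v, depth, fx, fy, cx, cy):
    """Recover a 3D camera-frame point from a pixel (u, v) and a known depth."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# Hypothetical calibration: 500-pixel focal lengths, optical axis at (320, 240).
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
u, v = project((0.1, -0.2, 2.0), fx, fy, cx, cy)
point = back_project(u, v, 2.0, fx, fy, cx, cy)
```

Back-projection needs a depth value in addition to the two-dimensional coordinates, which is why a two-dimensional skeleton point alone does not determine its three-dimensional position.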

Regarding claim 10, Wu teaches:
An electronic device, the electronic device comprising: a memory, a processor and an image pick-up device, the memory comprising a hand tracking program implementing the steps of the hand tracking method according to claim 1 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Wu teaches claim 11 for the reasons given in the treatment of claim 1. Wu further teaches:
A hand tracking system, comprising a memory storing instructions and a processor in communication with the memory, wherein the processor is configured to execute the instructions to (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.): 

Regarding claim 12, Wu teaches:
A computer-readable storage medium stored with a computer program thereon, the computer program implementing the method of claim 1 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.). 

Regarding claim 13, Wu teaches:
An electronic device, the electronic device comprising: a memory, a processor and an image pick-up device, the memory comprising a hand tracking program implementing the steps of the hand tracking method according to claim 2 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 14, Wu teaches:
An electronic device, the electronic device comprising: a memory, a processor and an image pick-up device, the memory comprising a hand tracking program implementing the steps of the hand tracking method according to claim 3 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 15, Wu teaches:
An electronic device, the electronic device comprising: a memory, a processor and an image pick-up device, the memory comprising a hand tracking program implementing the steps of the hand tracking method according to claim 4 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 16, Wu teaches:
An electronic device, the electronic device comprising: a memory, a processor and an image pick-up device, the memory comprising a hand tracking program implementing the steps of the hand tracking method according to claim 5 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 17, Wu teaches:
A computer-readable storage medium stored with a computer program thereon, the computer program implementing the method of claim 2 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 18, Wu teaches:
A computer-readable storage medium stored with a computer program thereon, the computer program implementing the method of claim 3 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 19, Wu teaches:
A computer-readable storage medium stored with a computer program thereon, the computer program implementing the method of claim 4 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Regarding claim 20, Wu teaches:
A computer-readable storage medium stored with a computer program thereon, the computer program implementing the method of claim 5 when executed by the processor (See page 3: “acquiring hand image data of at least 100 users as action behavior cases by adopting a head tracking camera;”. The presence of a head tracking camera and the algorithm described in the publication implies the presence of an electronic device with a memory and processor.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (CN111696140A) in view of Raza et al. (An Integrative Approach to Robust Hand Detection Using CPM-YOLOv3 and RGBD Camera in Real Time, 2019, IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking, Pages 1131-1138), hereinafter “Raza”.
Claim 7 is met by the combination of Wu and Raza, wherein
Wu teaches:
The hand tracking method according to claim 3, wherein a pre-training process of the hand detection model comprises: 
Wu does not appear to disclose the following; however, Raza teaches:
annotating a target region in acquired training image data, and acquiring annotated location information; wherein the target region is a hand region; carrying out parameter training on the annotated location information by using a yolo model until the yolo model converges within a corresponding preset range to complete training of the hand detection model (See page 1136: “The HHDNet model is trained using the Oxford hand dataset [31], this data is very comprehensive, because it is collected from various public sources, and there is no restriction imposed on the position, orientation and visibility. Furthermore, it is saved under different environments and illuminations, which is crucial to train a robust classifier. People hands in each image are annotated properly according to the YOLO bounding box rectangle scale, which is normalized from 0 to 1.”).
Motivation to combine:
Wu and Raza together teach the limitations of claim 7. Raza is directed to a similar field of art (hand tracking using neural networks). Therefore, Wu and Raza are combinable. Modifying the system and method of Wu by adding the capability of “annotating a target region in acquired training image data, and acquiring annotated location information; wherein the target region is a hand region; carrying out parameter training on the annotated location information by using a yolo model until the yolo model converges within a corresponding preset range to complete training of the hand detection model”, as taught by Raza, would yield the expected and predictable result of an improved training method. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Wu and Raza in this way.
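For illustration only, the annotation step discussed in the treatment of claim 7 can be sketched as a conversion from a pixel-space hand bounding box to coordinates normalized from 0 to 1, as the quoted passage of Raza describes. The center/width/height layout below is the commonly used YOLO label convention and is assumed here; Raza's exact label layout is not stated in the quoted passage, and the function name to_yolo_box is hypothetical.

```python
def to_yolo_box(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert a pixel bounding box to normalized (x_center, y_center, width, height)."""
    if not (0 <= x_min < x_max <= image_width and 0 <= y_min < y_max <= image_height):
        raise ValueError("box must lie inside the image and have positive size")
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return (x_center, y_center, width, height)


# Hypothetical hand region in a 640x480 training image.
box = to_yolo_box(100, 50, 300, 250, 640, 480)
```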

Claim 8 is met by the combination of Wu and Raza, wherein
The combination of Wu and Raza teaches:
The hand tracking method according to claim 7, wherein 
And Raza further teaches:
a pre-training process of the skeleton point identification model comprises: training a basic neural network model based on the training image data until the neural network model converges within a corresponding preset range to complete training of the skeleton point identification model; wherein the basic neural network model comprises: a yolo model, a CNN model, an SSD model or an FPN model (See section B. on page 1135.).
Motivation to combine:
See the motivation to combine in the treatment of claim 7.

Claim 9 is met by the combination of Wu and Raza, wherein
The combination of Wu and Raza teaches:
The hand tracking method according to claim 7, wherein 
And Wu further teaches:
the training image data are acquired by multiple tracking fisheye cameras on a head-mounted integrated device (See page 2: “the FOV required by the head-wearing is generally about 110 degrees”. Also see pages 4-5: “if a plurality of head tracking cameras are used for position tracking, each frame of the transmitted hand detection model and the skeleton point identification model is image data,”. Further see page 6: “even if two monocular cameras are installed”.).


Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN S LEE whose telephone number is (571)272-1981. The examiner can normally be reached 11 AM - 7 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jonathan S Lee/
Primary Examiner, Art Unit 2661