DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
Claims 1-19 are interpreted under 35 U.S.C. 112(f) because they recite a generic placeholder (“module”) coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier.

Claim 22 is interpreted under 35 U.S.C. 112(f) because it recites “means” coupled with functional language without reciting sufficient structure to perform the recited function, and the term “means” is not preceded by a structural modifier.
Support for the means and the modules is found in the specification: “[0044] FIG. 1 is a functional block diagram of an example implementation of a computing device 100.  The computing device 100 may be, for example, a smartphone, a tablet device, a laptop computer, a desktop computer, or another suitable type of computing device.”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 11, 14, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Xiang et al. ("Monocular total capture: Posing face, body, and hands in the wild." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.) in view of Dariush et al. (US patent Publication: 2009/0175540, “Dariush”).
Regarding claim 1, Xiang teaches, a method for generating whole body poses, comprising:
a body regression module (Section 7.2.1 algorithm) configured to generate a first pose of a body of an animal in an input image by regressing from a stored body anchor pose; (Section 7.2.1 generates the body pose by regression using the OpenPose network, which serves as the stored body anchor pose.)
generate a second pose of a face of the animal in the input image (Section 3 Method Overview:  Our method takes as input a sequence of images capturing the motion of a single person from a monocular RGB camera, and outputs the 3D total body motion (including the motion from body, face, hands, and feet) of the target person in the form of a deformable 3D human model [30, 26
an extremity regression module (Section 7.2.2 algorithm) configured to generate a third pose of an extremity of the animal in the input image by regressing from a stored extremity anchor pose; (Section 7.2.2: 3D hand pose estimation is performed by regression using the STB dataset, which serves as the stored extremity anchor pose.) and
a pose module (Section 5.2 algorithm) configured to generate a whole body pose of the animal in the input image based on the first pose, the second pose, and the third pose. (Section 5.2: “Once we fit the body and hand parts of the deformable model to the CNN outputs, the projection of the model on the image is already well aligned to the target person. Then we can reconstruct other body parts by simply adding more 2D joint constraints using additional 2D keypoint measurements. In particular, we include 2D face and foot keypoints from the OpenPose detector.”…..”)
While Xiang teaches generating a pose of a face and uses the face pose in generating the whole body pose as above, Xiang doesn’t expressly teach that the second pose (face pose) is generated by a face regression module by regressing from a stored face anchor pose.
However, as Xiang generates the body pose and the hand extremity pose by regressing from anchor poses, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiang to have included a face regression module configured to generate a second pose of a face of the animal in the input image by regressing from a stored face anchor pose, for the purpose of using standard technology to create the face pose, as the regression method is already used by Xiang for the body and the hand extremity.
Xiang teaches a method for whole body pose generation as above but doesn’t expressly teach a device for generating whole body pose. 
However Dariush teaches a device for generating whole body pose (“[0035] The pose estimation system 100, or any of its components described above, may be configured as software (e.g., modules that comprise instructions executable by a processor), hardware (e.g., an application specific integrated circuit), or a combination thereof.”)
Therefore it would have been obvious for an ordinary skilled person in the art before the effective filing date of the claimed invention to have modified Xiang to have 


Regarding claim 2, Xiang as modified by Dariush teaches, wherein the pose module is configured to generate the whole body pose by: 
connecting a first keypoint of the first pose of the body with a second keypoint of the second pose of the face; and connecting a third keypoint of the first pose of the body with a fourth keypoint of the third pose of the extremity. (Xiang, Figure 4: “Human model fitting on estimated POFs and joint confidence maps. We extract 2D joint locations from joint confidence maps (left) and then body part orientation from POFs (middle). Then we optimize a cost function (Eq. 3) that minimizes the distance between $\Pi(\tilde{J}^B_m)$ and $j^B_m$ and angle between $\tilde{P}^B_{(m,n)}$ and $\hat{P}^B_{(m,n)}$.” The parts are connected at joints, i.e., keypoints.)
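
For illustration only, the connecting of keypoints recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Xiang, Dariush, or the application: the function name connect_part_poses, the keypoint names ("neck", "chin", "wrist", "hand_root"), and the dictionary representation of a pose are all hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]        # one 2D keypoint (x, y)
Pose = Dict[str, Point]            # hypothetical pose format: name -> keypoint
Edge = Tuple[str, str]             # connection between two named keypoints


def connect_part_poses(
    body: Pose,
    face: Pose,
    hand: Pose,
    body_face_link: Tuple[str, str] = ("neck", "chin"),        # assumed names
    body_hand_link: Tuple[str, str] = ("wrist", "hand_root"),  # assumed names
) -> Tuple[Pose, List[Edge]]:
    """Merge three part poses and connect body keypoints to the face and hand."""
    keypoints: Pose = {}
    # Prefix each keypoint with its part so names cannot collide.
    for part_name, pose in (("body", body), ("face", face), ("hand", hand)):
        for name, point in pose.items():
            keypoints[f"{part_name}/{name}"] = point

    # First keypoint (body) to second keypoint (face);
    # third keypoint (body) to fourth keypoint (extremity).
    edges: List[Edge] = [
        (f"body/{body_face_link[0]}", f"face/{body_face_link[1]}"),
        (f"body/{body_hand_link[0]}", f"hand/{body_hand_link[1]}"),
    ]
    for start, end in edges:
        if start not in keypoints or end not in keypoints:
            raise KeyError(f"cannot connect {start} to {end}: keypoint missing")
    return keypoints, edges
```

The sketch only records which keypoints are joined; it does not perform the model fitting described in Xiang, Figure 4.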

Regarding claim 3, Xiang as modified by Dariush teaches, wherein the whole body pose is two dimensional.  ( Xiang, Section 2: “Single Image 2D Human Pose Estimation: Over the last few years, great progress has been made in detecting 2D human body keypoints from a single image [64, 63, 11, 68, 38, 15] by leveraging large-scale manually annotated datasets [28, 5] with deep Convolutional Neural Network (CNN) framework.”)

Regarding claim 4, Xiang as modified by Dariush teaches, wherein the whole body pose is three dimensional. (Xiang, Abstract: “We present the first method to capture the 3D total motion of a target person from a monocular view input.”)
Regarding claim 11, Xiang as modified by Dariush teaches, wherein the body pose includes a pose of a torso of a human, a leg of the human, and an arm of the human. (Xiang, 4. Predicting 3D Part Orientation Fields: “The 3D Part Orientation Field (POF) encodes the 3D orientation of a body part of an articulated structure (e.g., limbs, torso, and fingers) in 2D image space.”)

Regarding claim 14, Xiang as modified by Dariush teaches,  a camera configured to capture the input image. (Xiang,  Section 3 Method overview: “Our method takes as input a sequence of images capturing the motion of a single person from a monocular RGB camera”).

Claim 20 is directed to a method whose steps are similar in scope and function to the elements of device claim 1, and therefore claim 20 is rejected under the same rationale as set forth in the rejection of claim 1.

Regarding claim 21, Xiang as modified by Dariush teaches, wherein: 
the animal is a human; the first pose is a pose of a body of the human; the second pose is a pose of a face of the human; and  the third pose is a pose of a hand of the human. (Page 10966 right column second paragraph:  “We introduce an optimization framework to fit a deformable human model on 3D POFs and 2D keypoint 

Claim 22 is directed to a system whose elements are similar in scope and function to the elements of device claim 1 (a means for generating a first pose (Xiang, Section 7.2.1 algorithm), a means for generating a second pose of a face (obvious in view of Xiang’s teaching of generating the first pose and the third pose using regression), a means for generating a third pose (Xiang, Section 7.2.2 algorithm), and a means for generating a whole body pose (Xiang, Section 5.2)), and therefore claim 22 is rejected under the same rationale as set forth in the rejection of claim 1.


Claims 5, 8, 10, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Xiang as modified by Dariush and further in view of Rogez et al. ("Lcr-net: Localization-classification-regression for human pose." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, “Rogez”).

Regarding claim 5, Xiang as modified by Dariush doesn’t expressly teach, an image classification module configured to receive the input image and to generate classifications for boxes of pixels in the input image; and a regional proposal network (RPN) module configured to generate the boxes based on an input from the image classification module.
However, Rogez teaches, an image classification module configured to receive the input image and to generate classifications for boxes of pixels in the input image; (See Fig.2 and “3.2. Classification The classification component aims at predicting the closest anchor-pose, i.e., the correct label, for each bounding box B. In other words, each bounding box is assigned a probability for each anchor-pose (and the background class)”) 
 a regional proposal network (RPN) module configured to generate the boxes based on an input from the image classification module. (Section 3.1 and Fig.2 : “3.1. Localization: pose proposals network The Pose Proposal Network outputs a set of pose proposals, i.e., candidate localized poses. To this end, we hypothesize a set of anchor-poses into a set of bounding boxes, that will be scored and refined by the classification and regression branches respectively. The set of bounding boxes is obtained using a Region Proposal Network (RPN) [22], see Figure 2.”)
Rogez and Xiang as modified by Dariush are analogous as they are from the field of pose generation.
Therefore it would have been obvious for an ordinary skilled person in the art before the effective filing date of the claimed invention to have modified Xiang as modified by Dariush to have included an image classification module configured to receive the input image and to generate classifications for boxes of pixels in the input image; and a regional proposal network (RPN) module configured to generate the boxes based on an input from the image classification module as taught by Rogez for 


Regarding claim 8, Xiang as modified by Dariush and Rogez teaches, wherein the classifications are selected from a group consisting of a body classification, a face classification, and a hand classification. (Rogez teaches group classification, and Xiang provides a set of stored images of the body, face, and hands.)


Regarding claim 10, Xiang as modified by Dariush and Rogez teaches, wherein the RPN module is configured to generate the boxes using a region of interest (ROI) alignment algorithm. (Rogez, Section 3: LCR-net: “…..Figure 2 shows an overview of our LCR-Net architecture. Given an image, we first compute convolutional features. The Localization component, also called Pose Proposals Network in the context of pose detection, outputs a list of pose proposals. Pose proposals consist of a set of candidate locations where the anchor-poses are hypothesized. Next, a Region-of-Interest (RoI) pooling layer aggregates the features inside each candidate region.”)



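Regarding claim 15, Xiang as modified by Dariush doesn’t expressly teach,
a body classification module configured to determine body scores based on comparisons of the body of a human in the input image with a plurality of stored body anchor poses,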
wherein the body regression module is configured to select the stored body anchor pose from the plurality of stored body anchor poses based on the body scores;
a face classification module configured to determine face scores based on comparisons of the face of the human in the input image with a plurality of stored face anchor poses,
wherein the face regression module is configured to select the stored face anchor pose from the plurality of stored face anchor poses based on the face scores; and
a hand extremity classification module configured to determine hand extremity scores based on comparisons of a hand of the human in the input image with a plurality of stored hand anchor poses, 
wherein the extremity regression module is a hand extremity regression module configured to, based on the hand scores, select the stored hand anchor pose from the plurality of stored extremity anchor poses that are hand anchor poses.
However,  Rogez teaches, a body classification module configured to determine body scores based on comparisons of the body of a human in the input image with a plurality of stored body anchor poses, (Figure 2. Overview of our LCR-Net architecture (poses only shown in 2D for better readability). We first extract candidate regions using a RPN network and obtain pose proposals by placing a fixed set of anchor-poses into 
wherein the body regression module is configured to select the stored body anchor pose from the plurality of stored body anchor poses based on the body scores; (Figure 2. Overview of our LCR-Net architecture (poses only shown in 2D for better readability). We first extract candidate regions using a RPN network and obtain pose proposals by placing a fixed set of anchor-poses into these boxes (top). These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.”)
a face classification module configured to determine face scores based on comparisons of the face of the human in the input image with a plurality of stored face anchor poses, (Figure 2. Overview of our LCR-Net architecture (poses only shown in 2D for better readability). We first extract candidate regions using a RPN network and obtain pose proposals by placing a fixed set of anchor-poses into these boxes (top). These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.”)
wherein the face regression module is configured to select the stored face anchor pose from the plurality of stored face anchor poses based on the face scores; (Figure 2. Overview of our LCR-Net architecture (poses only shown in 2D for better readability). We first extract candidate regions using a RPN network and obtain pose proposals by placing a fixed set of anchor-poses into these boxes (top). These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.”) and

wherein the extremity regression module is a hand extremity regression module configured to, based on the hand scores, select the stored hand anchor pose from the plurality of stored extremity anchor poses that are hand anchor poses. (Figure 2. Overview of our LCR-Net architecture (poses only shown in 2D for better readability). We first extract candidate regions using a RPN network and obtain pose proposals by placing a fixed set of anchor-poses into these boxes (top). These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.”)
Xiang as modified by Dariush and Rogez are analogous as they are from the field of pose generation.
Therefore it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiang as modified by Dariush to have included a body classification module configured to determine body scores based on comparisons of the body of a human in the input image with a plurality of stored body anchor poses, wherein the body regression module is configured to select the stored body anchor pose from the plurality of stored body anchor poses based on the body scores, together with the corresponding face and hand extremity classification modules and anchor pose selection recited above, as taught by Rogez, for the purpose of obtaining a better anchor for each type of body part, providing better regression, and thereby obtaining a better pose of each part.

Regarding claim 16, Xiang as modified by Dariush and Rogez teaches, wherein: the body regression module is configured to select the stored body anchor pose from the plurality of stored body anchor poses based on the body score of the stored body anchor pose being higher than the body scores of all of the other ones of the stored body anchor poses; (Rogez, Fig. 2: “These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.” Section 3.5, right column: “Let $\mathcal{P} = \{(p, P)\}$ be the set of pose proposals in a group, each one with a classification score $s(p, P)$. We first pick the proposal with the highest score, i.e., $(p^*, P^*) = \operatorname{argmax}_{(p,P) \in \mathcal{P}} s(p, P)$.”)
the face regression module is configured to select the stored face anchor pose from the plurality of stored face anchor poses based on the face score of the stored face anchor pose being higher than the face scores of all of the other ones of the stored face anchor poses; (Rogez, Fig. 2: “These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.” Section 3.5, right column: “Let $\mathcal{P} = \{(p, P)\}$ be the set of pose proposals in a group, each one with a classification score $s(p, P)$. We first pick the proposal with the highest score, i.e., $(p^*, P^*) = \operatorname{argmax}_{(p,P) \in \mathcal{P}} s(p, P)$.”)
and
the hand extremity regression module is configured to select the stored hand anchor pose from the plurality of hand anchor poses based on the hand score of the stored hand anchor pose being higher than the hand scores of all of the other ones of the stored hand anchor poses. (Rogez, Fig. 2: “These pose proposals are then scored by a classification branch and regressed using a regressor, learned independently for each anchor-pose.” Section 3.5, right column: “Let $\mathcal{P} = \{(p, P)\}$ be the set of pose proposals in a group, each one with a classification score $s(p, P)$. We first pick the proposal with the highest score, i.e., $(p^*, P^*) = \operatorname{argmax}_{(p,P) \in \mathcal{P}} s(p, P)$.”)
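
For illustration only, the highest-score selection quoted from Rogez, Section 3.5, can be sketched as follows. This is a minimal sketch and not Rogez’s implementation: the names PoseProposal and select_best_proposal and the fields shown are hypothetical, and each proposal is assumed to already carry its classification score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PoseProposal:
    """Hypothetical container for one scored pose proposal."""
    anchor_id: int   # index of the stored anchor pose the proposal came from
    score: float     # classification score, s(p, P) in the quoted passage


def select_best_proposal(proposals: Sequence[PoseProposal]) -> PoseProposal:
    """Return the proposal with the highest classification score (argmax)."""
    if not proposals:
        raise ValueError("at least one pose proposal is required")
    # max() with a key function is an argmax over the proposals;
    # on an exact tie it returns the first maximal element.
    return max(proposals, key=lambda proposal: proposal.score)
```

The same function could be applied separately to body, face, and hand proposals; whether the cited references do so is not established by this sketch.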


Claim 19 is rejected under 35 USC 103  as being unpatentable over Xiang in view of Dariush and Rogez.



Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Xiang as modified by Dariush and further in view of Tarlton et al. ( US patent Publication:  20080104512 “Tarlton”).

Regarding claim 12, Xiang as modified by Dariush doesn’t expressly teach,  an animation module configured to generate an image including an animated avatar based on the whole body pose of a human in the input image.
However, Tarlton teaches, an animation module configured to generate an image including an animated avatar based on the whole body pose of a human in the input image. (“[0013]………….This interpretation 110 is used to drive an avatar display device 112 to display an avatar 114 that exhibits an expression corresponding to the interpretation of the status.  The avatar may be an animated graphical representation of a face that is capable of expressing a range of facial expressions or a complete or partial body of a person that is capable of adopting various body poses to convey expressions using body language.”)
Xiang as modified by Dariush and Tarlton are analogous as they are from the field of human pose.
Therefore it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xiang as modified by Dariush to have included an animation module configured to generate an image including an animated avatar based on the whole body pose of a human in the input image, as taught by Tarlton, for the purpose of expressing body language in a display.

Regarding claim 13, Xiang as modified by Dariush and Tarlton teaches, comprising a display control module configured to display the image including the animated avatar on a display. (Tarlton [0013]……“This interpretation 110 is used to drive an avatar display device 112 to display an avatar 114 that exhibits an expression corresponding to the interpretation of the status.  The avatar may be an animated graphical representation of a face that is capable of expressing a range of facial expressions or a complete or partial body of a person that is capable of adopting various body poses to convey expressions using body language.”)



Allowable Subject Matter
Claims 17-18 are allowed.
The following is an examiner’s statement of reasons for allowance: 

Claim 17 is allowed because the combination of the best available prior art teaches the limitations of claim 15 but fails to teach the additional elements of claim 17:
“a training module configured to:

train a face expert including the face classification and regression modules based on a second dataset including images including faces of humans; and 
train a hand extremity expert including the hand extremity classification and regression modules based on a third dataset including images including hands of humans.”

Claim 18 is allowable by virtue of dependency.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Claims 6-7 and 9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 6 is objected to but would be allowable because the combination of the best available prior art fails to expressly teach the limitation as a whole, “wherein the image classification module includes a ResNet-50 model.”



Claim 9 is objected to but would be allowable because the combination of the best available prior art fails to expressly teach the limitation as a whole, “the body regression module is configured to generate the first pose of the body of a human based on first ones of the boxes having the body classification; the face regression module is configured to generate the second pose of the face of the human based on second ones of the boxes having the body classification; and the extremity regression module is configured to generate the third pose of a hand of the human based on third ones of the boxes having the hand classification.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tapas Mazumder whose telephone number is (571)270-7466. The examiner can normally be reached M-F 8:00 AM-5:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TAPAS MAZUMDER/           Primary Examiner, Art Unit 2616