Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-4, 6-10, 12, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Pumarola et al. ("Unsupervised person image synthesis in arbitrary poses", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018) in view of Xu et al. (CN109670444), and further in view of Mukherjee (US20210232858).

Regarding Claim 1. Pumarola teaches A method for creating a virtual image based on deep learning by an image application executed by a processor of a computing device (Pumarola, abstract, the paper describes a method for synthesizing photorealistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen.), the method comprising:

Pumarola fails to explicitly teach, however, Xu teaches obtaining a plurality of product images including one product (Xu, abstract, the invention describes an attitude detection model generation method and device, an attitude detection method and device and equipment and a medium. The method comprises the steps of acquiring at least one posture consistency sample set, wherein the posture consistency sample set comprises an original face image and at least one posture transformation face image meeting posture difference conditions with the original face image, and the posture transformation face image is generated through transformation of the original face image; setting attitude parameters of each face image included in each group of attitude consistency sample set as the same standard attitude parameter value; and training a standard detection model by adopting the at least one group of attitude consistency sample set to form an attitude detection model. According to the embodiment of the invention, the situation that the user posture is misjudged by using the photo can be avoided, and the user posture recognition accuracy is improved.);
Pumarola and Xu are analogous art because both teach methods of detecting the posture of an object or human in an obtained image. Pumarola further teaches creating a virtual image based on an obtained image and an input posture. Xu further teaches obtaining a plurality of images comprising an object or human in different postures and extracting attitude parameter values for each posture. Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of creating a virtual image with a targeted posture (taught by Pumarola) to further use the method of obtaining posture parameter values (taught by Xu), so as to improve posture detection accuracy (Xu, abstract).

The combination of Pumarola and Xu fails to explicitly teach, however, Mukherjee teaches classifying the obtained product images into a plurality of pose type categories according to a pose included in each of the obtained product images (Mukherjee, abstract, the invention describes a method for training an object detection algorithm. The method includes: (a) selecting a 3D model corresponding to an object; (b) acquiring images of the 3D model, the images being obtained by rendering the 3D model at respective poses; (c) acquiring 2D projections of 3D points on the 3D model at the respective poses; and (d) storing, in a memory, an association between the acquired 2D projections and the respective poses.
[0114] In FIG. 5D, the algorithm can also be trained for view classification. In FIG. 5D, step S505 replaces step S502 and S504 in FIG. 5A. This means that for each domain adapted image 606, a rough classification (e.g. front, back, side) of the pose is assigned. Specifically, the algorithm model is trained with the synthetic training data as input to map (i) a domain-adapted image 606 containing the 3D model rendered from a viewpoint in the synthetic training data and (ii) its corresponding 2D bounding box and view classification. This view classification is useful in the detection phase because it narrows the number of pose candidates to be analyzed, resulting in greater accuracy and reduced computational load.);
Pumarola, Xu and Mukherjee are analogous art because they all teach methods of detecting the posture of an object or human in an obtained image. Mukherjee further teaches classifying the detected pose into different categories, such as front, back and side. Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of creating a virtual image with a targeted posture (taught by Pumarola and Xu) to further use the posture grouping method (taught by Mukherjee), so as to provide an efficient and accurate process for the pose detection phase (Mukherjee, [0114]).

The combination of Pumarola, Xu and Mukherjee further teaches selecting at least one target pose type category from the plurality of pose type categories to create the virtual image (Pumarola, page 3, col 1, par 1-2, Given a single-view image of a person, our goal is to train a GAN model in an unsupervised manner, allowing to generate photo-realistic pose transformations of the input image while retaining the person identity and clothes appearance. Formally, we seek to learn the mapping $(I_{p_o}; p_f) \rightarrow I_{p_f}$ between an image $I_{p_o} \in \mathbb{R}^{3 \times H \times W}$ of a person with pose $p_o$ and the image $I_{p_f} \in \mathbb{R}^{3 \times H \times W}$ of the same person with the desired position $p_f$. Poses are represented by 2D skeletons with $N = 18$ joints $p = (\mathbf{u}_1, \ldots, \mathbf{u}_N)$, where $\mathbf{u}_i = (u_i, v_i)$ is the i-th joint pixel location in the image. Further see Figure 2.
Therefore, posture $p_f$ is the selected target pose type.);
creating a virtual image of a pose type corresponding to the selected target pose type category using at least one product image among the plurality of product images by the deep learning; and
outputting the created virtual image (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.).
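
For illustration only, the sequence of steps recited in Claim 1 (classifying product images into pose type categories, selecting a target category, and creating a virtual image from a base image) can be sketched as below. This is a minimal sketch and is not drawn from Pumarola, Xu, or Mukherjee: the function names `classify_by_pose` and `create_virtual_image`, the file names, and the rule that reads the pose from the file name are all hypothetical, and the generator is a stand-in that returns a text label rather than an image. Only the coarse categories (front, back, side) follow the example in Mukherjee [0114].

```python
from typing import Callable, Dict, List

# Coarse pose type categories, following the "front, back, side" example
# in Mukherjee [0114]. The tuple name itself is hypothetical.
POSE_CATEGORIES = ("front", "back", "side")


def classify_by_pose(images: List[str],
                     pose_of: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group image identifiers by the pose category that pose_of assigns."""
    groups: Dict[str, List[str]] = {c: [] for c in POSE_CATEGORIES}
    for image in images:
        groups[pose_of(image)].append(image)
    return groups


def create_virtual_image(base_image: str, target_category: str) -> str:
    """Stand-in for a pre-trained generator; returns a label, not pixels."""
    return f"{base_image}->{target_category}"


# Assumed example data: the pose is encoded in the file name purely so the
# sketch has something to classify on.
images = ["shirt_front.png", "shirt_side.png"]
groups = classify_by_pose(
    images, lambda name: name.split("_")[1].split(".")[0])

target = "back"  # the selected target pose type category
virtual = create_virtual_image(groups["front"][0], target)
print(virtual)
```

In this sketch the target category is chosen by hand; how the target is selected is left open here.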

Regarding Claim 3. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 1, wherein the virtual image is created to have the pose type corresponding to the selected target pose type category by inputting the at least one product image and information related to the pose type corresponding to the selected target pose type category to a pre-trained deep learning neural network (Xu, [0011] The attitude parameters of each face image included in the attitude consistency sample set of each group are set to the same standard attitude parameter value;
Mukherjee, [0095] FIG. 5A is a flow diagram of an example method of performing step S416 of FIG. 4. According to this method, training data can be developed using the CAD model. An object detection algorithm model that is to be trained with the synthetic training data according to this example is a neural network model such as a deep learning neural network model and a CNN (convolutional neural network) model.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 4. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 1, wherein the creating of the virtual image of the pose type corresponding to the selected target pose type category includes determining at least one of the plurality of product images as a base image (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 6. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 4, wherein the creating of the virtual image of the pose type corresponding to the selected target pose type category further includes creating a pose semantic label map of the pose type corresponding to the selected target pose type category by inputting the base image and information related to the pose type corresponding to the selected target pose type category to a pre-trained deep learning neural network (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the created virtual image is connected to the desired body pose defined by a 2D skeleton. The correspondence is similar to a label map.
Page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 7. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 6, wherein the creating of the virtual image of the pose type corresponding to the selected target pose type category further includes creating a base semantic label map which is a semantic label map of the base image (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the created virtual image is connected to the desired body pose defined by a 2D skeleton. The correspondence is similar to a label map.
Page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.
Therefore, the base image (original input image) is mapped to its corresponding specific pose).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 8. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 6, wherein the creating of the virtual image of the pose type corresponding to the selected target pose type category further includes creating the virtual image by inputting the pose semantic label map and the base image to the pre-trained deep learning neural network (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the created virtual image is connected to the desired body pose defined by a 2D skeleton. The correspondence is similar to a label map.
Page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.
Therefore, the base image (original input image) is mapped to its corresponding specific pose. The base image and its paired pose are entered back into the deep learning network for further comparison with the created virtual image and its paired pose.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.


Regarding Claim 9. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 6, wherein the creating of the virtual image of the pose type corresponding to the selected target pose type category includes creating the virtual image having the pose type corresponding to the selected target pose type category by inputting the information related to the pose type corresponding to the selected target pose type category and the pose semantic label map to the pre-trained deep learning neural network to correspond to the base image and the base semantic label map (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the created virtual image is connected to the desired body pose defined by a 2D skeleton. The correspondence is similar to a label map.
Page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.
Therefore, the base image (original input image) is mapped to its corresponding specific pose. The base image and its paired pose are entered back into the deep learning network for further comparison with the created virtual image and its paired pose.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Claim 10 is similar in scope to Claim 1, and thus is rejected under the same rationale. Claim 10 further requires:
at least one processor; and a memory storing instructions for an image application executed by the at least one processor (Mukherjee, abstract, the invention describes a non-transitory computer readable medium that embodies instructions that cause one or more processors to perform a method for training an object detection algorithm. The method includes: (a) selecting a 3D model corresponding to an object; (b) acquiring images of the 3D model, the images being obtained by rendering the 3D model at respective poses; (c) acquiring 2D projections of 3D points on the 3D model at the respective poses; and (d) storing, in a memory, an association between the acquired 2D projections and the respective poses.).

Claim 12 is similar in scope to Claim 4, and thus is rejected under the same rationale.

Regarding Claim 18. The combination of Pumarola, Xu and Mukherjee further teaches A method for creating a virtual image based on deep learning by an image application executed by a processor of a computing device (Pumarola, abstract, the paper describes a method for synthesizing photorealistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen.), the method comprising:
obtaining a product image including a model wearing one product;
creating a virtual image of a first pose type different from a second pose type of the model included in the product image (Mukherjee, [0114] In FIG. 5D, the algorithm can also be trained for view classification. In FIG. 5D, step S505 replaces step S502 and S504 in FIG. 5A. This means that for each domain adapted image 606, a rough classification (e.g. front, back, side) of the pose is assigned. Specifically, the algorithm model is trained with the synthetic training data as input to map (i) a domain-adapted image 606 containing the 3D model rendered from a viewpoint in the synthetic training data and (ii) its corresponding 2D bounding box and view classification. This view classification is useful in the detection phase because it narrows the number of pose candidates to be analyzed, resulting in greater accuracy and reduced computational load.
Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the original image of a person is the product image. The various 2D skeletons represent different pose types.); and
outputting the created virtual image of the first pose type different from the second pose type of the model included in the product image, wherein the creating of the virtual image of the first pose type different from the second pose type of the model included in the product image includes creating the virtual image of the first pose type by modifying at least one body region of a model having the second pose type to conform to the first pose type (Pumarola, page 3, col 1, par 1-2, Given a single-view image of a person, our goal is to train a GAN model in an unsupervised manner, allowing to generate photo-realistic pose transformations of the input image while retaining the person identity and clothes appearance. Formally, we seek to learn the mapping $(I_{p_o}; p_f) \rightarrow I_{p_f}$ between an image $I_{p_o} \in \mathbb{R}^{3 \times H \times W}$ of a person with pose $p_o$ and the image $I_{p_f} \in \mathbb{R}^{3 \times H \times W}$ of the same person with the desired position $p_f$. Poses are represented by 2D skeletons with $N = 18$ joints $p = (\mathbf{u}_1, \ldots, \mathbf{u}_N)$, where $\mathbf{u}_i = (u_i, v_i)$ is the i-th joint pixel location in the image. Further see Figure 2.
It is clear from Figure 2 that part of the body region in $I_{p_o}$ is modified to conform to posture $p_f$, which is the selected target pose type.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 19. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 18, wherein the creating of the virtual image of the first pose type by modifying the at least one body region of the model having the second pose type to conform to the first pose type includes creating a pose semantic label map in which body regions of a model conform to the first pose type by inputting the product image to a pre-trained image deep learning neural network (Pumarola, See Figure 1, given an original image of a person (left) and a desired body pose defined by a 2D skeleton (bottom-row), our model generates new photo-realistic images of the person under that pose (top-row). The main contribution of our work is to train this generative model with unlabeled data.
Therefore, the created virtual image is connected to the desired body pose defined by a 2D skeleton. The correspondence is similar to a label map.
Page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Regarding Claim 20. The combination of Pumarola, Xu and Mukherjee further teaches The method of claim 19, wherein the creating of the virtual image of the first pose type by modifying the at least one body region of the model having the second pose type to conform to the first pose type includes creating the virtual image modified to conform to the first pose type by inputting the product image and the pose semantic label map to the pretrained image deep learning neural network (Pumarola, page 3, par 1-2, Figure 2 shows an overview of our model. It is composed of four main modules: (1) A generator $G(I \mid p)$ that acts as a differentiable render mapping one input image of a given person under a specific pose to an output image of the same person under a different pose. Note that $G$ is used twice in our network, first to map the input image $I_{p_o} \rightarrow I_{p_f}$ and then to render the latter back to the original pose $I_{p_f} \rightarrow I_{p_o}$; (2) A regressor $\Phi$ responsible of estimating the 2D joint locations of a given image; (3) A discriminator $D_I(I)$ that seeks to discriminate between generated and real samples; (4) A loss function, computed without ground truth, that aims to preserve the person identity. For this purpose, we devise a novel loss function that enforces semantic content similarity of $I_{p_o}$ and $\hat{I}_{p_o}$, and style similarity between $I_{p_o}$ and $I_{p_f}$.
Therefore, the created virtual image and its paired pose are entered back into the deep learning network for further regression.).
The reasoning for combination of Pumarola, Xu and Mukherjee is the same as described in Claim 1.

Allowable Subject Matter
Claims 2, 5, 11, and 13-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding Claim 2, it recites “…wherein the selecting of the at least one target pose type category includes detecting an empty category to which no product image is classified among the plurality of pose type categories, and determining the detected empty category as the target pose type category” in the context of Claim 2.
The prior art of record, either alone or in combination, fails to teach or suggest the above-quoted limitation of Claim 2. Therefore, Claim 2 is allowable over the prior art.
Regarding Claim 5, it recites “…wherein the determining of the at least one of the plurality of product images as the base image includes determining, as the base image, a product image classified to a pose type category having a highest priority among the plurality of pose type categories, wherein each of the plurality of pose type categories has a respective preconfigured priority” in the context of Claim 5.
The prior art of record, either alone or in combination, fails to teach or suggest the above-quoted limitation of Claim 5. Therefore, Claim 5 is allowable over the prior art.
Claim 11 recites limitations similar to those discussed above with regard to Claim 2. Therefore, Claim 11 is allowable over the prior art.
Claim 13 recites limitations similar to those discussed above with regard to Claim 5. Therefore, Claim 13 is allowable over the prior art.
Claims 14-17 depend from Claim 13 with respective additional limitations. Therefore, Claims 14-17 are allowable over the prior art.
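
For illustration only, the two quoted limitations can be read as simple selection rules over the pose type categories. The following is a minimal sketch under stated assumptions and does not reproduce the applicant's implementation: the function names, the example file names, and the priority table are hypothetical; a lower number is assumed to mean a higher priority; and the base image is assumed to be taken from the highest-priority category that actually contains an image.

```python
from typing import Dict, List, Optional


def empty_categories(groups: Dict[str, List[str]]) -> List[str]:
    """Return the categories to which no product image is classified."""
    return [category for category, imgs in groups.items() if not imgs]


def base_image_by_priority(groups: Dict[str, List[str]],
                           priority: Dict[str, int]) -> Optional[str]:
    """Pick an image from the non-empty category with the highest priority.

    Assumption: a lower number in `priority` means a higher priority.
    """
    candidates = [category for category, imgs in groups.items() if imgs]
    if not candidates:
        return None
    best = min(candidates, key=lambda category: priority[category])
    return groups[best][0]


# Hypothetical classification result and preconfigured priorities.
groups = {"front": [], "back": ["b.png"], "side": ["s.png"]}
priority = {"front": 0, "side": 1, "back": 2}

targets = empty_categories(groups)            # candidate target categories
base = base_image_by_priority(groups, priority)
print(targets, base)
```

With the example data, the "front" category is empty and becomes the target, while the base image comes from "side", the highest-priority category that holds an image.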

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN SHENG whose telephone number is (571)272-5734. The examiner can normally be reached M-F 9:30AM-3:30PM and 6:00PM-8:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Xin Sheng/
Primary Examiner, Art Unit 2611