DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment

This Office Action is responsive to Applicant’s remarks received on January 28, 2021.  Claims 1-20 are pending.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-8, 10-15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Ge et al. (US 2020/0184721) and Sunkavalli et al. (US 2019/0340810).
Regarding claim 1, Ge et al. discloses a method for generating augmented training data for hand pose estimation, comprising: 
receiving, by a device, source data that is associated with a first lighting condition (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5); 
receiving, by the device, target data that is associated with a second lighting condition (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22); 
generating, by the device, the augmented training data for hand pose estimation based on the target data (“To do this, the system obtains an image that contains a rendered 3D hand mesh, the 3D hand mesh is cropped and extracted from the image, a background image is randomly selected, and the cropped 3D hand mesh is combined with the selected background image and stored as a new image to be used in the first training phase” at paragraph 0040, second to last sentence).
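Ge et al. does not explicitly disclose determining, by the device and by inputting the source data and the target data into a model, a lighting condition translation between the first lighting condition and the second lighting condition; generating, by the device, lighting translated data using the source data based on the lighting condition translation between the first lighting condition and the second lighting condition; and generating, by the device, the augmented training data for hand pose estimation based on the target data and the lighting translated data.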
Sunkavalli et al. teaches a method in the same field of endeavor of object recognition, comprising:
receiving, by a device, source data that is associated with a first lighting condition (“As shown in FIG. 3, the image relighting system trains an object relighting neural network using training digital images 302. In particular, the training digital images 302 portray a training object illuminated from various lighting directions” at paragraph 0056, line 1);
receiving, by the device, target data that is associated with a second lighting condition (“new lighting direction 304 indicates the lighting direction from which the training object should be illuminated in the resulting digital images” at paragraph 0057, line 3);
determining, by the device and by inputting the source data and the target data into a model, a lighting condition translation that maps the first lighting condition and the second lighting condition (“Specifically, the image relighting system can analyze the training digital images 302 and the new lighting direction 304 at a variety of different levels of abstraction utilizing a plurality of layers of the object relighting neural network 306 to predict the new digital image 308” at paragraph 0058, line 4; as seen in figure 3, the input images and the target direction are input into the relighting system); 
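generating, by the device, lighting translated data using the source data based on the lighting condition translation between the first lighting condition and the second lighting condition (“Accordingly, the new digital image 308 portrays a prediction by the object relighting neural network 306 of the training object illuminated from the new lighting direction 304” at paragraph 0058, line 9); and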
generating, by the device, the augmented training data for object estimation based on the target data and the lighting translated data (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at paragraph 0059, line 9; “The image relighting system can train the object relighting neural network 306 based on the determined loss. For example, in one or more embodiments, the image relighting system back propagates the determined loss to the object relighting neural network 306 to modify its parameters” at paragraph 0061, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the lighting transfer as taught by Sunkavalli et al. for the background and hand images of Ge et al. to ensure that the blended image contains consistent lighting.
Regarding claim 3, Ge et al. discloses a method wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
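Regarding claim 4, the Ge et al. and Sunkavalli et al. combination discloses a method wherein the target data is a background image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” Ge et al. at paragraph 0040, line 22), and wherein the augmented training data includes the lighting translated data that is superimposed on the background image (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at Sunkavalli et al. paragraph 0059, line 9).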
Regarding claim 5, Ge et al. discloses a method further comprising: 
training a hand pose estimation model using the augmented training data (“To do this, the system obtains an image that contains a rendered 3D hand mesh, the 3D hand mesh is cropped and extracted from the image, a background image is randomly selected, and the cropped 3D hand mesh is combined with the selected background image and stored as a new image to be used in the first training phase” at paragraph 0040, second to last sentence).
Regarding claim 6, Ge et al. discloses a method further comprising: 
generating, using a three dimensional model simulator, the source data that represents a hand pose (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
Regarding claim 7, Ge et al. discloses a method wherein the target data is a real-world image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22), and wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
Regarding claim 8, Ge et al. discloses a device, comprising: 
at least one memory configured to store program code (“The storage unit 1016 and memory 1014 store the instructions 1010 embodying any one or more of the methodologies or functions described herein” at paragraph 0100, line 4); 
at least one processor configured to read the program code and operate as instructed by the program code (“the processors 1004 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1008 and a processor 1012 that may execute the instructions 1010” at paragraph 0099, line 4), the program code including: 
receiving code that is configured to cause the at least one processor to: 
receive source data that is associated with a first lighting condition (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5); 
receive target data that is associated with a second lighting condition (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22); 

Ge et al. does not explicitly disclose determining code that is configured to cause the at least one processor to determine, by inputting the source data and the target data into a model, a lighting condition translation that maps the first lighting condition and the second lighting condition, and generating code that is configured to cause the at least one processor to: generate lighting translated data using the source data based on the lighting condition translation between the first lighting condition and the second lighting condition, and generate augmented training data for hand pose estimation based on the target data and the lighting translated data.
Sunkavalli et al. teaches a device in the same field of endeavor of object recognition, comprising:
at least one memory configured to store program code (“Memory 1304 may be used for storing data, metadata, and programs for execution by the processor(s)” at paragraph 0141, line 1); 
at least one processor configured to read the program code and operate as instructed by the program code (the processor executes the program as described above), the program code including: 
receiving code that is configured to cause the at least one processor to: 

receive source data that is associated with a first lighting condition (“As shown in FIG. 3, the image relighting system trains an object relighting neural network using training digital images 302. In particular, the training digital images 302 portray a training object illuminated from various lighting directions” at paragraph 0056, line 1);
receive target data that is associated with a second lighting condition (“new lighting direction 304 indicates the lighting direction from which the training object should be illuminated in the resulting digital images” at paragraph 0057, line 3);
determining code that is configured to cause the at least one processor to determine, by inputting the source data and the target data into a model, a lighting condition translation that maps the first lighting condition and the second lighting condition (“Specifically, the image relighting system can analyze the training digital images 302 and the new lighting direction 304 at a variety of different levels of abstraction utilizing a plurality of layers of the object relighting neural network 306 to predict the new digital image 308” at paragraph 0058, line 4; as seen in figure 3, the input images and the target direction are input into the relighting system); and 
generating code that is configured to cause the at least one processor to: generate lighting translated data using the source data based on the lighting condition translation that maps the first lighting condition and the second lighting condition (“Accordingly, the new digital image 308 portrays a prediction by the object relighting neural network 306 of the training object illuminated from the new lighting direction 304” at paragraph 0058, line 9); and 
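generate augmented training data for object estimation based on the target data and the lighting translated data (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at paragraph 0059, line 9; “The image relighting system can train the object relighting neural network 306 based on the determined loss. For example, in one or more embodiments, the image relighting system back propagates the determined loss to the object relighting neural network 306 to modify its parameters” at paragraph 0061, line 1).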
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the lighting transfer as taught by Sunkavalli et al. for the background and hand images of Ge et al. to ensure that the blended image contains consistent lighting.
Regarding claim 10, Ge et al. discloses a device wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
Regarding claim 11, the Ge et al. and Sunkavalli et al. combination discloses a device wherein the target data is a background image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” Ge et al. at paragraph 0040, line 22), and wherein the augmented training data includes the lighting translated data that is superimposed on the background image (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at Sunkavalli et al. paragraph 0059, line 9).
Regarding claim 12, Ge et al. discloses a device further comprising: 
training code that is configured to cause the at least one processor to train a hand pose estimation model using the augmented training data (“To do this, the system obtains an image that contains a rendered 3D hand mesh, the 3D hand mesh is cropped and extracted from the image, a background image is randomly selected, and the cropped 3D hand mesh is combined with the selected background image and stored as a new image to be used in the first training phase” at paragraph 0040, second to last sentence).
Regarding claim 13, Ge et al. discloses a device wherein the generating code is further configured to cause the at least one processor to generate, using a three dimensional model simulator, the source data that represents a hand pose (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
Regarding claim 14, Ge et al. discloses a device wherein the target data is a real-world image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22), and wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5). 
Regarding claim 15, Ge et al. discloses a non-transitory computer-readable medium storing instructions (“The storage unit 1016 and memory 1014 store the instructions 1010 embodying any one or more of the methodologies or functions described herein” at paragraph 0100, line 4), the instructions comprising: one or more instructions that, when executed by one or more processors of a device (“the processors 1004 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1008 and a processor 1012 that may execute the instructions 1010” at paragraph 0099, line 4), cause the one or more processors to:
receive source data that is associated with a first lighting condition (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5); 
receive target data that is associated with a second lighting condition (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22); 
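generate augmented training data for hand pose estimation based on the target data (“To do this, the system obtains an image that contains a rendered 3D hand mesh, the 3D hand mesh is cropped and extracted from the image, a background image is randomly selected, and the cropped 3D hand mesh is combined with the selected background image and stored as a new image to be used in the first training phase” at paragraph 0040, second to last sentence).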
Ge et al. does not explicitly disclose determining, by inputting the source data and the target data into a model, a lighting condition translation that maps the first lighting condition and the second lighting condition, generating lighting translated data using the source data based on the lighting condition translation between the first lighting condition and the second lighting condition and generating the augmented training data for hand pose estimation based on the target data and the lighting translated data.
Sunkavalli et al. teaches a non-transitory computer-readable medium storing instructions in the same field of endeavor of object recognition (“Memory 1304 may be used for storing data, metadata, and programs for execution by the processor(s)” at paragraph 0141, line 1), the instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the one or more processors to:
receive source data that is associated with a first lighting condition (“As shown in FIG. 3, the image relighting system trains an object relighting neural network using training digital images 302. In particular, the training digital images 302 portray a training object illuminated from various lighting directions” at paragraph 0056, line 1);
receive target data that is associated with a second lighting condition (“new lighting direction 304 indicates the lighting direction from which the training object should be illuminated in the resulting digital images” at paragraph 0057, line 3);
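determine, by inputting the source data and the target data into a model, a lighting condition translation that maps the first lighting condition and the second lighting condition (“Specifically, the image relighting system can analyze the training digital images 302 and the new lighting direction 304 at a variety of different levels of abstraction utilizing a plurality of layers of the object relighting neural network 306 to predict the new digital image 308” at paragraph 0058, line 4; as seen in figure 3, the input images and the target direction are input into the relighting system);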
generate lighting translated data using the source data based on the lighting condition translation between the first lighting condition and the second lighting condition (“Accordingly, the new digital image 308 portrays a prediction by the object relighting neural network 306 of the training object illuminated from the new lighting direction 304” at paragraph 0058, line 9); and 
generate augmented training data for object estimation based on the target data and the lighting translated data (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at paragraph 0059, line 9; “The image relighting system can train the object relighting neural network 306 based on the determined loss. For example, in one or more embodiments, the image relighting system back propagates the determined loss to the object relighting neural network 306 to modify its parameters” at paragraph 0061, line 1).
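It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the lighting transfer as taught by Sunkavalli et al. for the background and hand images of Ge et al. to ensure that the blended image contains consistent lighting.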
Regarding claim 17, Ge et al. discloses a computer-readable medium wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).
Regarding claim 18, the Ge et al. and Sunkavalli et al. combination discloses a computer-readable medium wherein the target data is a background image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” Ge et al. at paragraph 0040, line 22), and wherein the augmented training data includes the lighting translated data that is superimposed on the background image (“For synthetic training objects, the image relighting system can render the training object as illuminated by the new lighting direction 304 and generate an image of the illuminated training object” at Sunkavalli et al. paragraph 0059, line 9).
Regarding claim 19, Ge et al. discloses a computer-readable medium wherein the one or more instructions further cause the one or more processors to:
train a hand pose estimation model using the augmented training data (“To do this, the system obtains an image that contains a rendered 3D hand mesh, the 3D hand mesh is cropped and extracted from the image, a background image is randomly selected, and the cropped 3D hand mesh is combined with the selected background image and stored as a new image to be used in the first training phase” at paragraph 0040, second to last sentence).
Regarding claim 20, Ge et al. discloses a computer-readable medium wherein the target data is a real-world image (“a randomly selected background image (e.g., a city image, a living room image, or any other suitable image obtained randomly or pseudo-randomly from a background image server(s))” at paragraph 0040, line 22), and wherein the source data is a synthetic hand pose image (“A 3D hand model is generated, rigged with joints, and then photorealistic textures are applied on the 3D hand model as well as natural lighting using high-dynamic range (HDR) images” at paragraph 0040, line 5).

Claims 2, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Ge et al. and Sunkavalli et al. as applied to claims 1, 8 and 15 above, and further in view of Sohn et al. (US 2019/0066493).
Regarding claim 2, the Ge et al. and Sunkavalli et al. combination discloses the elements of claim 1 as described above.
The Ge et al. and Sunkavalli et al. combination does not explicitly disclose that the model is a cycle-consistent adversarial network.
Sohn et al. teaches a method in the same field of endeavor of object recognition, wherein the model is a cycle-consistent adversarial network (“The attribute specific generator 210 can be trained in a generative adversarial network (GAN), such as, e.g., CycleGAN or other GANs. Thus, the attribute specific lighting generator 210 generates one or more adjustments to lighting for each of the input images, thus outputting predicted style and view augmented source images 13” at paragraph 0053, line 1).
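It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the CycleGAN as taught by Sohn et al. for the model of the Ge et al. and Sunkavalli et al. combination to thus be able to modify the lighting conditions for training accordingly.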
Regarding claim 9, the Ge et al. and Sunkavalli et al. combination discloses the elements of claim 8 as described above.
The Ge et al. and Sunkavalli et al. combination does not explicitly disclose that the model is a cycle-consistent adversarial network.
Sohn et al. teaches a device in the same field of endeavor of object recognition, wherein the model is a cycle-consistent adversarial network (“The attribute specific generator 210 can be trained in a generative adversarial network (GAN), such as, e.g., CycleGAN or other GANs. Thus, the attribute specific lighting generator 210 generates one or more adjustments to lighting for each of the input images, thus outputting predicted style and view augmented source images 13” at paragraph 0053, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the CycleGAN as taught by Sohn et al. for the model of the Ge et al. and Sunkavalli et al. combination to thus be able to modify the lighting conditions for training accordingly.
Regarding claim 16, the Ge et al. and Sunkavalli et al. combination discloses the elements of claim 15 as described above.
The Ge et al. and Sunkavalli et al. combination does not explicitly disclose that the model is a cycle-consistent adversarial network.
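Sohn et al. teaches, in the same field of endeavor of object recognition, that the model is a cycle-consistent adversarial network (“The attribute specific generator 210 can be trained in a generative adversarial network (GAN), such as, e.g., CycleGAN or other GANs. Thus, the attribute specific lighting generator 210 generates one or more adjustments to lighting for each of the input images, thus outputting predicted style and view augmented source images 13” at paragraph 0053, line 1).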
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the CycleGAN as taught by Sohn et al. for the model of the Ge et al. and Sunkavalli et al. combination to thus be able to modify the lighting conditions for training accordingly.


Response to Arguments

	Summary of Remarks (@ response page labeled 8): “Chang is silent regarding inputting source data having a first lighting condition and target data having a second lighting condition into a model, and determining a lighting condition translation that maps the first lighting condition and the second lighting condition.”

	Examiner’s Response: This argument is moot in view of the newly cited Sunkavalli et al. reference.


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA whose telephone number is (571)270-1574.  The examiner can normally be reached on Monday - Friday 9:30-5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/KATRINA R FUJITA/Primary Examiner, Art Unit 2662