DETAILED ACTION


Response to Amendment
Applicant’s amendments filed on June 16, 2021 have been entered. Claim 21 has been added. Claims 1-21 are still pending in this application, with claims 1, 11 and 16 being independent.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole 


3.	Claims 1, 9, 11, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Novotny et al. (US 20170330059 A1), referred to herein as Novotny, in view of Hamada et al. (US 20150116349 A1), referred to herein as Hamada.
	Regarding Claim 1, Novotny teaches a method, comprising (Novotny Abs: A method for generating object and part detectors includes accessing a collection of training images): 
applying a first generator model to a semantic representation of an image to generate an affine transformation, wherein the affine transformation represents a bounding box associated with at least one region within the image (Novotny FIG4.70: a transformation computation component; [0028] localization of objects and their parts in images and to name them according to semantic categories; [0030] While FIGS. 2 and 3 show the regions where the parts and objects are predicted to be located as rectangular bounding boxes; [0093] each region pair (R, Q) ∈ M is used to generate a transformation hypothesis T by fitting an affine transformation to map R into Q (i.e., Q ≈ TR), resulting in a candidate set of possible pairwise transformations T; FIG7.S208: Generate image transformation which maps each image in set of similar images to common geometric frame);
applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object (Novotny FIG4.72: a geometric representation generator; [0039] The appearance representation generator 62 generates an image level representation 80 (image descriptor); [0045] The geometric 
Novotny does not teach inserting the object into the image based on the bounding box and the shape.
However, Hamada discloses a generation unit that generates a sub image by performing correction for improving visibility on an image of the detected area, which is analogous to the present patent application. Hamada teaches inserting the object into the image based on the bounding box and the shape (Hamada FIG12.S504: display sub image on input image in overlapping manner at position designated by arrangement plan information with high transparency; [0116] in the proximity arrangement plan optimization model M1 illustrated in FIG. 9, the upper limit of the number of sub images to be arranged may be inserted).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have modified Novotny to incorporate the teachings of Hamada by applying the technology for performing a recognition process, such as a character recognition process or an object recognition process, on an input image and overlaying a result of the recognition process on the input image, as taught by Hamada, to the method and system for joint object and object part detection using web supervision.


Regarding Claim 9, Novotny in view of Hamada teaches the method of claim 1, and further teaches wherein inserting the object into the image based on the bounding box and the shape comprises applying the affine transformation to the shape (Novotny [0091] The standard IoU measure can be relaxed to provide a more permissive geometric similarity measure between regions R and Q. To do so, let R be a bounding box of extent [x_1, x_2] × [y_1, y_2]; [0092] each region pair (R, Q) ∈ M is used to generate a transformation hypothesis T by fitting an affine transformation to map R into Q (i.e., Q ≈ TR), resulting in a candidate set of possible pairwise transformations T).
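
For illustration only, the two notions quoted from Novotny above (the standard IoU measure between regions R and Q, and an affine transformation T fitted so that Q ≈ TR) can be sketched as follows. This is not code from Novotny or from the application. The function names are hypothetical, boxes are assumed to be non-degenerate tuples (x1, y1, x2, y2), and the transformation is assumed to be restricted to axis-aligned scaling plus translation, which is only a special case of a general affine map.

```python
# Illustrative sketch only; names and the restricted transform are assumptions.

def iou(r, q):
    """Standard intersection over union of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(r[2], q[2]) - max(r[0], q[0]))  # overlap width
    iy = max(0.0, min(r[3], q[3]) - max(r[1], q[1]))  # overlap height
    inter = ix * iy
    area_r = (r[2] - r[0]) * (r[3] - r[1])
    area_q = (q[2] - q[0]) * (q[3] - q[1])
    union = area_r + area_q - inter
    return inter / union if union > 0 else 0.0


def fit_box_affine(r, q):
    """Return (sx, sy, tx, ty) with x' = sx*x + tx and y' = sy*y + ty mapping box r onto box q."""
    sx = (q[2] - q[0]) / (r[2] - r[0])
    sy = (q[3] - q[1]) / (r[3] - r[1])
    tx = q[0] - sx * r[0]
    ty = q[1] - sy * r[1]
    return sx, sy, tx, ty


def apply_affine(t, box):
    """Apply the restricted affine map t to both corners of a box."""
    sx, sy, tx, ty = t
    return (sx * box[0] + tx, sy * box[1] + ty,
            sx * box[2] + tx, sy * box[3] + ty)
```

Under these assumptions, applying the fitted map to R reproduces Q exactly, while the IoU of R and Q measures how far the two regions overlap before any transformation is applied.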

Regarding Claim 11, Novotny in view of Hamada teaches a non-transitory computer readable medium storing instructions that, when executed by a processor (Novotny Abs: A method for generating object and part detectors includes accessing a collection of training images; [0047] The memory 12 may represent any type of non-transitory computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory; [0049] The digital processor device 16).
The metes and bounds of the rest of the limitations of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claim 16, Novotny in view of Hamada teaches a system (Novotny [0020] FIG. 4 is a functional block diagram of a system for training a model for object/part detection), comprising: 
a memory storing one or more instructions (Novotny [0047] The memory 12); and 
a processor that executes the one or more instructions to at least (Novotny [0049] The digital processor device 16; [0050] The term "software," as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software): 
The metes and bounds of the rest of the limitations of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claim 20, Novotny in view of Hamada teaches the system of claim 16. The metes and bounds of the claim substantially correspond to the method claim as set forth in Claim 9; thus it is rejected on similar grounds and rationale as its corresponding limitations.

4.	Claims 2-8, 10, 12-15, 17-19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Novotny et al. (US 20170330059 A1), referred to herein as Novotny, in view of Hamada et al. (US 20150116349 A1), referred to herein as Hamada, further in view of Madani et al. (US 20190197358 A1), referred to herein as Madani.
Regarding Claim 2, Novotny in view of Hamada teaches the method of claim 1, but does not teach further comprising: 
calculating one or more errors associated with the first generator model or the second generator model based on output from discriminator models associated with at least one of the first generator model or the second generator model 
However, Madani discloses a machine learning training model that trains an image generator of a generative adversarial network (GAN) to generate medical images approximating actual medical images, which is analogous to the present patent application. Madani teaches further comprising: 
calculating one or more errors associated with the first generator model and the second generator model based on output from discriminator models associated with at least one of the first generator model or the second generator model (Madani [0005] The method further comprises generating, by a generator of the GAN, one or more generated medical images and inputting, to the discriminator of the GAN, a training medical image set comprising a first subset of labeled medical images, a second subset of unlabeled medical images, and a third subset comprising the one or more generated medical images; [0006] The method comprises training the GAN based on labeled image data, unlabeled image data, and generated image data generated by a generator of the GAN. The GAN comprises a loss function that comprises error components for each of the labeled image data, unlabeled image data, and generated image data which is used to train the GAN); and 
updating parameters of at least one of the first generator model or the second generator model based on the one or more errors (Hamada [0082] In the projection transform parameter dictionary P1, one or more pairs of a dictionary edge-based feature and a projection transform parameter are stored. The dictionary edge-based feature and the projection transform parameter are generated in advance by using image data for instruction (training).).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have modified Novotny in view of Hamada to incorporate the teachings of Madani by applying the discriminator of a machine learning system, as taught by Madani, to the method and system for joint object and object part detection using web supervision.
Doing so would provide an architecture that can be trained using both labeled and unlabeled image data that are equally applicable regardless of the particular type of image data being operated on in the method and system for joint synthesis and placement of objects in scenes.

Regarding Claim 3, Novotny in view of Hamada further in view of Madani teaches the method of claim 2, and further teaches wherein updating the parameters comprises: 
executing an unsupervised path to update the parameters of the first generator model and the second generator model based on a first error in the one or more errors (Madani [0005] the recently proposed technique of Generative Adversarial Networks (GANs) repurposes the min/max paradigm from game theory to generate images in an 
executing a supervised path comprising ground truths for the first generator model and the second generator model to update the parameters of the first generator model and the second generator model based on a second error in the one or more errors (Novotny [0130] Sometimes it is beneficial to combine the extremely noisy annotations obtained from Web supervision with a small amount of strongly supervised annotations, such as manually-generated, labeled bounding boxes. MIL can be modified to incorporate one or more single strongly-annotated examples).

Regarding Claim 4, Novotny in view of Hamada further in view of Madani teaches the method of claim 3, and further teaches wherein the first error comprises an unsupervised adversarial loss that is calculated from a first discriminator model for at least one of the first or second generator models (Madani [0006] instructions that are executed by the processor to configure the processor to implement a generative adversarial network (GAN). The GAN comprises a loss function that comprises error components for each of the labeled image data, unlabeled image data, and generated image data which is used to train the GAN; [0062] The GAN architecture 200 shown in FIGS. 2A and 2B may be trained using a semi-supervised training technique. The main difference between a semi-supervised GAN implementation and an unsupervised GAN 

Regarding Claim 5, Novotny in view of Hamada further in view of Madani teaches the method of claim 4, and further teaches wherein the second error comprises a supervised adversarial loss that is calculated from a second discriminator model for at least one of the first or second generator models (Madani [0062] The GAN architecture 200 shown in FIGS. 2A and 2B may be trained using a semi-supervised training technique. The main difference between a semi-supervised GAN implementation and an unsupervised GAN is the structure of the loss function of the neural network of the discriminator D 250 to incorporate both labeled and unlabeled real image data).

Regarding Claim 6, Novotny in view of Hamada further in view of Madani teaches the method of claim 3, and further teaches wherein the first error comprises a reconstruction loss associated with random input to at least one of the first or second generator models (Madani [0027] VAEs attempt to find the variational lower bound of the probability density function with a loss function that consists of a reconstruction error and regularizer; [0029] FIG. 1 is an example block diagram of a generative adversarial network (GAN). As shown in FIG. 1, the generator, G, takes a vector z, sampled from random Gaussian noise or conditioned with structured input).
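
For illustration only, a loss function "that consists of a reconstruction error and regularizer," as quoted from Madani [0027], can be sketched as follows. The specific terms are assumptions not stated in the quoted passage: a squared-error reconstruction term and a Kullback-Leibler regularizer between a diagonal Gaussian posterior and a standard normal prior. The function name vae_loss and its parameters are hypothetical.

```python
import math


def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction error plus regularizer for one sample.

    Assumed forms (not taken from the cited reference):
    - reconstruction error: sum of squared differences between x and x_recon
    - regularizer: KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return recon + kl
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, log_var = 0), both terms vanish and the loss is zero.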

Regarding Claim 7, Novotny in view of Hamada further in view of Madani teaches the method of claim 2, and further teaches wherein a first discriminator model or fake or an affine discriminator model that categorizes the affine transformation as real or fake (Novotny Claim 5. The method of claim 3, wherein the generating of the geometric embedding of the region comprises: identifying similar regions in a set of training images based on the appearance-based representations of the regions; for pairs of training images in the set of training images, learning a pairwise transformation to align a pair of training images in the set, based on respective locations of at least some of the similar regions in the pair of images; and generating an image transformation for mapping each training image in the set to a common frame based on the pairwise transformations for the training image; and computing the geometric embedding for regions of the training images in the set based on the respective image transformation; Madani [0029] FIG. 1 is an example block diagram of a generative adversarial network (GAN). As shown in FIG. 1, the generator, G, takes a vector z, sampled from random Gaussian noise or conditioned with structured input, and transforms the noise to p_G = G(z) to mimic the data distribution, p_data. Batches of the generated (fake) images and real images are sent to the discriminator, D, where the discriminator assigns a label 0 for real or a label 1 for fake).
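
For illustration only, the real-or-fake categorization quoted from Madani [0029] (label 0 for real, label 1 for fake) can be sketched as a binary cross-entropy error on the discriminator output. The choice of binary cross-entropy, the function name discriminator_loss, and its parameter names are assumptions; the quoted passage states only the labeling convention.

```python
import math


def discriminator_loss(p_fake_on_real, p_fake_on_generated):
    """Mean binary cross-entropy for a discriminator D.

    Each input is a list of D's predicted probabilities that a sample is fake
    (label 1). Real samples carry target label 0, generated samples target label 1.
    """
    eps = 1e-12  # guards against log(0)
    real_terms = [-math.log(max(1.0 - p, eps)) for p in p_fake_on_real]
    fake_terms = [-math.log(max(p, eps)) for p in p_fake_on_generated]
    terms = real_terms + fake_terms
    return sum(terms) / len(terms)
```

Under this sketch the error is zero when every real sample is scored 0 and every generated sample is scored 1, and it grows as the discriminator confuses the two.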

Regarding Claim 8, Novotny in view of Hamada further in view of Madani teaches the method of claim 2, and further teaches wherein a first discriminator model associated with the second generator model comprises a layout discriminator model that categorizes a location of the shape as real or fake or a shape discriminator model (Madani [0067] As shown in FIG. 4, the sampled generated (fake) images capture the global structural elements such as the lungs, spine, heart, and visual signatures such as the ribs, aortic arch, and the unique curvature of the lower lungs).

Regarding Claim 10, Novotny in view of Hamada teaches the method of claim 1, but does not teach the claimed limitation therein. However, Madani teaches wherein each of the first generator model and the second generator model comprises at least one of a variational autoencoder (VAE) or a spatial transformer network (Madani [0027] the approach of learning the underlying distribution has had considerable success with the advent of variational auto-encoders (VAEs); [0032] standard augmentation methods that produce new examples of data merely involve varying lighting, field of view, and spatial rigid transformations). Same motivation as Claim 2 applies here.

Regarding Claims 12-15, Novotny in view of Hamada further in view of Madani teaches the non-transitory computer readable medium of claim 11. The metes and bounds of the claims substantially correspond to the method claim as set forth in Claims 2-8; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claims 17-19, Novotny in view of Hamada further in view of Madani teaches the system of claim 16. The metes and bounds of the claims substantially 

Regarding Claim 21, Novotny in view of Hamada teaches the method of claim 1, but does not teach the claimed limitation therein. However, Madani teaches wherein the affine transformation is generated using a neural network (Madani [0027] Generative machine learning models have the potential to generate new dataset samples. The two main approaches of deep generative models involve either learning the underlying data distribution or learning a function to transform a sample from an existing distribution to the data distribution of interest; [0032] Augmentation of a dataset is a widely used practice in deep learning to enrich the data in data-limited scenarios and to avoid overfitting. However, standard augmentation methods that produce new examples of data merely involve varying lighting, field of view, and spatial rigid transformations, for example). Same motivation as Claim 2 applies here.


Response to Arguments
Applicant's arguments filed on June 16, 2021, with respect to the 103 rejection have been fully considered but they are not persuasive.
On pages 7-9 of Applicant's Remarks, with respect to claim 1, the Applicant argues that “the cited portion of Novotny does not teach or suggest "applying a first generator model to input comprising a semantic representation of an image to generate an affine transformation with a neural network" as recited in amended claim 1.” Examiner respectfully disagrees with that argument. In response to applicant's argument that the references fail to show certain features of applicant's invention, it is noted that the features upon which applicant relies (i.e., “the neural network”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Regarding this argument, it is respectfully noted that Novotny teaches the claimed limitation of “applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object.”
On page 10 of Applicant’s Remarks, the Applicant argues that the independent claims 11 and 16 are not taught by the prior art for reasons similar to those discussed in regard to claim 1, and that the dependent claims are not taught by the prior art, insomuch as they depend from claims that are not taught by the prior art. Examiner respectfully disagrees with these arguments, for the reasons discussed above.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (YUEHAN) WANG whose telephone number is (571)270-5011.  The examiner can normally be reached on Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 


/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2611