DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-27 are pending under this Office action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-15, 17-25, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Kaufhold et al. (US 20190080205 A1) in view of Fu et al. (US 20190295302 A1).
Regarding claim 1, Kaufhold teaches a computer-implemented method of training a generator neural network and an encoder neural network (See Kaufhold: Fig. 12, and [0106], “FIG. 12 is a block diagram of an embodiment illustrating the process of training an object recognizer using translated images in addition to training images. During training of the object recognizer 1219, a relatively sparse set of training images 1213, and a relatively dense set of translated images 1215 are used. The translated images 1215 are produced by the DMTG 1203, and specifically the translator 1205 after it has been trained according to methods described herein. The translator 1205 includes an autoencoder 1207, which may contain an encoder 1209 and a decoder 1211. The encoder 1209 compresses the data whereby N dimensions are mapped to M dimensions and M<N. The decoder 1211 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. The translator may be used to populate a relatively dense database of translated images 1215”),
wherein the generator neural network is configured to generate, based on a set of latent values, data items which are samples of a distribution representing a set of training data items (See Kaufhold: Fig. 13, and [0047], “Second, embodiments of the invention can significantly increase the size of a training data by generating images that are similar to each image in the training data set. FIG. 13, described in more detail below, illustrates the interactions between image generation and object recognition”; and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample h. The input received by D can be from h or h. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”);
wherein the encoder neural network is configured to generate a set of latent values for a respective data item (See Kaufhold: Fig. 19, and [0097], “As depicted in FIG. 19, a translation process can have two phases. First, an encoder can encode an image into a hidden representation. In the hidden representation, the dimensionality of the encoded image is reduced, which enables certain features of the image to be identified. Then, the encoded representation is decoded into a complete image, completing the translation. This translation process can be performed on an entire set of training images, which enables the network to learn the translation. This training process is based on the pairing mechanisms illustrated in FIG. 6. By using an image pairing technique 605 during training, whereby images from two domains (603, 601) are paired, a translator 609 can be trained to encode the images from one domain into images from another domain. For example, as shown in FIG. 15 an autoencoder might be trained to encode poor quality images of cats (e.g., 1501) into photo realistic images of cats (e.g., 1505)”);
wherein the method comprises jointly training the generator neural network, the encoder neural network and a discriminator neural network configured to distinguish between samples generated by the generator network and samples of the distribution which are not generated by the generator network (See Kaufhold: Fig. 7, and [0101], “FIG. 7 is a block diagram illustrating an embodiment comprising training a DMTG generator using noise as an initialization. The DMTG 707 includes a generator 709, which is composed of a GAN 711 comprising a discriminator neural network 713 and a generative neural network 715. Given a relatively sparse set of training images 701 as input into the discriminator neural network 713 and Gaussian noise 705 and defined by the mathematical equation 703 as input into the generative neural network 715, the two networks in the GAN 711 work together to train the generator neural network 715 to produce images that are representative of the training examples 701 used to train the discriminative network 713. The discriminative neural network 713 is aware of the original training images 701. The generative neural network 715 is not given data to begin with; instead it is initialized with Gaussian noise 705. The two networks play an adversarial game where the generator 715 generates images 717 that are then processed by the discriminator 713. The discriminator 713 classifies the generated images 717 as either similar to the training data set 701 or not. The generator 715 may use this information to learn how to generate new images 717 that are better than the last iteration of generated images. The output from the discriminator 713, may be used directly in the loss function for the generator 715. The loss function changes how the generator 715 will generate the next set of images. Once trained, DMTG 707 can be called upon to produce generated images. The output is a comparatively dense set of generated images 717”), and
wherein the discriminator neural network is configured to distinguish by processing, by the discriminator neural network (See Kaufhold: Fig. 9, and [0103], “FIG. 9 is a block diagram illustrating an embodiment comprising training a DMTG generator using synthetic images as an initialization. The DMTG 907 includes a generator 909, which is composed of a GAN 911 comprising a discriminator neural network 913 and a generative neural network 915. Given a relatively sparse set of training images 901 as input into the discriminator network 913 and set of separately created synthetic images 903 as input into the generative network 915 (note this input 905 is different than Gaussian noise), these two networks work together to train the generator network 915 to produce images that are representative of the training examples 901 used to train the discriminative network 913. The discriminative neural network 913 is aware of the original training images 901. The generative neural network 915 is not given data to begin with; instead it is initialized with synthetic images 903. The two networks play an adversarial game where the generator 915 generates images 917 that are then processed by the discriminator 913. The discriminator 913 classifies the generated images 917 as either similar to the training data set 901 or not. The generator 915 may use this information to learn how to generate new images 917 that are better than the last iteration of generated images. The output from the discriminator 913, may be used directly in the loss function for the generator 915. The loss function changes how the generator 915 will generate the next set of images. Once trained, DMTG 907 can be called upon to produce generated images. The output is a comparatively dense set of generated images 917”), an input pair comprising a sample part and a latent part (See Kaufhold: Fig. 6, and [0100], “FIG. 6 is a block diagram illustrating an embodiment comprising training a DMTG translator. 
Given a relatively sparse set of training images 601 and a set of separately created synthetic images 603, a set of image pairings 605 indicates which synthetic images 603 align with corresponding training images 601. The pairings 605 are input to the translator 609, which is part of DMTG 607. The translator 609 comprises an autoencoder 611, which includes an encoder 613 and decoder 615. Encoder 613 compresses the data whereby N dimensions are mapped to M dimensions and M<N. Decoder 615 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. Each time an image from synthetic images 603 goes through the encoder 613 and decoder 615, the network in translator 609 learns how to take the compressed information based on a synthetic image 603 to generate a training image such as the training images in 601 based on pairings 605. Once trained, DMTG 607 can be called upon with any unpaired synthetic image to produce a translated version of the image. The output of the translator 609 is a comparatively dense set of translated images 617”);
wherein the sample and latent parts of the input pair comprise either a sample of the distribution generated by the generator neural network and the corresponding set of latent values used to generate the sample respectively, or a training data item of the set of training data items and a set of latent values generated by the encoder neural network based upon the training data item (See Kaufhold: Fig. 1, and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample h. The input received by D can be from h or h. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”); and
wherein the training is based upon a loss function (See Kaufhold: Fig. 8, and [0102], “FIG. 8 is a block diagram illustrating an embodiment comprising training a DMTG generator using translated images as an initialization. The DMTG 807 includes a generator 809, which is composed of a GAN 811 comprising a discriminator neural network 813 and a generative neural network 815. Given a relatively sparse set of training images 801 as input into the discriminator network 813 and translated images 803 as input into the generative network 815 (note this input 805 is different than Gaussian noise), these two networks in the GAN 811 work together to train the generator neural network 815 to produce images that are representative of the training examples 801 used to train the discriminative network 813. The discriminative neural network 813 is aware of the original training images 801. The generative neural network 815 is not given data to begin with; instead it is initialized with translated images 803. The two networks play an adversarial game where the generator 815 generates images 817 that are then processed by the discriminator 813. The discriminator 813 then classifies the generated images 817 as either similar to the training data set 801 or not. The generator 815 may use this information to learn how to generate new images 817 that are better than the last iteration of generated images. The output from the discriminator 813, may be used directly in the loss function for the generator 815. The loss function changes how the generator 815 will generate the next set of images. Once trained, DMTG 807 can be called upon to produce generated images. The output is a comparatively dense set of generated images 817”) comprising a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair.
However, Kaufhold does not explicitly disclose that the generator neural network is configured to generate based on a set of latent values, or that the loss function comprises a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair.
Fu teaches that the generator neural network is configured to generate, based on a set of latent values (See Fu: Fig. 2A, and [0127], “In contrast to the existing methods, embodiments of the present invention provide a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions. As such, embodiments of the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large diversity”); and
a loss function comprising a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair (See Fu: Equation (10) and [0089], “where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyper-parameters which control the weights of classification loss, segmentation loss, and reconstruction loss. These weights act as relatively importance of those terms compared to adversarial loss. According to an embodiment, the weights are hyper-parameters chosen by user. The weights (hyper-parameters) can be tuned to affect how the generated images look. In an embodiment, the loss terms are constraints and regulations. In an embodiment, the generator will trade off those constraints (the loss terms) in generating the final output image. A larger hyper-parameter indicates a larger impact of that specific loss term. For example, increasing $\lambda_2$ will let the generated image be more consistent with the target segmentation. Since $A_c$ is embedded in D and shares the same weights except the output layer, $A_c$ is trained together with D using discriminator loss $\mathcal{L}_D$ which contains both the adversarial term and the classification term on real image samples”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaufhold so that the generator neural network is configured to generate based on a set of latent values, and so that the loss function comprises a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair, as taught by Fu, in order to improve the generator's ability to generate images that are in accordance with desired attributes and target segmentation (See Fu: Figs. 2A-D, and [0046], “In the training procedure 270, the generator 220 is configured to receive the target segmentation 271, desired attributes vector 272, and real image 273 and from the inputs 271-273, generate the image 274. Further, the generator 220 (which is depicted twice in FIG. 2D to show additional processing) is configured to perform a reconstruction process that attempts to reconstruct the input image 273 using a segmentation 275 that is based on the real image 273, attributes 276 of the real image 273, and the generated image 274. To train the generator 220, the generated image 274 is provided to the discriminator 240. The discriminator 240 makes a determination if the image 274 is real or fake and also determines attributes of the image 274. Then, based on these determinations, weights of the neural network implementing the generator 220 are adjusted so as to improve the generator's 240 ability to generate images that are in accordance with the desired attributes 272 and target segmentation 271 while also being indistinguishable from real images. Similarly, the generator 220 is also adjusted, i.e., weights of the neural network implementing the generator 220 are adjusted based on the reconstruction loss. In an embodiment, the reconstruction loss is the difference between the reconstructed image 277 and the real image 273”). Kaufhold teaches a method and system that may train a neural network to recognize a target object in images by collecting training images assembled from a set of real-world images of the target object, while Fu teaches a system and method that may train a neural network with a set of training images, iteratively updating the latent vectors and training the generator and discriminator by minimizing loss functions that comprise a loss from each neural network. Therefore, it would have been obvious to one of ordinary skill in the art to modify Kaufhold by Fu to update the latent values by minimizing the loss function. The motivation to modify Kaufhold by Fu is “Use of known technique to improve similar devices (methods, or products) in the same way”.
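For reference, the loss structure recited in claim 1 (a joint discriminator loss term over the sample and latent parts, plus a single discriminator loss term over only one part) can be illustrated with the following minimal sketch. It is an illustration of the claim language only and is not taken from Kaufhold or Fu. The hinge penalty, the label convention (+1 for a training item paired with encoder latents, -1 for a generated sample paired with its latents), and the functions `sample_score` and `joint_score` are assumptions introduced here.

```python
from typing import List, Tuple


def hinge(t: float) -> float:
    """Hinge penalty max(0, 1 - t). An assumed choice, not taken from the record."""
    return max(0.0, 1.0 - t)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def sample_score(x: List[float]) -> float:
    """Hypothetical single-input score: sees only the sample part of the pair."""
    return mean(x)


def joint_score(x: List[float], z: List[float]) -> float:
    """Hypothetical joint score: sees both the sample part and the latent part."""
    return mean(x) * mean(z)


def discriminator_loss(
    x: List[float], z: List[float], from_encoder: bool
) -> Tuple[float, float, float]:
    """Return (total, joint_term, single_term) for one input pair (x, z).

    from_encoder=True  : x is a training data item, z comes from the encoder.
    from_encoder=False : x comes from the generator, z is the latent it used.
    """
    y = 1.0 if from_encoder else -1.0
    joint_term = hinge(y * joint_score(x, z))   # depends on both parts
    single_term = hinge(y * sample_score(x))    # depends on the sample part only
    return joint_term + single_term, joint_term, single_term


if __name__ == "__main__":
    x, z = [0.5, 1.5], [2.0, 0.0]
    print(discriminator_loss(x, z, from_encoder=True))
    print(discriminator_loss(x, z, from_encoder=False))
```

In this sketch the single term is computed from the sample part; a latent-only term (as in claim 6) would replace `sample_score(x)` with an analogous hypothetical function of `z`.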
Regarding claim 2, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the single discriminator loss term is based upon the sample part of the input pair (See Fu: Equation (1) and [0079], “To regulate the generated face image to comply with the target semantic segmentation, embodiments employ a segmentation loss which acts as an additional regulation and guides the generator G to generate target fake images. Taking a real image sample x as input, the generated segmentation S(x) is compared with the source segmentation s to optimize the segmentor S. The loss function is given by”).
Regarding claim 3, Kaufhold and Fu teach all the features with respect to claim 2 as outlined above. Further, Kaufhold teaches that the method of claim 2, wherein the single discriminator loss term comprises a sample discrimination score generated based upon processing the sample part of the input pair using a sample discriminator sub-network (See Kaufhold: Fig. 1, and [0038], “In order for the network to improve over time, it minimizes an objective function that tries to minimize the negative log-likelihood: $AE(\theta)=\sum_{x\in D_n} L(x, g(f(x)))$,”).
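The autoencoder objective quoted from Kaufhold [0038] sums a per-item loss L(x, g(f(x))) over the data set. A minimal sketch of that summation follows, assuming a squared-error loss for L and toy one-parameter encoder and decoder functions; the names `encode`, `decode`, `w_enc`, and `w_dec` are hypothetical and do not appear in Kaufhold.

```python
from typing import List


def encode(x: List[float], w_enc: float) -> List[float]:
    """Toy encoder f: maps N=2 input dimensions to M=1 hidden dimension."""
    return [w_enc * (x[0] + x[1]) / 2.0]


def decode(z: List[float], w_dec: float) -> List[float]:
    """Toy decoder g: maps the M=1 hidden dimension back to N=2 dimensions."""
    return [w_dec * z[0], w_dec * z[0]]


def squared_error(x: List[float], x_hat: List[float]) -> float:
    """Assumed per-item loss L(x, x_hat)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def autoencoder_objective(
    dataset: List[List[float]], w_enc: float, w_dec: float
) -> float:
    """Sum of L(x, g(f(x))) over every item x in the data set."""
    return sum(
        squared_error(x, decode(encode(x, w_enc), w_dec)) for x in dataset
    )


if __name__ == "__main__":
    data = [[1.0, 1.0], [2.0, 2.0]]
    print(autoencoder_objective(data, w_enc=1.0, w_dec=1.0))
    print(autoencoder_objective(data, w_enc=1.0, w_dec=0.0))
```

The objective is zero when every item is reconstructed exactly and grows as reconstructions drift from the inputs.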
Regarding claim 4, Kaufhold and Fu teach all the features with respect to claim 3 as outlined above. Further, Kaufhold teaches that the method of claim 3, wherein the sample discrimination score is further generated based upon applying a projection to the output of the sample discriminator sub-network (See Kaufhold: Fig. 1, and [0034], “Formally, within an autoencoder, a function maps input data to a hidden representation using a non-linear activation function. This is known as the encoding: $z=f(x)=s_f(W_x+b_z)$”).
Regarding claim 5, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold teaches that the method of claim 1, wherein the sample discriminator sub-network is based upon a convolutional neural network (See Kaufhold: Fig. 4, and [0098], “FIG. 4 is a block diagram illustrating how an object recognizer's performance can be improved by using image translation in addition to the original training data set. DMTG 401 includes a translator 403 comprising an autoencoder 405, which includes an encoder 407 and decoder 409. The encoder 407 compresses the data whereby N dimensions are mapped to M dimensions and M<N. The decoder 409 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. The original training images 411 and translated images 413 created by 401 are used to train the object recognizer 415. The results of performing object recognition on an unknown data set are shown on graph 417, which plots the objects recognized given the number of training examples and the accuracy. Points 419 show that the object recognizer 415 is able to achieve high accuracy rates with few real-world (original) training examples because the object recognizer 415 was trained with supplemental translated images 413”).
Regarding claim 6, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the single discriminator loss term is based upon the latent part of the input pair (See Fu: Figs. 2A-D, and [0045], “In the training procedure 270, the segmentor 260 receives a target segmentation 271 and a generated image 274 produced by the generator 220. Then, based upon a segmentation loss, i.e., the difference between a segmentation determined from the generated image 274 and the target segmentation 271, the segmentor 260 is adjusted, e.g., weights in a neural network implementing the segmentor 260 are modified so the segmentor 260 produces segmentations that are closer to the target segmentation 271. The generator 240 is likewise adjusted based upon the segmentation loss to generate images that are closer to the target segmentation 271. In this way, the segmentor 260 and generator 220 are trained collaboratively”; and [0063], “In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly. When S is trained together with G, the fake image denoted by G(x, s′, c′) is fed to S to obtain the image's estimated segmentation S(G(x, s′, c′)) which is compared with a target segmentation s′ to calculate a segmentation loss. When optimizing G, with minimizing the segmentation loss providing gradient information, G tends to translate the input image to be consistent with s′. To better utilize the information in s′, s′ is annotated as a k-channel image where each pixel is represented by a one-hot vector indicating its class index. 
Then s′ is concatenated to x in channel dimension before feeding into the generator. In summary, such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation”).
Regarding claim 7, Kaufhold and Fu teach all the features with respect to claim 6 as outlined above. Further, Kaufhold teaches that the method of claim 6, wherein the single discriminator loss term comprises a latent discrimination score generated based upon processing the latent part of the input pair using a latent discriminator sub-network (See Kaufhold: Fig. 1, and [0038], “In order for the network to improve over time, it minimizes an objective function that tries to minimize the negative log-likelihood: $AE(\theta)=\sum_{x\in D_n} L(x, g(f(x)))$,”).
Regarding claim 8, Kaufhold and Fu teach all the features with respect to claim 7 as outlined above. Further, Kaufhold teaches that the method of claim 7, wherein the latent discrimination score is further generated based upon applying a projection to the output of the latent discriminator sub-network (See Kaufhold: Fig. 1, and [0034], “Formally, within an autoencoder, a function maps input data to a hidden representation using a non-linear activation function. This is known as the encoding: $z=f(x)=s_f(W_x+b_z)$”).
Regarding claim 9, Kaufhold and Fu teach all the features with respect to claim 7 as outlined above. Further, Kaufhold teaches that the method of claim 7, wherein the latent discriminator sub-network is based upon a multi-layer perceptron (See Kaufhold: Fig. 6, and [0100], “FIG. 6 is a block diagram illustrating an embodiment comprising training a DMTG translator. Given a relatively sparse set of training images 601 and a set of separately created synthetic images 603, a set of image pairings 605 indicates which synthetic images 603 align with corresponding training images 601. The pairings 605 are input to the translator 609, which is part of DMTG 607. The translator 609 comprises an autoencoder 611, which includes an encoder 613 and decoder 615. Encoder 613 compresses the data whereby N dimensions are mapped to M dimensions and M<N. Decoder 615 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. Each time an image from synthetic images 603 goes through the encoder 613 and decoder 615, the network in translator 609 learns how to take the compressed information based on a synthetic image 603 to generate a training image such as the training images in 601 based on pairings 605. Once trained, DMTG 607 can be called upon with any unpaired synthetic image to produce a translated version of the image. The output of the translator 609 is a comparatively dense set of translated images 617”).
Regarding claim 10, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the loss function comprises a plurality of single discriminator loss terms (See Fu: Equations (1-2), and [0079], “To regulate the generated face image to comply with the target semantic segmentation, embodiments employ a segmentation loss which acts as an additional regulation and guides the generator G to generate target fake images. Taking a real image sample x as input, the generated segmentation S(x) is compared with the source segmentation s to optimize the segmentor S. The loss function is given by: $\mathcal{L}_{\mathrm{seg}}^{\mathrm{real}}=\mathbb{E}_{x,s}[A_s(s,S(x))]$, (1) Where $A_s(\cdot,\cdot)$ computes cross-entropy loss pixel-wisely by: $A_s(a,b)=-\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{n_s}a_{i,j,k}\log b_{i,j,k}$, (2) with a and b being two segmentation maps of size ($H\times W\times n_s$)”).
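The pixel-wise cross-entropy of Fu's Equation (2), as quoted above, can be sketched as follows. The nested-list layout (height, then width, then class channel) and the function name `pixelwise_cross_entropy` are assumptions made for illustration; terms with a zero target entry are skipped so that a zero predicted probability in an unused channel does not raise an error.

```python
import math
from typing import List

SegMap = List[List[List[float]]]  # indexed as [i][j][k]: height, width, class


def pixelwise_cross_entropy(a: SegMap, b: SegMap) -> float:
    """Negative sum over i, j, k of a[i][j][k] * log(b[i][j][k]).

    a: target segmentation map (for example, one-hot per pixel).
    b: predicted per-pixel class probabilities.
    """
    total = 0.0
    for row_a, row_b in zip(a, b):
        for pix_a, pix_b in zip(row_a, row_b):
            for a_ijk, b_ijk in zip(pix_a, pix_b):
                if a_ijk != 0.0:  # skip channels the target does not select
                    total += a_ijk * math.log(b_ijk)
    return -total


if __name__ == "__main__":
    # H=1, W=2, n_s=2: pixel 0 is class 0, pixel 1 is class 1.
    target = [[[1.0, 0.0], [0.0, 1.0]]]
    predicted = [[[0.5, 0.5], [0.25, 0.75]]]
    print(pixelwise_cross_entropy(target, predicted))
```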
Regarding claim 11, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the joint discriminator loss term comprises a joint discrimination score generated using a joint discriminator sub-network (See Fu: Fig. 3, and [0054], “The fake adversarial loss term 313, the fake segmentation loss 309, the fake classification loss 314, and the reconstruction loss 318 are used by the optimizer 310 to optimize the generator 301. In an embodiment, the optimizer 310 sums up the loses 313, 309, 314, and 318 with weights, i.e., weights the losses differently, to determine a generator loss, which is used by the optimizer 310 to do back-propagation and update the parameters in a neural network implementing the generator 301. According, to an embodiment, losses are summed as shown in the equation below: $\mathcal{L}_G=\mathcal{L}_{\mathrm{adv}}+\lambda_1\mathcal{L}_{\mathrm{cls}}^{\mathrm{fake}}+\lambda_2\mathcal{L}_{\mathrm{seg}}^{\mathrm{fake}}+\lambda_3\mathcal{L}_{\mathrm{rec}}$ (10) where the weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyper-parameters chosen by a user”).
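The weighted summation of Fu's Equation (10), as quoted above, amounts to the following arithmetic. The function and parameter names are hypothetical, and the numeric values are arbitrary placeholders chosen only to show the effect of the weights.

```python
def generator_loss(
    adv: float,
    cls_fake: float,
    seg_fake: float,
    rec: float,
    lambda_1: float,
    lambda_2: float,
    lambda_3: float,
) -> float:
    """Adversarial loss plus weighted classification, segmentation,
    and reconstruction losses, following the form of Equation (10)."""
    return adv + lambda_1 * cls_fake + lambda_2 * seg_fake + lambda_3 * rec


if __name__ == "__main__":
    # Placeholder loss values; only the weighting is of interest here.
    base = generator_loss(1.0, 2.0, 3.0, 4.0, lambda_1=0.5, lambda_2=1.0, lambda_3=0.25)
    heavier_seg = generator_loss(1.0, 2.0, 3.0, 4.0, lambda_1=0.5, lambda_2=2.0, lambda_3=0.25)
    print(base, heavier_seg)
```

Raising `lambda_2` increases the contribution of the segmentation term to the total, which is consistent with Fu's statement in [0089] that a larger hyper-parameter indicates a larger impact of that loss term.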
Regarding claim 12, Kaufhold and Fu teach all the features with respect to claim 11 as outlined above. Further, Fu teaches that the method of claim 11, wherein the joint discriminator sub-network is configured to process the output of a sample discriminator sub-network and the output of a latent discriminator sub-network, wherein the sample discriminator sub-network is configured to process the sample part of the input pair and the latent discriminator sub-network is configured to process the latent part of the input pair (See Fu: Fig. 3, and [0051], “During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307. After generating the image 307, there are three paths. The first path is to input the generated image 307 to the segmentor 303. The segmentor 303 estimates a semantic segmentation 308 from the generated image 307, and the estimated segmentation 308 is then compared with the target segmentation 305 to calculate a fake segmentation loss 309 which is provided to the generator optimizer 310. According to an embodiment, loss calculations are determined by one or more optimizers, e.g., the optimizer 310, orchestrating the training process”).
Regarding claim 13, Kaufhold and Fu teach all the features with respect to claim 11 as outlined above. Further, Kaufhold teaches that the method of claim 11, wherein the joint discrimination score is further generated based upon applying a projection to the output of the joint discriminator sub-network (See Kaufhold: Fig. 1, and [0034], “Formally, within an autoencoder, a function maps input data to a hidden representation using a non-linear activation function. This is known as the encoding: $z=f(x)=s_f(W_x+b_z)$”).
Regarding claim 14, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold teaches that the method of claim 1, wherein the joint discriminator sub-network is based upon a multi-layer perceptron (See Kaufhold: Fig. 6, and [0100], “FIG. 6 is a block diagram illustrating an embodiment comprising training a DMTG translator. Given a relatively sparse set of training images 601 and a set of separately created synthetic images 603, a set of image pairings 605 indicates which synthetic images 603 align with corresponding training images 601. The pairings 605 are input to the translator 609, which is part of DMTG 607. The translator 609 comprises an autoencoder 611, which includes an encoder 613 and decoder 615. Encoder 613 compresses the data whereby N dimensions are mapped to M dimensions and M<N. Decoder 615 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. Each time an image from synthetic images 603 goes through the encoder 613 and decoder 615, the network in translator 609 learns how to take the compressed information based on a synthetic image 603 to generate a training image such as the training images in 601 based on pairings 605. Once trained, DMTG 607 can be called upon with any unpaired synthetic image to produce a translated version of the image. The output of the translator 609 is a comparatively dense set of translated images 617”).
Regarding claim 15, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the loss function is based upon a summation of the joint discriminator loss term and the single discriminator loss term (See Fu: Equations (1-2), and [0079], “To regulate the generated face image to comply with the target semantic segmentation, embodiments employ a segmentation loss which acts as an additional regulation and guides the generator G to generate target fake images. Taking a real image sample x as input, the generated segmentation S(x) is compared with the source segmentation s to optimize the segmentor S. The loss function is given by: $\mathcal{L}_{\mathrm{seg}}^{\mathrm{real}}=\mathbb{E}_{x,s}[A_s(s,S(x))]$, (1) Where $A_s(\cdot,\cdot)$ computes cross-entropy loss pixel-wisely by: $A_s(a,b)=-\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{n_s}a_{i,j,k}\log b_{i,j,k}$, (2) with a and b being two segmentation maps of size ($H\times W\times n_s$)”).
Regarding claim 17, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold teaches that the method of claim 1, wherein the encoder neural network represents a probability distribution and generating a set of latent values comprises sampling from the probability distribution (See Kaufhold: Fig. 1, and [0041], “A generative adversarial network (“GAN”) [see endnote 1] is a network made of two deep networks. The two networks can be fully connected where each neuron in layer l is connected to every neuron in layer l−1, or can include convolutional layers, where each neuron in layer l is connected to a few neurons in layer l−1. The GANs used in embodiments of the invention encompass a combination of fully connected layers and convolutional layers. One of the networks is typically called the discriminative network and the other is typically called the generative network. The discriminative network has knowledge of the training examples. The generative network does not, and tries to ‘generate new samples,’ typically beginning from noise. The generated samples are fed to the discriminative network for evaluation. The discriminative network provides an error measure to the generative network to convey how ‘good’ or ‘bad’ the generated samples are, as they relate to the data distribution generated from the training set”).
Regarding claim 18, Kaufhold and Fu teach all the features with respect to claim 17 as outlined above. Further, Fu teaches that the method of claim 17, wherein the output of the encoder neural network has a mean and standard deviation for defining a normal probability distribution (See Fu: Fig. 3, and [0148], “An embodiment utilizes a training procedure for training the network including the generator, discriminator, and segmentor. In one such example embodiment, let $\theta_G$, $\theta_D$, and $\theta_S$ be the parameters of the networks G, D, and S, respectively. In such an embodiment, the objective is to find a converged $\theta_G$ with minimized $\mathcal{L}_G$. According to an embodiment, when training the proposed SCGAN, a batch of latent vectors are sampled from a Gaussian distribution $\mathcal{N}(0, 1)$ (which refers to a normal distribution with mean 0 and variation 1). A batch of real images each with a ground-truth segmentation and attribute labels are randomly sampled from the joint distribution $\mathbb{P}_{data}(x, c, s)$ of a dataset. To avoid over-fitting, s may be randomly shuffled to obtain target segmentation $s_t$ to be input to $\theta_G$. According to an embodiment, first, D is trained with x and c by optimizing $\mathcal{L}_D$. Then, S is trained with x and s by optimizing the objective $\mathcal{L}_{seg}^{real}$. D and S are trained repeatedly, e.g., five times, before training G. G takes z, c, and $s_t$ as inputs and generates a fake image $G(z, c, s_t)$, which is input to D and S to calculate the loss terms $\mathcal{L}_{adv}$, $\mathcal{L}_{cls}^{fake}$, and $\mathcal{L}_{seg}^{fake}$. G is optimized by minimizing the full objective $\mathcal{L}_G$. According to an embodiment, when training the generator, segmentor, and discriminator, $\lambda_{cls}=5$, $\lambda_{seg}=1$, $\lambda_{gp}=10$, $n_{repeat}=5$, and a batch size m=16 is used”).
Regarding claim 19, Kaufhold and Fu teach all the features with respect to claim 17 as outlined above. Further, Fu teaches that the method of claim 17, wherein the set of latent values is generated based upon a reparameterized sampling (See Fu: Fig. 3, and [0055], “To train the discriminator 302, the input source image 304 is fed to the discriminator 302 which generates the discrimination result 319 and classification result 320. The discrimination result 319 is used to calculate a real adversarial loss term 321, and the classification result 320 is compared with the real source attributes label 316 to calculate a real classification loss 322. The fake adversarial losses 313, real adversarial losses 321, and the real classification loss 322 are summed up and fed to the optimizer 323 to optimize the discriminator 302. In an embodiment, optimizing the discriminator 302 includes performing a back-propagation and updating the parameters, e.g., weights, in a neural network implementing the discriminator 302”).
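For context on the terms recited in claims 18 and 19, the following is a minimal sketch of what reparameterized sampling from a normal distribution defined by a mean and a standard deviation commonly refers to: the sample is written as z = mean + std · eps with eps drawn from N(0, 1). The function and variable names are hypothetical and are not taken from the application or from the cited references.

```python
import random

def reparameterized_sample(mean, std, rng):
    """Return z with z[i] = mean[i] + std[i] * eps[i], where eps[i] ~ N(0, 1).

    mean and std stand in for an encoder's two outputs (an assumption made
    for illustration); the randomness lives entirely in eps.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + s * e for m, s, e in zip(mean, std, eps)]

rng = random.Random(0)   # seeded generator so the draw is repeatable
mean = [0.0, 1.0, -2.0]
std = [1.0, 0.5, 0.0]    # a zero std collapses that coordinate onto its mean
z = reparameterized_sample(mean, std, rng)
```

Because the noise is drawn separately from the mean and standard deviation, the sample is a deterministic function of those two outputs once eps is fixed.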
Regarding claim 20, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold teaches that the method of claim 1, wherein the encoder neural network is based upon a convolutional neural network (See Kaufhold: Fig. 1, and [0040], “There are different variants of autoencoders: from fully connected to convolutional. With fully connected autoencoders, neurons contained in a particular layer are connected to each neuron in the previous layer. (A “neuron” in an artificial neural network is a mathematical approximation of a biological neuron. It receives a vector of inputs, performs a transformation on them, and outputs a single scalar value.) With convolutional layers, the connectivity of neurons is localized to a few nearby neurons in the previous layer. For image based tasks convolutional autoencoders are the standard. In embodiments of this invention, when autoencoders are referenced, it is implied that the convolutional variant may be used”).
Regarding claim 21, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the training further comprises alternating updates of the discriminator neural network parameters and updates of the encoder neural network parameters and generator neural network parameters, wherein the updates are generated based upon the loss function (See Fu: Fig. 3, and [0054], “The fake adversarial loss term 313, the fake segmentation loss 309, the fake classification loss 314, and the reconstruction loss 318 are used by the optimizer 310 to optimize the generator 301. In an embodiment, the optimizer 310 sums up the loses 313, 309, 314, and 318 with weights, i.e., weights the losses differently, to determine a generator loss, which is used by the optimizer 310 to do back-propagation and update the parameters in a neural network implementing the generator 301. According, to an embodiment, losses are summed as shown in the equation below”; [0055], “To train the discriminator 302, the input source image 304 is fed to the discriminator 302 which generates the discrimination result 319 and classification result 320. The discrimination result 319 is used to calculate a real adversarial loss term 321, and the classification result 320 is compared with the real source attributes label 316 to calculate a real classification loss 322. The fake adversarial losses 313, real adversarial losses 321, and the real classification loss 322 are summed up and fed to the optimizer 323 to optimize the discriminator 302. In an embodiment, optimizing the discriminator 302 includes performing a back-propagation and updating the parameters, e.g., weights, in a neural network implementing the discriminator 302”; and [0056], “To train the segmentor 303, the input source image 304 is input to the segmentor 303 to obtain an estimated semantic segmentation 324. 
Then, this estimated segmentation 324 is compared with a ground-truth source segmentation 315, which may be a landmark based segmentation, to calculate a real segmentation loss 325. The optimizer 326 utilizes this loss 325 to do back-propagation and update the parameters in a neural network implementing the segmentor 303”).
Regarding claim 22, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Fu teaches that the method of claim 1, wherein the training further comprises jointly updating the encoder neural network parameters and generator neural network parameters (See Fu: Fig. 21, and [0183], “FIG. 21 is a flow chart of a method 2100 for training an image generator. The method 2100: (i) trains a generator, implemented with a first neural network, to generate a fake image based on a target segmentation, (ii) trains a discriminator, implemented with a second neural network, to distinguish a real image from a fake image and output a discrimination result as a function thereof, and (iii) trains a segmentor, implemented with a third neural network, to generate a segmentation from the fake image. In the method 2100, the generator outputs 2101 a fake image to the discriminator and the segmentor. In turn, the training method 2100 iteratively operates 2102 the generator, discriminator, and segmentor during a training period. The iterative operation 2102 causes the discriminator and generator to train in an adversarial relationship with each other and the generator and segmentor to train in a collaborative relationship with each other. At the end of the training period, the generator's first neural network is trained to generate the fake image based on the target segmentation with more accuracy than at the start of the training period”).
Regarding claim 23, Kaufhold and Fu teach all the features with respect to claim 21 as outlined above. Further, Fu teaches that the method of claim 21, wherein alternating updates of the discriminator neural network parameters and updates of the encoder neural network parameters and generator neural network parameters comprises performing a plurality of updates of the discriminator neural network parameters followed by an update of the encoder neural network parameters and generator neural network parameters (See Fu: Fig. 14, and [0137], “During the training process, the three inputs (target segmentation 1406, target attributes 1405, and latent vector 1404) are fed into the generator 1401 to obtain a generated image 1407. There are two paths after generating the image 1407. The first path is to input the generated image 1407 to the segmentor 1403. The segmentor 1403 estimates a semantic segmentation 1408 from the generated image 1407 and the estimated segmentation 1408 is then compared with the target segmentation 1406 to calculate a fake segmentation loss 1409. The fake segmentation loss 1409 is fed to the optimizer 1410. The second path feeds the generated image 1407 to the discriminator 1402 which generates a discrimination output 1411 and a classification output 1412. The discrimination output 1411 is used to calculate the fake adversarial loss term 1413, and the classification output 1412 is used to calculate a fake classification loss 1414. The fake adversarial loss term 1413, the fake classification loss 1414, and the fake segmentation loss 1409 are all provided to the optimizer 1410 to optimize the generator 1401. In an embodiment, the loses 1410, 1413, and 1414 are summed up with weights as the generator loss and the generator loss is used by the optimizer 1410 to do back-propagation and update parameters in a neural network implementing the generator 1401. 
It is noted that while multiple optimizers are depicted, e.g., the optimizes 1410, 1421, and 1425, embodiments may utilize any number of optimizers to implement the training procedures described herein”).
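The update ordering recited in claim 23 (a plurality of discriminator updates followed by one update of the encoder and generator) can be sketched as follows. The default of five repeats mirrors the "e.g., five times" value quoted from Fu [0148]; the function name and step labels are hypothetical, and the sketch records only the order of updates without performing any optimization.

```python
def alternating_schedule(num_iterations, n_repeat=5):
    """List update steps: n_repeat discriminator updates, then one
    encoder/generator update, repeated num_iterations times."""
    steps = []
    for _ in range(num_iterations):
        steps.extend(["discriminator"] * n_repeat)  # plurality of D updates
        steps.append("encoder+generator")           # single joint update
    return steps

schedule = alternating_schedule(num_iterations=2, n_repeat=5)
```

Each training iteration therefore contributes n_repeat + 1 entries to the schedule, with the encoder/generator step always last.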
Regarding claim 24, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold and Fu teach that a method of performing inference using an encoder neural network (See Kaufhold: Fig. 12, and [0106], “FIG. 12 is a block diagram of an embodiment illustrating the process of training an object recognizer using translated images in addition to training images. During training of the object recognizer 1219, a relatively sparse set of training images 1213, and a relatively dense set of translated images 1215 are used. The translated images 1215 are produced by the DMTG 1203, and specifically the translator 1205 after it has been trained according to methods described herein. The translator 1205 includes an autoencoder 1207, which may contain an encoder 1209 and a decoder 1211. The encoder 1209 compresses the data whereby N dimensions are mapped to M dimensions and M<N. The decoder 1211 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. The translator may be used to populate a relatively dense database of translated images 1215”), the method comprising:
processing an input data item using the encoder neural network to generate a set of latent values representing the input data item (See Kaufhold: Fig. 19, and [0097], “As depicted in FIG. 19, a translation process can have two phases. First, an encoder can encode an image into a hidden representation. In the hidden representation, the dimensionality of the encoded image is reduced, which enables certain features of the image to be identified. Then, the encoded representation is decoded into a complete image, completing the translation. This translation process can be performed on an entire set of training images, which enables the network to learn the translation. This training process is based on the pairing mechanisms illustrated in FIG. 6. By using an image pairing technique 605 during training, whereby images from two domains (603, 601) are paired, a translator 609 can be trained to encode the images from one domain into images from another domain. For example, as shown in FIG. 15 an autoencoder might be trained to encode poor quality images of cats (e.g., 1501) into photo realistic images of cats (e.g., 1505)”);
wherein the encoder neural network is jointly trained with a generator neural network configured to generate, based on a set of latent values (See Fu: Fig. 2A, and [0127], “In contrast to the existing methods, embodiments of the present invention provide a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions. As such, embodiments of the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large diversity”), data items which are samples of a distribution representing a set of training data items (See Kaufhold: Fig. 19, and [0097], “As depicted in FIG. 19, a translation process can have two phases. First, an encoder can encode an image into a hidden representation. In the hidden representation, the dimensionality of the encoded image is reduced, which enables certain features of the image to be identified. Then, the encoded representation is decoded into a complete image, completing the translation. This translation process can be performed on an entire set of training images, which enables the network to learn the translation. This training process is based on the pairing mechanisms illustrated in FIG. 6. By using an image pairing technique 605 during training, whereby images from two domains (603, 601) are paired, a translator 609 can be trained to encode the images from one domain into images from another domain. For example, as shown in FIG. 15 an autoencoder might be trained to encode poor quality images of cats (e.g., 1501) into photo realistic images of cats (e.g., 1505)”), and a discriminator neural network configured to distinguish between samples generated by the generator network and samples of the distribution which are not generated by the generator network (See Kaufhold: Fig. 7, and [0101], “FIG. 
7 is a block diagram illustrating an embodiment comprising training a DMTG generator using noise as an initialization. The DMTG 707 includes a generator 709, which is composed of a GAN 711 comprising a discriminator neural network 713 and a generative neural network 715. Given a relatively sparse set of training images 701 as input into the discriminator neural network 713 and Gaussian noise 705 and defined by the mathematical equation 703 as input into the generative neural network 715, the two networks in the GAN 711 work together to train the generator neural network 715 to produce images that are representative of the training examples 701 used to train the discriminative network 713. The discriminative neural network 713 is aware of the original training images 701. The generative neural network 715 is not given data to begin with; instead it is initialized with Gaussian noise 705. The two networks play an adversarial game where the generator 715 generates images 717 that are then processed by the discriminator 713. The discriminator 713 classifies the generated images 717 as either similar to the training data set 701 or not. The generator 715 may use this information to learn how to generate new images 717 that are better than the last iteration of generated images. The output from the discriminator 713, may be used directly in the loss function for the generator 715. The loss function changes how the generator 715 will generate the next set of images. Once trained, DMTG 707 can be called upon to produce generated images. The output is a comparatively dense set of generated images 717”);
wherein the discriminator neural network is configured to distinguish by processing, by the discriminator neural network (See Kaufhold: Fig. 9, and [0103], “FIG. 9 is a block diagram illustrating an embodiment comprising training a DMTG generator using synthetic images as an initialization. The DMTG 907 includes a generator 909, which is composed of a GAN 911 comprising a discriminator neural network 913 and a generative neural network 915. Given a relatively sparse set of training images 901 as input into the discriminator network 913 and set of separately created synthetic images 903 as input into the generative network 915 (note this input 905 is different than Gaussian noise), these two networks work together to train the generator network 915 to produce images that are representative of the training examples 901 used to train the discriminative network 913. The discriminative neural network 913 is aware of the original training images 901. The generative neural network 915 is not given data to begin with; instead it is initialized with synthetic images 903. The two networks play an adversarial game where the generator 915 generates images 917 that are then processed by the discriminator 913. The discriminator 913 classifies the generated images 917 as either similar to the training data set 901 or not. The generator 915 may use this information to learn how to generate new images 917 that are better than the last iteration of generated images. The output from the discriminator 913, may be used directly in the loss function for the generator 915. The loss function changes how the generator 915 will generate the next set of images. Once trained, DMTG 907 can be called upon to produce generated images. The output is a comparatively dense set of generated images 917”), an input pair comprising a sample part and a latent part (See Kaufhold: Fig. 6, and [0100], “FIG. 6 is a block diagram illustrating an embodiment comprising training a DMTG translator. 
Given a relatively sparse set of training images 601 and a set of separately created synthetic images 603, a set of image pairings 605 indicates which synthetic images 603 align with corresponding training images 601. The pairings 605 are input to the translator 609, which is part of DMTG 607. The translator 609 comprises an autoencoder 611, which includes an encoder 613 and decoder 615. Encoder 613 compresses the data whereby N dimensions are mapped to M dimensions and M<N. Decoder 615 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. Each time an image from synthetic images 603 goes through the encoder 613 and decoder 615, the network in translator 609 learns how to take the compressed information based on a synthetic image 603 to generate a training image such as the training images in 601 based on pairings 605. Once trained, DMTG 607 can be called upon with any unpaired synthetic image to produce a translated version of the image. The output of the translator 609 is a comparatively dense set of translated images 617”);
wherein the sample and latent parts of the input pair comprise either a sample of the distribution generated by the generator neural network and the corresponding set of latent values used to generate the sample respectively, or a training data item of the set of training data items and a set of latent values generated by the encoder neural network based upon the training data item (See Kaufhold: Fig. 1, and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample $\tilde{h}$. The input received by D can be from h or $\tilde{h}$. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”);
wherein the training is based upon a loss function (See Kaufhold: Fig. 8, and [0102], “FIG. 8 is a block diagram illustrating an embodiment comprising training a DMTG generator using translated images as an initialization. The DMTG 807 includes a generator 809, which is composed of a GAN 811 comprising a discriminator neural network 813 and a generative neural network 815. Given a relatively sparse set of training images 801 as input into the discriminator network 813 and translated images 803 as input into the generative network 815 (note this input 805 is different than Gaussian noise), these two networks in the GAN 811 work together to train the generator neural network 815 to produce images that are representative of the training examples 801 used to train the discriminative network 813. The discriminative neural network 813 is aware of the original training images 801. The generative neural network 815 is not given data to begin with; instead it is initialized with translated images 803. The two networks play an adversarial game where the generator 815 generates images 817 that are then processed by the discriminator 813. The discriminator 813 then classifies the generated images 817 as either similar to the training data set 801 or not. The generator 815 may use this information to learn how to generate new images 817 that are better than the last iteration of generated images. The output from the discriminator 813, may be used directly in the loss function for the generator 815. The loss function changes how the generator 815 will generate the next set of images. Once trained, DMTG 807 can be called upon to produce generated images. 
The output is a comparatively dense set of generated images 817”) comprising a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair (See Fu: Equation (10) and [0089], “where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyper-parameters which control the weights of classification loss, segmentation loss, and reconstruction loss. These weights act as relatively importance of those terms compared to adversarial loss. According to an embodiment, the weights are hyper-parameters chosen by user. The weights (hyper-parameters) can be tuned to affect how the generated images look. In an embodiment, the loss terms are constraints and regulations. In an embodiment, the generator will trade off those constraints (the loss terms) in generating the final output image. A larger hyper-parameter indicates a larger impact of that specific loss term. For example, increasing $\lambda_2$ will let the generated image be more consistent with the target segmentation. Since $A_c$ is embedded in D and shares the same weights except the output layer, $A_c$ is trained together with D using discriminator loss $\mathcal{L}_D$ which contains both the adversarial term and the classification term on real image samples”).
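The weighting described in the quoted Fu [0089] passage can be illustrated with a short sketch. Equation (10) itself is not reproduced in this action, so the form below is an assumption: a sum in which the adversarial term is unweighted and the classification, segmentation, and reconstruction terms are scaled by the three hyper-parameters. The function name, argument names, and default weights are hypothetical.

```python
def generator_objective(adv, cls, seg, rec, lam1=1.0, lam2=1.0, lam3=1.0):
    """Assumed weighted sum: adv + lam1*cls + lam2*seg + lam3*rec.

    lam1, lam2, lam3 weight the classification, segmentation, and
    reconstruction terms relative to the adversarial term.
    """
    return adv + lam1 * cls + lam2 * seg + lam3 * rec

base = generator_objective(1.0, 2.0, 3.0, 4.0, lam1=5.0, lam2=1.0, lam3=0.5)
# Doubling lam2 doubles the segmentation term's contribution only.
more_seg = generator_objective(1.0, 2.0, 3.0, 4.0, lam1=5.0, lam2=2.0, lam3=0.5)
```

Under this assumed form, raising one weight increases only the contribution of its own term, consistent with the quoted statement that a larger hyper-parameter indicates a larger impact of that specific loss term.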
Regarding claim 25, Kaufhold and Fu teach all the features with respect to claim 24 as outlined above. Further, Fu teaches that the method of claim 24, further comprising:
classifying the input data item based upon the latent values representing the input data item (See Fu: Figs. 2A-D, and [0182], “An example embodiment is directed to target-oriented image generation with spatial constraints. An embodiment employs a novel, Spatially Constrained, Generative Adversarial Network (SCGAN) that decouples the spatial constraints from a latent vector and makes them available as additional control signal inputs. A SCGAN embodiment includes a generator network, a discriminator network with an auxiliary classifier, and a segmentor network, which are trained together adversarially. In an embodiment, the generator is specially designed to take a semantic segmentation, a latent vector, and an attribute label as inputs step by step to synthesize a fake image. The discriminator network tries to distinguish between real images and generated images as well as classify the images into attributes. The discrimination and classification results guide the generator to synthesize realistic images with correct target attributes. The segmentor network attempts to conduct semantic segmentations on both real images and fake images to deliver estimated segmentations to guide the generator in synthesizing spatially constrained images. With those networks, example embodiments have increased controllability of an image synthesis task. Embodiment generate target-oriented realistic images guided by semantic segmentations and attribute labels”).
Regarding claim 27, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. Further, Kaufhold and Fu teach that a method of generating a data item using a generator neural network (See Kaufhold: Fig. 12, and [0106], “FIG. 12 is a block diagram of an embodiment illustrating the process of training an object recognizer using translated images in addition to training images. During training of the object recognizer 1219, a relatively sparse set of training images 1213, and a relatively dense set of translated images 1215 are used. The translated images 1215 are produced by the DMTG 1203, and specifically the translator 1205 after it has been trained according to methods described herein. The translator 1205 includes an autoencoder 1207, which may contain an encoder 1209 and a decoder 1211. The encoder 1209 compresses the data whereby N dimensions are mapped to M dimensions and M<N. The decoder 1211 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. The translator may be used to populate a relatively dense database of translated images 1215”), the method comprising:
receiving a set of latent values (See Fu: Fig. 2A, and [0127], “In contrast to the existing methods, embodiments of the present invention provide a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions. As such, embodiments of the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large diversity”);
processing the set of latent values using the generator neural network to generate a data item (See Fu: Figs. 1-12, and [0119], “Image generation has raised tremendous attention in both academic and industrial areas, especially for conditional and target oriented image generation, such as, criminal portrait and fashion design. Although current studies have achieved preliminary results along this direction, existing methods focus on class labels where spatial contents are randomly generated from a latent vector, and edge details or spatial information is blurred or difficult to preserve. In light of this, an embodiment of the present invention implements a novel Spatially Constrained Generative Adversarial Network (SCGAN) that decouples the spatial constraints from the latent vector and makes them available as additional control signals”);
wherein the generator neural network is jointly trained with an encoder neural network configured to generate a set of latent values for a respective data item (See Kaufhold: Fig. 19, and [0097], “As depicted in FIG. 19, a translation process can have two phases. First, an encoder can encode an image into a hidden representation. In the hidden representation, the dimensionality of the encoded image is reduced, which enables certain features of the image to be identified. Then, the encoded representation is decoded into a complete image, completing the translation. This translation process can be performed on an entire set of training images, which enables the network to learn the translation. This training process is based on the pairing mechanisms illustrated in FIG. 6. By using an image pairing technique 605 during training, whereby images from two domains (603, 601) are paired, a translator 609 can be trained to encode the images from one domain into images from another domain. For example, as shown in FIG. 15 an autoencoder might be trained to encode poor quality images of cats (e.g., 1501) into photo realistic images of cats (e.g., 1505)”) and a discriminator neural network configured to distinguish between samples generated by the generator network and samples of the distribution which are not generated by the generator network (See Kaufhold: Fig. 7, and [0101], “FIG. 7 is a block diagram illustrating an embodiment comprising training a DMTG generator using noise as an initialization. The DMTG 707 includes a generator 709, which is composed of a GAN 711 comprising a discriminator neural network 713 and a generative neural network 715. 
Given a relatively sparse set of training images 701 as input into the discriminator neural network 713 and Gaussian noise 705 and defined by the mathematical equation 703 as input into the generative neural network 715, the two networks in the GAN 711 work together to train the generator neural network 715 to produce images that are representative of the training examples 701 used to train the discriminative network 713. The discriminative neural network 713 is aware of the original training images 701. The generative neural network 715 is not given data to begin with; instead it is initialized with Gaussian noise 705. The two networks play an adversarial game where the generator 715 generates images 717 that are then processed by the discriminator 713. The discriminator 713 classifies the generated images 717 as either similar to the training data set 701 or not. The generator 715 may use this information to learn how to generate new images 717 that are better than the last iteration of generated images. The output from the discriminator 713, may be used directly in the loss function for the generator 715. The loss function changes how the generator 715 will generate the next set of images. Once trained, DMTG 707 can be called upon to produce generated images. The output is a comparatively dense set of generated images 717”);
wherein the generator neural network is configured to generate, based on a set of latent values (See Fu: Fig. 2A, and [0127], “In contrast to the existing methods, embodiments of the present invention provide a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions. As such, embodiments of the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large diversity”), data items which are samples of a distribution representing a set of training data items (See Kaufhold: Fig. 13, and [0047], “Second, embodiments of the invention can significantly increase the size of a training data by generating images that are similar to each image in the training data set. FIG. 13, described in more detail below, illustrates the interactions between image generation and object recognition”; and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample $\tilde{h}$. The input received by D can be from h or $\tilde{h}$. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”);
wherein the discriminator neural network is configured to distinguish by processing, by the discriminator neural network (See Kaufhold: Fig. 9, and [0103], “FIG. 9 is a block diagram illustrating an embodiment comprising training a DMTG generator using synthetic images as an initialization. The DMTG 907 includes a generator 909, which is composed of a GAN 911 comprising a discriminator neural network 913 and a generative neural network 915. Given a relatively sparse set of training images 901 as input into the discriminator network 913 and set of separately created synthetic images 903 as input into the generative network 915 (note this input 905 is different than Gaussian noise), these two networks work together to train the generator network 915 to produce images that are representative of the training examples 901 used to train the discriminative network 913. The discriminative neural network 913 is aware of the original training images 901. The generative neural network 915 is not given data to begin with; instead it is initialized with synthetic images 903. The two networks play an adversarial game where the generator 915 generates images 917 that are then processed by the discriminator 913. The discriminator 913 classifies the generated images 917 as either similar to the training data set 901 or not. The generator 915 may use this information to learn how to generate new images 917 that are better than the last iteration of generated images. The output from the discriminator 913, may be used directly in the loss function for the generator 915. The loss function changes how the generator 915 will generate the next set of images. Once trained, DMTG 907 can be called upon to produce generated images. The output is a comparatively dense set of generated images 917”), an input pair comprising a sample part and a latent part (See Kaufhold: Fig. 6, and [0100], “FIG. 6 is a block diagram illustrating an embodiment comprising training a DMTG translator. 
Given a relatively sparse set of training images 601 and a set of separately created synthetic images 603, a set of image pairings 605 indicates which synthetic images 603 align with corresponding training images 601. The pairings 605 are input to the translator 609, which is part of DMTG 607. The translator 609 comprises an autoencoder 611, which includes an encoder 613 and decoder 615. Encoder 613 compresses the data whereby N dimensions are mapped to M dimensions and M<N. Decoder 615 uncompresses the data mapping the M dimensions to N dimensions using a multi-layered convolutional neural network. Each time an image from synthetic images 603 goes through the encoder 613 and decoder 615, the network in translator 609 learns how to take the compressed information based on a synthetic image 603 to generate a training image such as the training images in 601 based on pairings 605. Once trained, DMTG 607 can be called upon with any unpaired synthetic image to produce a translated version of the image. The output of the translator 609 is a comparatively dense set of translated images 617”);
wherein the sample and latent parts of the input pair comprise either a sample of the distribution generated by the generator neural network and the corresponding set of latent values used to generate the sample respectively, or a training data item of the set of training data items and a set of latent values generated by the encoder neural network based upon the training data item (See Kaufhold: Fig. 1, and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample h. The input received by D can be from h or h. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”);
wherein the training is based upon a loss function (See Kaufhold: Fig. 1, and [0042], “Formally, a generative adversarial network defines a model G and a model D. Model D distinguishes between samples from G and samples h from its own distribution. Model G takes random noise, defined by z, as input and produces a sample h. The input received by D can be from h or h. Model D produces a probability indicating whether the sample is input that fits into the distribution or not”) comprising a joint discriminator loss term based upon the sample and latent parts of the input pair processed by the discriminator neural network and a single discriminator loss term based upon only one of the sample or latent parts of the input pair (See Fu: Equation (10) and [0089], “where λ.sub.1, λ.sub.2, and λ.sub.3 are hyper-parameters which control the weights of classification loss, segmentation loss, and reconstruction loss. These weights act as relatively importance of those terms compared to adversarial loss. According to an embodiment, the weights are hyper-parameters chosen by user. The weights (hyper-parameters) can be tuned to affect how the generated images look. In an embodiment, the loss terms are constraints and regulations. In an embodiment, the generator will trade off those constraints (the loss terms) in generating the final output image. A larger hyper-parameter indicates a larger impact of that specific loss term. For example, increasing λ.sub.2 will let the generated image be more consistent with the target segmentation. Since A.sub.c is embedded in D and shares the same weights except the output layer, A.sub.c is trained together with D using discriminator loss ℒ.sub.D which contains both the adversarial term and the classification term on real image samples”).
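For illustration only, the loss structure recited in this limitation can be sketched as follows. The sketch is not taken from Kaufhold, Fu, or the application: the function names, the logistic form of each term, and the convention that encoder-derived pairs are the positive class are all assumptions. The first function adds a joint term (computed from a score over both the sample and latent parts) to a single term (computed from a score over only one part); the second combines loss terms with the weights λ.sub.1, λ.sub.2, and λ.sub.3 in the manner described in Fu [0089].

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_loss(joint_score: float, single_score: float, from_encoder: bool) -> float:
    """Hypothetical discriminator loss for one input pair.

    joint_score  -- discriminator score computed from sample AND latent parts
    single_score -- discriminator score computed from only ONE of the parts
    from_encoder -- True for (training item, encoded latents) pairs,
                    False for (generated sample, source latents) pairs.
    Treating encoder pairs as the positive class is an assumption.
    """
    sign = 1.0 if from_encoder else -1.0
    joint_term = softplus(-sign * joint_score)    # logistic loss on the joint score
    single_term = softplus(-sign * single_score)  # logistic loss on the single score
    return joint_term + single_term


def weighted_total(adv: float, cls: float, seg: float, rec: float,
                   lam1: float, lam2: float, lam3: float) -> float:
    """Adversarial loss plus weighted classification, segmentation, and
    reconstruction losses; a larger weight gives that term a larger impact."""
    return adv + lam1 * cls + lam2 * seg + lam3 * rec
```

Under these assumptions, a pair that the discriminator scores confidently and correctly contributes a loss near zero, while an uninformative score of zero contributes log 2 per term.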


Claims 16 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Kaufhold, etc. (US 20190080205 A1) in view of Fu, etc. (US 20190295302 A1), further in view of Chatwin, etc. (US 20180211303 A1).
Regarding claim 16, Kaufhold and Fu teach all the features with respect to claim 1 as outlined above. However, Kaufhold, as modified by Fu, fails to explicitly disclose the method of claim 1, wherein the loss function comprises a hinge function applied to a component of the loss function.
However, Chatwin teaches the method of claim 1, wherein the loss function comprises a hinge function applied to a component of the loss function (See Chatwin: Fig. 12, and [0071], “A domain discriminator is trained using additional holdout sets of the unlabeled source and target data to approximate the H-distance. Thus, in some embodiments, method 400 can optionally comprise an activity 420 of training a domain discriminator using a first additional holdout set of source data from the labeled source training data and a second additional holdout set of data from the target data. In some embodiments, method 400 also can optionally comprise an activity of using the domain discriminator and a loss function to approximate an H-distance between the different holdout set of source data of the labeled source training data and the different portion of the target data within each cluster of the plurality of clusters. In some embodiments, a small H-distance indicates more similarity between the different holdout set of source data of the labeled source training data and the different portion of the target data in each cluster of the plurality of clusters than a large H-distance that is larger than the small H-distance. The approximate H-distance can be calculated as (1-hinge loss). The loss function can comprise one of a hinge loss function, a negative logarithmic loss function, a cross entropy loss function, a Huber loss function, a modified Huber loss function, an exponential loss function, a mean absolute deviation, or a Kullback-Leibler divergence”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaufhold to have the method of claim 1, wherein the loss function comprises a hinge function applied to a component of the loss function, as taught by Chatwin, in order to bind or limit the error of the source classifier on the target data (See Chatwin: [0098], “Once an active sampling algorithm has been applied, knowledge of the agreement and disagreement clusters can be obtained with high probability. The set of agreement clusters can be denoted as C.sub.agr, and the target error of a hypothesis h on the agreement clusters can be defined as: $\epsilon_{T,\mathrm{agr}}(h) = \sum_{c_i \in C_{\mathrm{agr}}} \epsilon_{T,c_i}(h)$, and the set of disagreement clusters can be denoted as C.sub.dis, and the target error of a hypothesis h on the disagreement clusters can be defined as: $\epsilon_{T,\mathrm{dis}}(h) = \sum_{c_i \in C_{\mathrm{dis}}} \epsilon_{T,c_i}(h)$. For the agreement regions, binding or limiting the error of the source classifier $\hat{f}$ on the target data is desirable.”). Kaufhold teaches a method and system that may train the neural network to recognize the target object in the images by collecting the training images assembled from a set of real-world images of the target object and minimizing the loss function, while Chatwin teaches a system and method that may train the neural network by minimizing a loss function that comprises one of a hinge loss function, a negative logarithmic loss function, a cross entropy loss function, a Huber loss function, a modified Huber loss function, an exponential loss function, a mean absolute deviation, or a Kullback-Leibler divergence. Therefore, it would have been obvious to one of ordinary skill in the art to modify Kaufhold in view of Chatwin to use the hinge loss function as the loss function.
The motivation to modify Kaufhold by Chatwin is “Use of known technique to improve similar devices (methods, or products) in the same way”.
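For reference, a hinge function of the kind named in Chatwin [0071] can be sketched as below. This is a minimal illustration under stated assumptions, not Chatwin's implementation: the function names, the use of labels in {-1, +1}, and the averaging over examples are hypothetical choices. The last function follows the quoted statement that the approximate H-distance can be calculated as (1-hinge loss).

```python
from typing import Sequence


def hinge(margin: float) -> float:
    """Hinge function: zero once the margin reaches 1, linear below that."""
    return max(0.0, 1.0 - margin)


def mean_hinge_loss(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average hinge loss; labels are assumed to be -1 or +1."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal in length")
    return sum(hinge(y * s) for y, s in zip(labels, scores)) / len(labels)


def approx_h_distance(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate H-distance computed as (1 - hinge loss), per the quoted passage."""
    return 1.0 - mean_hinge_loss(labels, scores)
```

In this sketch, a domain discriminator that separates source from target data with a margin of at least 1 has zero hinge loss, so the approximate distance is 1; a discriminator that outputs a score of 0 everywhere has a hinge loss of 1, so the approximate distance is 0.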
Regarding claim 26, Kaufhold and Fu teach all the features with respect to claim 24 as outlined above. Further, Chatwin teaches the method of claim 24, further comprising:
performing an action with an agent based upon the latent values representing the input data item (See Chatwin: Fig. 3, and [0088], “In some embodiments, a MAB can be defined where each arm i corresponds to a cluster c.sub.i. This approach is a novel application of the MAB. In addition, instead of choosing the arm that maximizes the total reward, the ‘best’ arms are the arms whose corresponding disagreement hypotheses hold. Each arm i is associated with an expectation μ.sub.i that is initially unknown. In addition, each arm is associated with a hypothesis H.sub.i:μ.sub.i>ϵ.sub.i for some given threshold ϵ ∈ (0, 1). At each round t, the agent selects an action A.sub.t (a subset of arms) from the action set A: {A ⊆ [1, . . . , K]} and receives a stochastic observation r.sub.i,t ∈ {0,1} from each of the arms in A.sub.t. A goal is to obtain observations from the set of “optimal” arms on which the disagreement hypotheses H.sub.dis.i hold. Therefore, the optimal action is defined as A*:{i:μ.sub.i>ϵ.sub.i}”).
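The action selection described in the quoted passage of Chatwin [0088] can be illustrated with the following sketch. It is a simplified illustration, not Chatwin's algorithm: the function names are hypothetical, arms are indexed from 0 rather than 1, and the empirical mean of the 0/1 observations is used as a stand-in for the expectation μ.sub.i, which the passage states is initially unknown.

```python
from typing import List, Sequence, Set


def empirical_means(observations: Sequence[Sequence[int]]) -> List[float]:
    """Estimate each arm's expectation from its stochastic 0/1 observations."""
    return [sum(obs) / len(obs) for obs in observations]


def optimal_action(mu: Sequence[float], eps: Sequence[float]) -> Set[int]:
    """Return the subset of arms i whose expectation exceeds its threshold,
    i.e. the set {i : mu_i > eps_i} from the quoted definition of A*."""
    if len(mu) != len(eps):
        raise ValueError("mu and eps must have the same length")
    return {i for i, (m, e) in enumerate(zip(mu, eps)) if m > e}
```

For example, with estimated expectations of 0.9, 0.2, and 0.5 and a common threshold of 0.5, only the first arm satisfies the strict inequality and is included in the selected action.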


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/Primary Examiner, Art Unit 2612