Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the submission of the application on 2/27/2018.
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 13-17, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Liu, et al (Coupled Generative Adversarial Networks, herein Liu), and Larsen, et al (Autoencoding beyond pixels using a learned similarity metric, herein Larsen).
Regarding claim 1,
	Liu teaches a computer-implemented method, comprising: (Liu, Figure 1, and page 1, paragraph 1, line 1 “We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images… It can learn a joint distribution with just samples drawn from the marginal distributions.” And, page 1, paragraph 2, line 1 “The paper concerns the problem of learning a joint distribution of multi-domain images from data.” And, page 1, paragraph 4, line 1 “CoGAN consists of a tuple of GANs, each for one image domain….We show that by enforcing a weight-sharing constraint the CoGAN can learn a joint distribution without existence of corresponding images in different domains.”

    [Image: media_image1.png]

  In other words, CoGAN is a computer implemented method.)
	Liu teaches a first domain and a second domain. (Liu, page 1, paragraph 4, line 1 “CoGAN consists of a tuple of GANs, each for one image domain.” And, page 1, paragraph 4, line 8 “CoGAN is for multi-image domains but, for ease of presentation, we focused on the case of two image domains in the paper. However, the discussions and analyses can be easily generalized to multiple image domains.” In other words, two image domains are a first domain and a second domain.)
	Thus far, Liu does not explicitly teach encoding, by a first neural network, a first image represented in a first domain to convert the first image to a shared latent space, producing a first latent code; encoding, by a second neural network, a second image represented in a second domain to convert the second image to a shared latent space, producing a second latent code; and generating, by a third neural network, a first translated image in the second domain based on the first latent code, wherein the first translated image is correlated with the first image and weight values of the third neural network are computed based on the first latent code and the second latent code.
	Larsen teaches encoding, by a first neural network, a first image represented in a first domain to convert the first image to a shared latent space, producing a first latent code; (Larsen, page 2, column 1, paragraph 1, line 1 “We combine VAEs and GANs into an unsupervised generative model that simultaneously learns to encode, generate and compare dataset samples.” And, page 2, column 1, paragraph 3, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
    [Image: media_image2.png]
” In other words, encode is encoding, two networks (the first encodes) is first neural network in a first domain, encode a data sample x to a latent representation z is convert the first image to a shared latent space, and latent representation z is a first latent code.)
	Larsen teaches encoding, by a second neural network, a second image represented in a second domain to convert the second image to a shared latent space, producing a second latent code; and (Larsen, page 2, column 1, paragraph 1, line 1 “We combine VAEs and GANs into an unsupervised generative model that simultaneously learns to encode, generate and compare dataset samples.” And, page 2, column 1, paragraph 3, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
    [Image: media_image2.png]
” In other words, encode is encoding, two networks (the second encodes) is second neural network in a second domain, encode a data sample x to a latent representation z is convert the second image to a shared latent space, and latent representation z is a second latent code. Performing the same steps again is encoding, by a second neural network, a second image to a shared latent space, producing a second latent code z.)
	Larsen teaches generating, by a third neural network, a first translated image in the second domain based on the first latent code, wherein the first translated image is correlated with the first image and weight values of the third neural network are computed based on the first latent code and the second latent code. (Larsen, page 2, column 1, paragraph 5, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
    [Image: media_image3.png]
” In other words, decoder is third neural network, decode is generating wherein the first translated image is correlated with the first image and weight values of the third neural network are computed based on the first latent code.)
In order to motivate the combination of Liu and Larsen, a brief description of the underlying technology is necessary. A GAN (generative adversarial network) is a machine learning model in which two neural networks, called the generator and the discriminator, compete with each other to become more accurate in their predictions. The generator learns to generate plausible data. The generated instances become negative training examples for the discriminator. The discriminator learns to distinguish the generator’s fake data from real data. For example, in the case of images, this means discriminating between an image generated by the generator and an actual photograph. The discriminator penalizes the generator for producing implausible results.
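For illustration only, the adversarial objective described above may be sketched as follows. This is a minimal sketch and is not taken from Liu or Larsen: the function names and the scalar probabilities are assumptions, and the generator loss shown is the commonly used non-saturating variant.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator (hypothetical helper).

    d_real: probability the discriminator assigns to a real sample being real.
    d_fake: probability the discriminator assigns to a generated sample being real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return -math.log(d_fake)

# A discriminator that separates real from generated data has a low loss ...
confident = discriminator_loss(d_real=0.9, d_fake=0.1)
# ... while one that is fooled by the generator has a higher loss.
fooled = discriminator_loss(d_real=0.9, d_fake=0.8)

# The generator is penalized more when its output is judged implausible.
implausible = generator_loss(d_fake=0.1)
plausible = generator_loss(d_fake=0.8)
```

In this sketch, training would alternate between lowering the discriminator loss and lowering the generator loss.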
	Liu combines two GANs into a coupled generative adversarial network (CoGAN) to learn a joint distribution between two or more domains. Liu does this to allow for unsupervised learning of a joint domain by enforcing the layers that decode high-level semantics in the GANs to share weights, thereby forcing the GANs to decode the high-level semantics in the same way. (See Figure 1 of Liu.)
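	For illustration only, the weight-sharing constraint may be sketched as two generators that reference the same weights in the layer that decodes high-level semantics and keep separate weights in the layer that decodes domain-specific details. The layer sizes, the helper names, and the random weights are assumptions made for this sketch and do not appear in Liu.

```python
import random

def linear(weights, inputs):
    """Apply a weight matrix (a list of rows) to an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def make_weights(rows, cols, rng):
    """Create a rows x cols matrix of random weights (hypothetical helper)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# One weight matrix referenced by both generators: the shared layer that
# decodes high-level semantics from the noise vector.
shared_high_level = make_weights(4, 2, rng)

# Separate weight matrices: the layers that decode domain-specific details.
low_level_domain_1 = make_weights(3, 4, rng)
low_level_domain_2 = make_weights(3, 4, rng)

def generator(low_level_weights, z):
    hidden = linear(shared_high_level, z)      # shared between the two domains
    return linear(low_level_weights, hidden)   # specific to one domain

z = [0.5, -0.25]                               # one noise vector
image_1 = generator(low_level_domain_1, z)     # sample in the first domain
image_2 = generator(low_level_domain_2, z)     # paired sample in the second domain
```

Because both generators start from the same shared representation, the two outputs are tied to the same high-level content while differing in domain-specific detail.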
	An autoencoder is a neural network that combines an encoder and a decoder. The encoder maps the input into a latent code and the decoder takes the latent code and produces an output that maps back to the input. The goal of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input. The loss function used to train an autoencoder is called reconstruction loss, as it is a check of how well the image has been reconstructed from the input.
	A Variational Autoencoder (VAE) performs the same function as the autoencoder, but instead of the encoder’s output being a latent vector, the encoder outputs the mean and the standard deviation for each latent variable. It does this in order to normalize the output and remove outliers. The loss function is the reconstruction loss, as with a typical autoencoder, combined with a similarity loss (typically the Kullback-Leibler divergence between the encoded distribution and a prior distribution).
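	For illustration only, the VAE loss described above may be sketched as follows. This is a minimal sketch under stated assumptions: the similarity loss is taken to be the Kullback-Leibler divergence to a standard normal prior, the reconstruction loss is taken to be a squared error, and all function names are hypothetical.

```python
import math
import random

def sample_latent(mean, std, rng):
    """Draw a latent code z = mean + std * eps, with eps drawn from N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

def kl_to_standard_normal(mean, std):
    """KL divergence between N(mean, std^2) and N(0, 1), summed over dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s)
               for m, s in zip(mean, std))

def reconstruction_loss(x, x_hat):
    """Element-wise squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def vae_loss(x, x_hat, mean, std):
    """Reconstruction loss combined with the similarity (KL) loss."""
    return reconstruction_loss(x, x_hat) + kl_to_standard_normal(mean, std)

# The encoder outputs a mean and a standard deviation per latent variable;
# the latent code passed to the decoder is sampled from that distribution.
z = sample_latent(mean=[0.0, 1.0], std=[1.0, 0.5], rng=random.Random(0))
```

Under these assumptions the loss is zero only when the reconstruction is exact and the encoded distribution matches the prior.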

    [Image: media_image4.png]

	Larsen teaches a combination of VAE and GAN called a VAE/GAN. The VAE/GAN encodes the input with the encoder into the latent space, and then decodes the latent code into the reconstructed input. The reconstructed input then serves as the generator output in the GAN, where the discriminator compares it to a real image. In other words, when the VAE encoder-decoder is combined with the GAN generator-discriminator, the decoder of the VAE becomes the generator of the GAN. The loss is then propagated back to the encoder. “The end result will be a method that combines the advantage of GAN as a high quality generative model and VAE as a method that produces an encoder of data into the latent space z.” (Larsen, page 2, column 2, paragraph 3, line 5.)

    [Image: media_image5.png]

The combined VAE/GAN simultaneously learns to encode, generate, and compare dataset samples.  The VAE/GAN replaces element-wise reconstruction errors with feature-wise errors for measuring reconstruction quality during training. 
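For illustration only, the difference between an element-wise reconstruction error and a feature-wise reconstruction error may be sketched as follows. The function toy_features is a hypothetical stand-in for a hidden layer of the discriminator (here it merely averages adjacent pixels) and is not the learned representation of Larsen.

```python
def elementwise_error(x, x_hat):
    """Squared error computed pixel by pixel."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def toy_features(x):
    """Hypothetical stand-in for a discriminator hidden layer:
    averages each adjacent pair of pixels."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

def featurewise_error(x, x_hat):
    """Squared error computed on the features rather than on the pixels."""
    return elementwise_error(toy_features(x), toy_features(x_hat))

original = [0, 1, 0, 1, 0, 1, 0, 1]
shifted = [1, 0, 1, 0, 1, 0, 1, 0]   # the same pattern moved by one pixel

pixel_error = elementwise_error(original, shifted)
feature_error = featurewise_error(original, shifted)
```

In this toy case the one-pixel shift produces the largest possible element-wise error, while the feature-wise error is zero because both inputs yield the same features.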
The claimed invention is a combination of VAE/GANs from Larsen combined with coupled GANs from Liu. The result is coupled VAE/GANs for unsupervised image-to-image translation. (See Fig. 2C.)

    [Image: media_image6.png]
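
For illustration only, the data flow of coupled VAE/GANs used for translation may be sketched as follows: an image in the first domain is encoded into the shared latent space, and the generator of the second domain decodes that latent code. The encoders and generators below are hypothetical fixed functions chosen so the example runs; in practice they would be trained neural networks.

```python
def encoder_1(image):
    """Hypothetical encoder for the first domain: image -> shared latent code."""
    return [p - 0.5 for p in image]

def encoder_2(image):
    """Hypothetical encoder for the second domain: image -> shared latent code."""
    return [p + 0.5 for p in image]

def generator_1(z):
    """Hypothetical generator for the first domain: latent code -> image."""
    return [v + 0.5 for v in z]

def generator_2(z):
    """Hypothetical generator for the second domain: latent code -> image."""
    return [v - 0.5 for v in z]

def translate_1_to_2(image):
    """Encode in the first domain, then decode with the second domain's generator."""
    return generator_2(encoder_1(image))

first_image = [0.75, 1.0]
first_latent = encoder_1(first_image)          # first latent code
translated = translate_1_to_2(first_image)     # translated image in the second domain
reconstructed = generator_1(first_latent)      # reconstruction in the first domain
```

The same latent code thus feeds both a same-domain reconstruction and a cross-domain translation.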

	Both Liu and Larsen are directed to image-to-image translation, among other things. Liu teaches using two sets of GANs, i.e., coupled generative adversarial networks (CoGAN), to learn a joint distribution of multi-domain images by enforcing layers to share weights in order to make a shared latent space. But Liu does not teach combining VAEs with GANs for the purpose of learning feature representations in the GAN discriminator for reconstruction instead of element-wise representations. Larsen teaches combining VAEs with GANs to learn feature-wise errors instead of element-wise errors, allowing for an embedding in which high-level abstract visual features can be modified using simple arithmetic. However, Larsen does not teach learning a joint distribution of multiple domains from data. In view of the teaching of Liu, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Larsen into Liu. This would result in using VAE-GANs that are coupled to improve the quality of image translation between domains using unsupervised data.
	One of ordinary skill in the art would be motivated to do this because element-wise distance metrics have been a long-standing problem in the field. (Larsen, page 6, column 2, paragraph 3, line 1 “The problems of element-wise distance metrics are well known in the literature and many attempts have been made at going beyond pixels – typically using hand-engineered measures. Much in the spirit of deep learning, we argue that the similarity measure is yet another component which can be replaced by a learned model capable of capturing high-level structure relevant to the data distribution. In this work, our main contribution is an unsupervised scheme for learning and applying such a distance measure.”)
Regarding claim 2,
	The combination of Liu and Larsen teaches the method of claim 1, wherein 
	encoder weight values are shared between a last layer of the first neural network and a last layer of the second neural network.  (Liu, page 1, paragraph 4, line 5 “By enforcing the layers that decode high-level semantics in the GANs to share the weights, it forces the GANs to decode the high-level semantics in the same way.  The layers that decode low-level details then map the shared representation to image in individual domains for confusing the respective discriminative models.” In other words, the layers that decode high-level semantics in the GANs to share the weights is encoder weight values are shared between a last layer of the first neural network and a last layer of the second neural network.)
Regarding claim 3,
	The combination of Liu and Larsen teaches the method of claim 1, further comprising
	generating, by a fourth neural network, a second translated image in the first domain based on the second latent code, wherein the second translated image is correlated with the second image. (Larsen, page 2, column 1, paragraph 5, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
    [Image: media_image3.png]
” In other words, in the combination of Liu and Larsen where there are two VAE/GANs, the second decoder is the fourth neural network, decode is generating wherein the second translated image is correlated with the second image and weight values of the fourth neural network are computed based on the second latent code. Examiner notes that Larsen shows a decoder that generates a translated image based on a latent code and correlated with the input image. The combination of Liu and Larsen, where there are two VAE/GANs that are coupled, shows a decoder/generator that generates a second translated image based on a second latent code, wherein the second translated image is correlated with the second image.)
Regarding claim 4,
	The combination of Liu and Larsen teaches the method of claim 3, wherein 
	the weight values include generator weight values that are shared between a first layer of the third neural network and a first layer of the fourth neural network. (Liu, Figure 1, and, page 1, paragraph 4, line 5 “By enforcing the layers that decode high-level semantics in the GANs to share the weights, it forces the GANs to decode the high-level semantics in the same way. The layers that decode low-level details then map the shared representation to image in individual domains for confusing the respective discriminative models.” In other words, the third and fourth neural networks are the decoder/generator neural networks that are created by the coupling of VAE-GANs. The layers that decode high-level semantics in the GANs to share the weights is generator weight values that are shared between a first layer of the third neural network and a first layer of the fourth neural network.)
Regarding claim 13,
	The combination of Liu and Larsen teaches the method of claim 1, wherein 
	the first domain is synthetic and the second domain is real.  (Liu, Figure 1, “Each has a generative model for synthesizing realistic images in one domain and discriminative model for classifying whether an image is real or synthesized.”  In other words, domain is domain, synthesized is synthetic and real is real.)
Claims 14-17 are system claims corresponding to method claims 1-4, respectively. Otherwise they are the same. It is implicit that a computer-implemented method requires a system in order to execute. Therefore, claims 14-17 are rejected for the same reasons as claims 1-4, respectively.
Claim 20 is a non-transitory computer readable media storing computer instructions claim corresponding to method claim 1.  Otherwise, they are the same.  It is implicit that a method requires at least one non-transitory computer readable media storing computer instructions in order to execute.  Therefore, claim 20 is rejected for the same reasons as claim 1.

Claim 12 is rejected under 35 U.S.C. § 103 as being unpatentable over Liu, Larsen, and Gatys et al (Image Style Transfer Using Convolutional Neural Networks, herein Gatys).
Regarding claim 12,
	The combination of Liu and Larsen teaches the method of claim 1.
	Thus far, the combination of Liu and Larsen does not explicitly teach wherein the first domain is day time and the second domain is night time.
	Gatys teaches the first domain is day time and the second domain is night time. (Gatys, page 2420, column 2, paragraph 2, line 1 “Thus far the focus of this paper was on artistic style transfer. In general though, the algorithm can transfer the style between arbitrary images. As an example, we transfer the style of a photograph of New York by night onto an image of London in daytime (Fig 7).”

    [Image: media_image7.png]

In other words, London by day is day domain and New York by night is night domain.)
	Both Gatys and the combination of Liu and Larsen are directed to image to image translation.  In view of the teaching of the combination of Liu and Larsen, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gatys into the combination of Liu and Larsen.  This would result in being able to translate an image in daytime into an image in nighttime.
	One of ordinary skill in the art would be motivated to do this because being able to simulate human vision would give computers the ability to separate image content from style. (Gatys, page 2421, column 2, paragraph 2, line 1 “Nevertheless, we find it truly fascinating that a neural system, which is trained to perform one of the core computational tasks of biological vision, automatically learns image representations that allow – at least to some extent – the separation of image content from style.”)
Allowable Subject Matter
Claims 5, 11, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claims depending from claims 5, 11, and 18 are objected to for depending from claims 5, 11, and 18. 
The following is a statement of reasons for the indication of allowable subject matter: Claim 5 requires, among other things, a method for generating, by a third neural network, a first reconstructed image in the second domain based on a second latent code, wherein the first reconstructed image is correlated with the second image.
	Larsen teaches a method for generating an image based on a latent code but does not teach two domains, in particular, a cross-domain case in which a first reconstructed image in the second domain is generated based on a second latent code, wherein the first reconstructed image is correlated with the second image.
	Liu teaches multiple domains where the weights are shared between the multiple domains to produce a joint domain, but does not teach generating an image based on a latent code, in particular, a cross-domain case in which a first reconstructed image in the second domain is generated based on a second latent code, wherein the first reconstructed image is correlated with the second image.
	The combination of Liu and Larsen teaches coupling two VAE/GANs, but does not teach a cross-domain case in which a first reconstructed image in the second domain is generated based on a second latent code, wherein the first reconstructed image is correlated with the second image.
	All fail to teach, either individually or in combination, a cross-domain case in which a first reconstructed image in the second domain is generated based on a second latent code, wherein the first reconstructed image is correlated with the second image.
	Claim 11 requires, among other things, a method wherein the first latent code and the second latent code are equal. Neither Liu nor Larsen, individually or in combination, teaches two latent codes generated from two different domains that are equal.
	Claim 18 is a system claim corresponding to method claim 5.  Otherwise, they are the same.  Therefore, claim 18 would be allowable for similar reasons under similar conditions as claim 5.

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124