DETAILED ACTION
This action is responsive to the Application filed on 07/11/2022. Claims 1-22 are pending in the case. Claims 1, 10, 19, and 20 are independent claims. Claims 10 and 20 are amended. Claim 22 is new.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 01/11/2022 have been fully considered but they are not persuasive. 
	Applicant argues that the cited art does not teach the amended limitations. Examiner notes the rejections have been updated in view of Achanta as a result of the amendments. 
	Further, Applicant argues “It would not have been obvious to primarily use an image based GAN and modify it into the claimed text based Gan”. Applicant argues that a real image is not the same as a one-hot representation of real text, at least because they would not introduce the same technical problems solved by the claimed GAN. First, Examiner notes that Wang does not describe an “image based GAN”. Wang describes a GAN which discriminates between real and fake data. While Wang describes one embodiment of the GAN which operates on “real images” as input data, this does not suggest that the GAN of Wang is operable only to solve the problem of generating “convincing” fake images. One of ordinary skill in the art would understand that the GAN may be used on other types of data, at least as evidenced by Applicant’s specification (¶0024). In addition, the prior art Zhang et al. uses a GAN on text data.
Further, Applicant argues that the rationale to combine Wang with Zhang at best explains why Zhang improves a text-based GAN. Examiner notes that Applicant has merely asserted that, because the Wang system operates on images, it is applicable only to image-based domains. Examiner disagrees. Both Zhang and the specification note that GANs can generally be used to process not only images but also text-based data. Examiner notes that the features relied upon in Wang are therefore pertinent not only to image-based GANs. While Zhang teaches how a GAN which operates on text data may be improved, it also notes that its framework is applicable to other domains, particularly to help alleviate mode collapse, a common problem in GANs generally that is addressed by both Zhang and Wang. “In our approach, MMD and feature matching are introduced to alleviate mode collapsing with text data as motivating domain” (pg 3 paragraph 2 Zhang). Further, Wang points out the very same problem, which is addressed by its solution: “[traditional models] suffers the difficulty in translating a random vector into a desired high dimensional sample. As a result, the training dynamics in GAN are often unstable and the generated samples could collapse to limited modes” (Wang abstract). This provides further support that the architectures of Zhang and Wang are both applicable to solving the same problem, where the data type, image or text, is merely exemplary and does not suggest that the architecture is applicable to only one data type. Examiner points out that characterizing Zhang as “text-based” and Wang as “image-based” is imprecise because it ignores that the features of each are usable together, at least as pointed out above.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10-15, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. “Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation”, hereinafter Wang, further in view of Zhang et al. “Adversarial Feature Matching for Text Generation”, hereinafter Zhang, further still in view of Elaffendi et al. “Text Encoding for Deep Learning Neural Networks: A Reversible Base 64 (Tetrasexagesimal) Integer Transformation (RIT64) Alternative to One Hot Encoding with Applications to Arabic Morphology”, hereinafter Elaffendi, and further in view of Achanta “Analysis of sequence to sequence neural networks on grapheme to phoneme conversion task”, hereinafter Achanta.

Regarding Claims 10 and 20
Wang teaches, A method for training a latent space and [data] based generative adversarial network (GAN) executing on one or more processing units for real [feature] generation, the method comprising (Abstract “we propose a new Auto-Encoder Generative Adversarial Networks (AEGAN) … we map the random vector into the encoded latent space by adversarial training based on GAN” Examiner notes that training implicitly involves execution by at least one processing unit.) receiving, at an encoder neural network of the latent space and [data]-based GAN, a real [features] and outputting, by the encoder neural network, a latent representation of the real [features] generated by the encoder neural network from the real [features] 
receiving, at a decoder neural network, the latent representation of the real [features], and outputting, by the decoder neural network, a reconstructed representation of the real [features] generated by the decoder neural network from the latent representation of the real [features];… 
receiving, at the decoder neural network of the latent space and [data]-based GAN,… random noise data or an artificial code generated by a generator neural network of the latent space and [data]-based GAN from the random noise data, and outputting, by the decoder neural network, [reconstructed] representation of artificial [features] generated by the decoder neural network from the random noise data or the artificial code… 
receiving, at a [data] based discriminator neural network, the [reconstructed features] output by the decoder neural network and the [reconstructed] representation of the artificial [features] output by the decoder neural network;
… receiving, at a [data] based  discriminator neural network, the latent representation of the real [features] output by the decoder neural network and the artificial code or the random noise data generated by the generator neural network of the latent space and [data] based GAN.(Figure 1 AEGAN
    [Image: media_image1.png (Wang, Figure 1, AEGAN)]
Examiner notes that x and h correspond to the real features and latent representation of real features.) and outputting, by the [data] based discriminator neural network, a second probability indicating whether the artificial code or the random noise data received by the [data] based discriminator neural network is similar to the latent representation of the real [features] received by the code based discriminator neural network. (Figure 1 and Section 3.3 ¶02 “and the discriminator D1 estimates the probability that a latent vector came from Auto-Encoder rather than G1” Examiner notes that the first discriminator and second discriminator correspond to D2 and D1, respectively.) outputting, by the [data] based discriminator neural network, a first probability indicating whether the [reconstructed] representation of artificial [features] received by the [data]-based discriminator neural network is similar to the [reconstructed features] received by the [data] based discriminator neural network. (Section 3.3 ¶02 “After that, the discriminator D2 is used to distinguish them from real images” Examiner notes that D2 receives softmax representations of the real and artificial [features] output by the decoder. Given that D1 is capable of distinguishing via a probability estimate, it would have been obvious for D2 to distinguish via a probability output.) A device comprising: one or more processing units; a non-transitory computer readable storage medium storing programming for execution by the one or more processing units, the programming including instructions for: (Section 4 and Algorithm 1 Examiner notes that the algorithm presented by Wang is implemented in the experiments section. In order to perform the experiments described, a computer or device consisting of processing units must necessarily be utilized consisting of computer readable storage including instructions, such as algorithm 1.)
Wang does not explicitly teach, training a latent space and text based, a one-hot representation of a real text,… comprises a sequence of words, and wherein the one-hot representation of the real text is based on a K-word natural language dictionary;…a reconstructed softmax representation of the real text… generated from the a latent representation of the real text, a reconstructed softmax representation of the real text generated from the latent representation of the real text… the reconstructed softmax representation of the real text that is a continuous representation of the real text, softmax representation of artificial text generated from the… artificial code
Zhang, however, when addressing issues related to mapping a latent representation to a softmax representation to be discriminated by a discriminator, teaches, training a latent space and text based, the latent representation of the real text, a reconstructed softmax representation of the real text… generated from the a latent representation of the real text, (Section 4 ¶002-¶003 “feature vectors encoded from real sentences [latent representation of real text]…The feature vector is then fed into a 900-200-2 fully connected network for the discriminator… with sigmoid activation units connecting the intermediate layers and softmax/tanh units for the top layer of discriminator/encoder” pg 8 ¶03 “, a baseline autoencoder (AE) is trained for 20 epochs. The results for textGAN and AE are presented in Table 3” Examiner notes that Zhang teaches using softmax units to map latent text to a softmax representation of the real text, and further that a discriminator is operable on text-based data.) the reconstructed softmax representation of the real text that is a continuous representation of the real text, (Examiner notes that, by definition, the softmax operator outputs a continuous representation of its input.) softmax representation of artificial text generated from the… artificial code (Figure 2 Top. Examiner notes that, just as the discriminator generates a softmax representation in the feature layer, it also generates f̃, the synthetic features.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a discriminator that uses a softmax representation of latent features to discriminate between real and false text in a generative adversarial network, as taught by Zhang, into the disclosed invention of Wang.
One of ordinary skill in the art would have been motivated to make this modification because Zhang’s approach “delivers superior performance compared to related approaches produce realistic sentences, and that the learned latent representation space can “smoothly” encode plausible sentences” (Zhang Conclusion).
Wang/Zhang does not explicitly teach, a one-hot representation of a real text,…comprises a sequence of words; wherein the real text comprises a sequence of words from a K- word natural language dictionary, and wherein the one-hot representations of the real text include a one-hot representation for each word in the sequence of words from is based on a the K-word natural language dictionary; wherein the continuous representation of the real text is a k- dimensional vector of real numbers in which each entry of the k-dimensional vector is a corresponding probability and maps to a corresponding word in the k-word natural language dictionary
Elaffendi, however, when addressing encoding text usable in neural networks, teaches, a one-hot representation of a real text,… comprises a sequence of words (Introduction “One Hot Encoding approaches represent each word in the vocabulary by a numerical positional vector whose elements are all zeros, except for the position of the word in the vocabulary list” Examiner notes that a text with a string or sequence of words in a corresponding vocabulary can be encoded as described, in the context of neural networks for text processing.) wherein the real text comprises a sequence of words from a K- word natural language dictionary, and wherein the one-hot representations of the real text include a one-hot representation for each word in the sequence of words from is based on a the K-word natural language dictionary; (pg 1 Section 1 ¶02 “One Hot Encoding approaches represent each word in the vocabulary by a numerical positional vector whose elements are all zeros, except for the position of the word in the vocabulary list” Examiner notes that the real text corresponds to words drawn from a vocabulary list, wherein the vocabulary list is a k-word natural language dictionary.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate one-hot encoding for a real text string, as taught by Elaffendi, into the disclosed invention of Wang/Zhang.
One of ordinary skill in the art would have been motivated to make this modification because “One Hot Encoding (OHE) is currently the norm in text encoding for deep learning neural models” (Elaffendi Abstract).
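For illustration only, the one-hot encoding relied upon above can be sketched as follows. This sketch is not taken from Elaffendi or from any other cited reference; the function name `one_hot_encode`, the four-word dictionary, and the sample text are hypothetical and chosen solely to show how each word in a sequence maps to a K-dimensional vector that is all zeros except at the word's position in the dictionary.

```python
def one_hot_encode(words, dictionary):
    """Return one one-hot vector of length K per word in `words`.

    `dictionary` is an ordered list of K distinct words (hypothetical here).
    """
    index = {word: i for i, word in enumerate(dictionary)}
    k = len(dictionary)
    encoded = []
    for word in words:
        vec = [0] * k
        # Raises KeyError if the word is not in the dictionary.
        vec[index[word]] = 1
        encoded.append(vec)
    return encoded


# Hypothetical K = 4 dictionary and a three-word text.
dictionary = ["the", "cat", "sat", "mat"]
text = ["the", "cat", "sat"]
vectors = one_hot_encode(text, dictionary)
```

Each vector has exactly one nonzero entry, so a sequence of n words over a K-word dictionary is represented by n vectors of length K.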
Wang/Zhang/Elaffendi does not explicitly teach, wherein the continuous representation of the real text is a k- dimensional vector of real numbers in which each entry of the k-dimensional vector is a corresponding probability and maps to a corresponding word in the k-word natural language dictionary
Achanta however when discussing the output of a decoder network in an encoder decoder framework teaches, wherein the continuous representation of the real text is a k- dimensional vector of real numbers in which each entry of the k-dimensional vector is a corresponding probability and maps to a corresponding word in the k-word natural language dictionary (Section 4.1.1 “The encoder in our case is a bi-directional recurrent neural network (BiRNN) [18][23]. The encoder reads the entire input sequence and a representation is stored in the final state vectors.” Section 4.1.3 “The decoder is also an RNN but only a uni-directional one.

    [Image: media_image2.png]
…It has to be noted that the past output yt−1, during training, can be either from the ground truth or from the prediction of the network itself. Similarly, Ui, U and Uc denote the weight connections from current hidden state, past output and context layers. g in our case is a soft-max layer” Section 5 Experiments “The input consisted of 41 letters including numbers and special characters. The output layer had 39 phones. One-hot representation was used at both input and output.” The encoder-decoder framework maps an input one-hot representation to an output one-hot representation; the output representation is a 39-dimensional phone vector, which is a mapping from the input 41-word natural language library represented at the encoder input. The decoder output is a continuous representation of the one-hot vector because the softmax function, g(), is applied to it. As noted previously, the softmax function outputs a vector of continuous values between 0 and 1; these values correspond to the probability for each word in the vector to correspond to the input encoding.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of one-hot encoding to process text input to and output from an encoder-decoder framework, respectively, as taught by Achanta, into the disclosed invention of Wang/Zhang/Elaffendi.
One of ordinary skill in the art would have been motivated to make this modification because both Achanta and Wang/Zhang/Elaffendi relate to discovering a mapping for text strings using an encoder-decoder framework, and sequence to sequence models allow one to directly map input sequences to output sequences, as noted by Achanta: “With the recent advent of sequence to sequence neural networks, the alignment step can be skipped allowing us to directly map the input and output sequences” (abstract Achanta).
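For illustration only, the general property of the softmax function relied upon above (a K-dimensional output of continuous values, each between 0 and 1, that together sum to 1) can be sketched as follows. This sketch is not taken from Achanta or from any other cited reference; the function name `softmax` and the three input scores are hypothetical.

```python
import math


def softmax(logits):
    """Map a list of real-valued scores to a probability vector."""
    # Subtracting the maximum is a standard numerical-stability step;
    # it does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical decoder scores for a 3-entry dictionary.
probs = softmax([2.0, 1.0, 0.1])
```

Unlike a one-hot vector, every entry of `probs` is strictly between 0 and 1, and the largest input score receives the largest probability.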

Regarding Claim 11
	Wang/Zhang/Elaffendi/Achanta  teach Claim 10
Further Wang teaches, calculating a reconstruction loss based on a difference between the one-hot representation of the real text and the reconstructed softmax representation of the real text output from the decoder neural network; (Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches. Together with the decoder, the whole Auto-Encoder networks are trained by minimizing the squared cost function [reconstruction loss]:

    [Image: media_image3.png]
” Examiner notes, as stated previously that when modified with Zhang the decoder output undergoes the soft-max operation. Such that Dec(h) corresponds to softmax(Dec(h))) and updating parameters of the encoder neural network and parameters of the decoder neural network based on the reconstruction loss. (Section 4 Algorithm 1 The Auto-Encoder Generative Adversarial Networks training procedure “Update Enc and Dec by descending: … Update D2 by descending: … Update D1 by descending: … Update G1 by descending:” Examiner notes that updating includes updating the encoder and decoder based in part on the reconstruction loss)
Further Wang/Zhang/Elaffendi/Achanta  teaches, and repeating the receiving the one-hot representations of the real text and outputting the latent representation of the real text, the receiving the latent representation of the real text and outputting the reconstructed softmax representation of the real text, the receiving the random noise data or the artificial code, the receiving the reconstructed softmax representation of the real text and the softmax representation of artificial text, the outputting the first probability, the receiving the latent representation of the real text, outputting the second probability, the calculating the reconstruction loss, and the updating the parameters of the encoder neural network and the parameters of the decoder neural network until the reconstruction loss is minimized. ( Wang Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches. Together with the decoder, the whole Auto-Encoder networks are trained by minimizing the squared cost function [reconstruction loss]:

    [Image: media_image3.png]
” Examiner notes that Wang, when combined with the text-based features of Zhang and Elaffendi, teaches the training as claimed, because the system is trained until the loss is minimized. As the data is sampled, the steps recited are repeated.)
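For illustration only, a squared-error reconstruction loss between a one-hot representation and a softmax representation, of the kind discussed for Claim 11, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Wang's implementation: the function name `reconstruction_loss`, the toy vectors, and the choice to average over sequence positions are hypothetical, and the normalizing coefficient used in Wang's cost function is not reproduced.

```python
def reconstruction_loss(one_hot_seq, softmax_seq):
    """Mean (over positions) of the squared difference between each
    one-hot target vector and the corresponding softmax output vector.

    Assumption: both arguments are equal-length lists of K-dimensional
    vectors; averaging over positions is an illustrative choice.
    """
    total = 0.0
    for target, predicted in zip(one_hot_seq, softmax_seq):
        total += sum((t - p) ** 2 for t, p in zip(target, predicted))
    return total / len(one_hot_seq)


# Hypothetical one-position example over a K = 2 dictionary.
perfect = reconstruction_loss([[1, 0]], [[1.0, 0.0]])
uncertain = reconstruction_loss([[1, 0]], [[0.5, 0.5]])
```

The loss is zero only when the softmax output coincides with the one-hot target, and it grows as probability mass moves away from the target word.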

Regarding Claim 12
	Wang/Zhang/Elaffendi/Achanta teach Claim 10
Further Wang as modified by Zhang teaches, calculating a text-based  discriminator loss for the first discriminator neural network based on the reconstructed softmax representation of the real text and the soft-max representation of artificial text; (Section 3.3 ¶04 “Then, both kinds of latent vectors are fed into the Decoder to get the generated images x˜i and the reconstructed images xi′. Discriminator D2 [first discriminator] is trained to distinguish them from each other…
    [Image: media_image4.png]
” Examiner notes, xi undergoes the soft-max operation before being input into the discriminator.) and updating parameters of the text-based discriminator neural network and parameters of the decoder neural network based on the first discriminator loss.(Section 4 Algorithm 1 The Auto-Encoder Generative Adversarial Networks training procedure “Update Enc and Dec by descending:… Update D2 by descending: …Update D1 by descending: …Update G1 by descending:” Examiner notes that updating includes updating the parameters of the 1st  discriminator and encoder based in part on the 1st discriminator loss)
Further Wang/Zhang/Elaffendi teaches, and repeating the receiving the one-hot representations of the real text and outputting the latent representation of the real text, the receiving the latent representation of the real text and outputting the reconstructed softmax representation of the real text, the receiving the random noise data or the artificial code, the receiving the reconstructed softmax representation of the real text and the softmax representation of artificial text, the outputting the first probability, the receiving the latent representation of the real text, the outputting the second probability, the calculating the first discriminator loss, and the updating the parameters of the text-based discriminator neural network and the parameters of the decoder neural network until the first discriminator loss is minimized. ( Wang Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches.” Section 3.3 ¶04 “Then, both kinds of latent vectors are fed into the Decoder to get the generated images x˜i and the reconstructed images xi′. Discriminator D2 [first discriminator] is trained to distinguish them from each other…
    [Image: media_image4.png]
” Examiner notes that Wang, when combined with the text-based features of Zhang and Elaffendi, teaches the training as claimed, because the system is trained until the loss is minimized. As the data is sampled, the steps recited are repeated.)


Regarding Claim 13
	Wang/Zhang/Elaffendi/Achanta  teach Claim 10
Further Wang as modified by Zhang teaches, calculating a code-based discriminator loss for the second discriminator neural network based on the artificial code or the random noise data and the latent representation of the real text; (Section 3.3 ¶04 “the discriminator D1, which takes a real latent vector hi or a generated one ˜hi as input, is trained to classify inputs into two classes (real or fake)…
    [Image: media_image5.png]
”) and updating parameters of the code-based discriminator neural network and parameters the encoder neural network based on the second discriminator loss when input to the code-based discriminator neural network is the artificial code, or is the random noise data and the latent code representation of the real text(Section 4 Algorithm 1 The Auto-Encoder Generative Adversarial Networks training procedure “Update Enc and Dec by descending:… Update D2 by descending: …Update D1 by descending:…Update G1 by descending:” Examiner notes that updating includes updating the parameters of the 2nd discriminator and encoder based in part on the 2nd discriminator loss)
Further Wang/Zhang/Elaffendi teaches, and repeating the receiving the one-hot representations of the real text and outputting the latent representation of the real text, the receiving the latent representation of the real text and outputting the reconstructed softmax representation of the real text, the receiving the random noise data or the artificial code, the receiving the reconstructed softmax representation of the real text and the softmax representation of artificial text, the outputting the first probability, the receiving the latent representation of the real text, the outputting the second probability, the calculating the second discriminator loss, and the updating the parameters of the code-based discriminator neural network and the parameters of the encoder neural network until the second discriminator loss is minimized ( Wang Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches.” Section 3.3 ¶04 “the discriminator D1, which takes a real latent vector hi or a generated one ˜hi as input, is trained to classify inputs into two classes (real or fake)…
    [Image: media_image5.png]
” Examiner notes that Wang, when combined with the text-based features of Zhang and Elaffendi, teaches the training as claimed, because the system is trained until the loss is minimized. As the data is sampled, the steps recited are repeated.)


Regarding Claim 14
	Wang/Zhang/Elaffendi/Achanta  teach Claim 10
	Further Wang teaches, calculating a generator loss that maximizes the first probability of the text based discriminator neural network and the second probability of the code based discriminator neural network (Section 3.3 ¶04 “while G1 is trained to "fool" both D1 and D2… 
    [Image: media_image6.png]
” Examiner notes that, in the context of discriminators that output probabilities of being fooled, a generator network that is trained to fool the discriminators would be one that maximizes those probabilities.) and updating parameters of the generator neural network and parameters the decoder neural network based on the generator loss. (Section 4 Algorithm 1 The Auto-Encoder Generative Adversarial Networks training procedure “Update Enc and Dec by descending: …Update D2 by descending: … Update D1 by descending: …Update G1 by descending:” Examiner notes that updating includes updating the parameters of the decoder and generator based in part on the generator loss) generating, at the generator neural network of the latent space and text-based GAN, the artificial code from the random noise data (As shown in Figure 1, the artificial code h̃ is generated by the Generator G,
    [Image: media_image7.png]
)
Further Wang/Zhang/Elaffendi/Achanta  teaches, and repeating the receiving the one-hot representations of the real text and outputting the latent representation of the real text, the receiving the latent representation of the real text and outputting the reconstructed softmax representation of the real text, the receiving the random noise data or the artificial code, the receiving the reconstructed softmax representation of the real text and the softmax representation of artificial text, the outputting the first probability, the receiving the latent representation of the real text, the outputting the second probability, the generating the artificial code, the calculating the generator loss, and the updating the parameters of the generator neural network and the parameters of the decoder neural network until the generator loss is minimized. ( Wang Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches.” Section 3.3 ¶04 “while G1 is trained to "fool" both D1 and D2… 
    [Image: media_image6.png]
” Examiner notes that Wang, when combined with the text-based features of Zhang and Elaffendi, teaches the training as claimed, because the system is trained until the loss is minimized. As the data is sampled, the steps recited are repeated.)


Regarding Claim 15
	Wang/Zhang/Elaffendi/Achanta teach Claim 11
Further Wang teaches, 

    [Image: media_image8.png]
 (Section 3.3 ¶03 “Images from training set are first be encoded into latent space by the encoder: hi = Enc(xi), where i ∈ {0, 1, ...,n}, xi represents the image sampled from training set, hi is the latent vector of xi, and n is the size of mini-batches. Together with the decoder, the whole Auto-Encoder networks are trained by minimizing the squared cost function [reconstruction loss]:

    [Image: media_image3.png]
”Examiner notes, as stated previously that when modified with Zhang the decoder output undergoes the soft-max operation. Further “training” entails not only calculating but also updating. Furthermore, the coefficient 1/nHW is only a scalar multiple that would have been obvious to be removed by PHOSITA, as this scalar does not affect the error calculation)
Regarding Claim 18
	Wang/Zhang/Elaffendi/Achanta teach Claim 14
Further Wang teaches, 
    [Image: media_image9.png]
 (Section 3.3 ¶04 “while G1 is trained to "fool" both D1 and D2… 
    [Image: media_image6.png]
” Section 4 ¶02 “We experimented various values of the hyperparameter λ, and found that λ = 1 works well in all reported experiments” Examiner notes that the expected value of the discriminators' output necessarily is dependent on a function of not only the synthetic features (x̃ and h̃) corresponding to (x̂ and ĉ) but also the real features (x and h) corresponding to (x̃ and c). Thus, E [ logD1 ] and E [ logD2 ] are equivalent to those presented in the claim. Furthermore, λ is simply a hyperparameter for learning rate, whose sign, positive or negative, defines the loss as a maximization problem or minimization problem. However, Examiner notes that the second alternative equation is not taught by any of the art references in combination, because Examiner interprets f_w2 to be the mapping of discriminator inputs to outputs, where outputs are probabilities. When combined, the art does not teach a discriminator taking random noise, z, as input.) generating, at the generator neural network of the latent space and text-based GAN, the artificial code from the random noise data (As shown in Figure 1, the artificial code h̃ is generated by the Generator G,
    [Image: media_image7.png]
)
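For illustration only, the idea of a generator trained to "fool" two discriminators with a weighting hyperparameter λ can be sketched as below. The exact expression relied upon is in the image that is not reproduced here, so the loss form used in this sketch (a negated sum of mean log-scores, with λ weighting the second term) is an assumption, and the name `generator_loss` and its parameters are hypothetical.

```python
import math

def generator_loss(d1_scores, d2_scores, lam=1.0):
    """Assumed form (not taken from the cited art):
    -(mean log D1(fake) + lam * mean log D2(fake)).

    d1_scores and d2_scores are the probabilities in (0, 1] that each
    discriminator assigns to the generator's outputs being real.
    """
    mean_log_d1 = sum(math.log(p) for p in d1_scores) / len(d1_scores)
    mean_log_d2 = sum(math.log(p) for p in d2_scores) / len(d2_scores)
    # The loss falls as either discriminator is fooled more often.
    return -(mean_log_d1 + lam * mean_log_d2)

# Fooling both discriminators more strongly gives a lower loss.
weakly_fooled = generator_loss([0.5, 0.5], [0.5, 0.5])
strongly_fooled = generator_loss([0.9, 0.9], [0.9, 0.9])
```

With `lam=1.0`, the value quoted from Section 4 ¶02, both discriminator terms carry equal weight in this sketch.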


Claims 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang/Zhang/Elaffendi/Achanta, further still in view of Gulrajani et al. “Improved Training of Wasserstein GANs” hereinafter Gulrajani.

Regarding Claim 16
	Wang/Zhang/Elaffendi/Achanta  teach Claim 12
Wang/Zhang/Elaffendi/Achanta does not explicitly teach, 

    [Image: media_image10.png]
 
Gulrajani however, when addressing issues related to improving the training stability for generative models in GANs teaches, 
    [Image: media_image10.png]
 (Section 4 ¶01 “To circumvent tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm for random samples x̂ ∼ P_x̂. Our new objective is: …
    [Image: media_image11.png]
 ” Examiner notes that this gradient penalty corresponds to the gradient penalty presented by the claim. Furthermore, the critic loss is described here… Section 2.2 “where D is the set of 1-Lipschitz functions and Pg is once again the model distribution implicitly defined by x̃ = G(z)” Thus, the functions in the red boxes correspond to each other because they both define the expectation of the discriminator with artificial inputs, while the functions in the green boxes define the expectation of the discriminator with the real inputs.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a modified adversarial critic loss function that includes a gradient penalty, as taught by Gulrajani, into the disclosed invention of Wang/Zhang/Elaffendi/Achanta.
One of ordinary skill in the art would have been motivated to make this modification in order to demonstrate a “strong modeling performance and stability across a variety of architectures…. our work opens the path for stronger modeling performance on large-scale image datasets and language… adapting our penalty term to the standard GAN objective function, where it might stabilize training by encouraging the discriminator to learn smoother decision boundaries” (Gulrajani Conclusion).
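For illustration only, a critic loss with a gradient penalty of the kind described in Gulrajani Section 4 can be sketched as follows: the expectation of the critic on artificial inputs, minus its expectation on real inputs, plus λ times the squared deviation of the gradient norm from 1 at points x̂ sampled on straight lines between real and artificial samples. The linear critic D(x) = w · x is an assumption made so the gradient can be written analytically, and every function name below is hypothetical. It is not the claimed loss and not the network of any cited reference.

```python
import math
import random

def critic(w, x):
    """Hypothetical linear critic D(x) = w . x, standing in for a neural network."""
    return sum(wi * xi for wi, xi in zip(w, x))

def critic_grad(w, x):
    """Gradient of D with respect to its input; for a linear critic this is w."""
    return list(w)

def critic_loss_with_gp(w, real, fake, lam=10.0, rng=random):
    """Mean over sample pairs of D(fake) - D(real) + lam * (||grad D(x_hat)||_2 - 1)^2."""
    total = 0.0
    for x, x_tilde in zip(real, fake):
        eps = rng.random()  # interpolation weight drawn from U[0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, x_tilde)]
        grad_norm = math.sqrt(sum(g * g for g in critic_grad(w, x_hat)))
        penalty = lam * (grad_norm - 1.0) ** 2
        total += critic(w, x_tilde) - critic(w, x) + penalty
    return total / len(real)

real_batch = [[1.0, 0.0]]
fake_batch = [[0.0, 1.0]]
unit_norm_loss = critic_loss_with_gp([0.6, 0.8], real_batch, fake_batch)   # gradient norm 1
large_norm_loss = critic_loss_with_gp([3.0, 4.0], real_batch, fake_batch)  # gradient norm 5
```

Because the critic here is linear, its gradient does not depend on x̂, so the random interpolation weight has no effect on the result; with a neural-network critic the gradient would be evaluated at each x̂ by automatic differentiation.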

Regarding Claim 17
	Wang/Zhang/Elaffendi/Achanta teach Claim 13
Wang/Zhang/Elaffendi/Achanta does not explicitly teach, 

    [Image: media_image12.png]

Gulrajani however, when addressing issues related to improving the training stability for generative models in GANs teaches, 
    [Image: media_image12.png]
 (Section 4 ¶01 “To circumvent tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm for random samples x̂ ∼ P_x̂. Our new objective is: …
    [Image: media_image11.png]
 ” Examiner notes that this gradient penalty corresponds to the gradient penalty presented by the claim. The gradient penalties for both presented alternative equations are understood to represent the same function, $f_{w2}$, where both c̅ and c̅1 correspond to x̂. Furthermore, the critic loss is described here… Section 2.2 “where D is the set of 1-Lipschitz functions and Pg is once again the model distribution implicitly defined by x̃ = G(z)” Thus, the functions in the green boxes correspond to each other because they each define the expectation of the discriminator with artificial inputs, whether latent or reconstructed, while the functions in the red boxes define the expectation of the discriminator with the real inputs, whether latent or reconstructed. In this case, the set of 1-Lipschitz functions includes the mapping of c and ĉ through the discriminator, such that ĉ corresponds to x̃ and c corresponds to x. However, Examiner notes that the second alternative equation is not taught by any of the art references in combination because Examiner interprets $f_{w2}$ to be the mapping of discriminator inputs to outputs, wherein the outputs are probabilities. When combined, the art does not teach a discriminator taking random noise, z, as input.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a modified adversarial critic loss function that includes a gradient penalty, as taught by Gulrajani, into the disclosed invention of Wang/Zhang/Elaffendi/Achanta.
One of ordinary skill in the art would have been motivated to make this modification in order to demonstrate a “strong modeling performance and stability across a variety of architectures…. our work opens the path for stronger modeling performance on large-scale image datasets and language… adapting our penalty term to the standard GAN objective function, where it might stabilize training by encouraging the discriminator to learn smoother decision boundaries” (Gulrajani Conclusion).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Wang/Zhang/Elaffendi/Achanta, further still in view of Olabiyi US Document ID US 10152970 B1.

Regarding Claim 22
	Wang/Zhang/Elaffendi/Achanta  teach Claim 12
	
Wang/Zhang/Elaffendi/Achanta does not explicitly teach, receiving, at the decoder neural network, the random noise data, and outputting, by the decoder neural network, the softmax representation of the artificial text generated by the decoder neural network from the random noise data.
Olabiyi, however, when addressing the use of a decoder in a seq2seq GAN framework, teaches receiving, at the decoder neural network, the random noise data (Column 12 lines 28-30 “in the disclosed example, Gaussian noise may be injected at the input of the decoder RNN. Noise samples could be injected at the utterance and/or word level”).
Wang/Zhang/Elaffendi/Achanta, when combined with XXX, teaches outputting, by the decoder neural network, the softmax representation of the artificial text generated by the decoder neural network from the random noise data (as noted in the rejection of claim 10, when Wang/Zhang/Elaffendi/Achanta is combined with XX, the decoder output already taught by Wang/Zhang/Elaffendi/Achanta is generated in part from the random noise data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a decoder which takes Gaussian noise as input, as taught by Olabiyi, into the disclosed invention of Wang/Zhang/Elaffendi/Achanta.
One of ordinary skill in the art would have been motivated to make this modification in order to improve upon conventional models. Olabiyi notes “the systems and methods described herein sample the noise distribution along the lines of conditional generative adversarial networks to generate several possible responses and select the one that is ranked best by the discriminator” in turn “producing longer, more informative and more diverse responses even with limited training data.” (Olabiyi Columns 9 and 10, lines 65-9)
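For illustration only, a decoder that receives Gaussian noise at its input and outputs a softmax representation over a vocabulary can be sketched as follows. The single linear layer, the vocabulary size, the noise dimension, and the names `softmax` and `decode_step` are all assumptions made for this sketch. It is not the decoder RNN of Olabiyi or the decoder of any other cited reference.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_step(weights, noise):
    """Hypothetical one-layer decoder step: logits = W @ noise, then softmax."""
    logits = [sum(w * z for w, z in zip(row, noise)) for row in weights]
    return softmax(logits)

rng = random.Random(0)        # fixed seed so the sketch is repeatable
vocab_size, noise_dim = 5, 3  # illustrative sizes only
weights = [[rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]
           for _ in range(vocab_size)]
noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # Gaussian noise input
probs = decode_step(weights, noise)  # one probability per vocabulary entry
```

Unlike a one-hot vector, every entry of `probs` is strictly positive, and the entries sum to 1.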

Allowable Subject Matter
Claims 1-9, 19, and 21 are allowable.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/J.R.G./
Examiner, Art Unit 2122
                                                                                                                                                                                                        
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122