Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/31/2020 and 03/17/2020 were filed before the mailing date of the first office action. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.	

Claim Interpretation
Examiner notes that the preamble to claim 2 recites “a combined code formed by concatenating the encoder direct output with a vector of output means of predetermined internal parameters of the encoder is used as input for a combination of at least one of the steps”. The broadest reasonable interpretation of “a combination of at least one of the steps” is one step; therefore, Examiner is interpreting claim 2 as requiring only that a combined code be used as input for one of the “regularizing the embedding space”, “identifying whether a data item is adversarial or unnatural”, or “classifying data items not identified as adversarial or unnatural” steps.
Similarly, the preamble of claim 3 recites “allowing an additional boolean input where one or both” – the broadest reasonable interpretation of claim 3 only requires either that the unnatural/adversarial data is not included in the regularization or a training procedure that attempts to classify unnatural/adversarial data. The specification does not describe this step with respect to a boolean input (paragraphs [0052] and [0089] of Applicant’s specification only describe boolean values with respect to a kernel density estimator that decides whether images are natural or not); therefore for purposes of examination, Examiner is interpreting that when unnatural/adversarial data is detected, the boolean input is a trigger to decide when to remove the unnatural/adversarial data from input to the regularization procedure or to activate a training procedure to try and classify the detected unnatural/adversarial data.
Claim 4 recites the limitation “the embedding space vectors of a subset of the input data items including all natural and non-adversarial items are expected to follow a simple prior distribution”. This limitation can be read in multiple ways but does not rise to the level of a clarity issue requiring a 112(b) rejection. However, in order to clarify the record, Examiner is interpreting the subset as a subset of the input data items in which every item is natural and non-adversarial, although the subset need not contain every natural and non-adversarial input data item. If Applicant intends the claim to be interpreted otherwise, the claim should be amended accordingly; Applicant is encouraged to reach out to the Examiner with any questions regarding claim interpretation.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 3 recites the limitation “an identification of each of the one or more input data items as an adversarial or unnatural input admits a training procedure encouraging that input items known to be unnatural or adversarial are correctly identified”. Applicant’s disclosure does not make clear how a training procedure would “encourage” unnatural or adversarial input items to be correctly identified. Therefore, for purposes of examination, Examiner is interpreting that the training procedure in claim 3 classifies the identified unnatural/adversarial data.
Claim 13 contains the limitations of claim 3 and is rejected for the same reasons.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3 and 5-7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Makhzani et al (“Adversarial Autoencoders”, herein Makhzani).
Regarding claim 1, Makhzani teaches a method for detecting adversarial examples (the abstract recites “we propose the ‘adversarial autoencoder’ (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from any part of prior space results in meaningful samples. As a result, the decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution. We show how the adversarial autoencoder can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction and data visualization” (i.e. a method for detecting adversarial examples)), comprising:
generating encoder direct output by projecting, via an encoder, one or more input data items to a low-dimensional embedding vector of reduced dimensionality with respect to the one or more input data items to form a low-dimensional embedding space (pg. 12 para. 2 recites “we present an adversarial autoencoder architecture for dimensionality reduction and data visualization purposes. We will show that in these autoencoders, the adversarial regularization attaches the hidden code of similar images to each other and thus prevents the manifold fracturing problem that is typically encountered in the embeddings learnt by the autoencoders.” The description of fig. 10 recites “dimensionality reduction with adversarial autoencoders: There are two separate adversarial networks that impose Categorical and Gaussian distribution on the latent representation. The final n dimensional representation is constructed by first mapping the one-hot label representation to an n dimensional cluster head representation and then adding the result to an n dimensional style representation” (i.e. using an encoder to reduce the dimensionality of input data));
regularizing the low-dimensional embedding space via a training procedure such that the one or more input data items produce embedding space vectors whose global distribution is expected to follow a simple prior distribution (pg. 5 para. 2 recites “In adversarial training procedure of our method, a much simpler distribution (e.g., Gaussian as opposed to the data distribution) is imposed in a much lower dimensional space (e.g., 20 as opposed to 1000) which results in a better test-likelihood.” Pg. 9 para. 3 recites “in the regularization phase, each of the adversarial networks first updates their discriminative network to tell apart the true samples (generated using the Categorical and Gaussian priors) from the generated samples (the hidden codes computed by the autoencoder)” (i.e. regularizing the data to follow a prior distribution). Pg. 12 para. 2 recites “we present an adversarial autoencoder architecture for dimensionality reduction and data visualization purposes. We will show that in these autoencoders, the adversarial regularization attaches the hidden code of similar images to each other and thus prevents the manifold fracturing problem that is typically encountered in the embeddings learnt by the autoencoders.” (i.e. using the regularization method from pg. 9 on the reduced low-dimensional embedding space));
identifying whether each of the one or more input data items is an adversarial or unnatural input (pg. 2 para. 1 recites “The discriminator model, D(x), is a neural network that computes the probability that a point x in data space is a sample from the data distribution (positive samples) that we are trying to model, rather than a sample from our generative model (negative samples). Concurrently, the generator uses a function G(z) that maps samples z from the prior p(z) to the data space. G(z) is trained to maximally confuse the discriminator into believing that samples it generates come from the data distribution” (i.e. identifying whether an input data item is adversarial. Examiner’s Note: one of ordinary skill in the art would understand that the negative samples are equivalent to an adversarial input)); and
classifying, at least during the training procedure, at least those input data items which have not been identified as adversarial or unnatural into one of a plurality of classes (figure 3 and pg. 5 para. 5 – pg. 6 para. 1 recite “Figure 3 demonstrates the training procedure for this semi-supervised approach. We add a one-hot vector to the input of the discriminative network to associate the label with a mode of the distribution. The one-hot vector acts as switch that selects the corresponding decision boundary of the discriminative network given the class label. For example, in the case of imposing a mixture of 10 2-D Gaussians (Figure 2b and 4a), the one hot vector contains 11 classes. Each of the first 10 class selects a decision boundary for the corresponding individual mixture component. The extra class in the one-hot vector corresponds to unlabeled training points. When an unlabeled point is presented to the model, the extra class is turned on, to select the decision boundary for the full mixture of Gaussian distribution” (i.e. classifying items that have not been identified as adversarial into a plurality of classes)).
Regarding claim 2, Makhzani teaches the method as recited in claim 1, where a combined code formed by concatenating the encoder direct output with a vector of output means of predetermined internal parameters of the encoder is used as input (pg. 3 para. 5 recites “Here we assume that q(z|x) is a Gaussian distribution whose mean and variance is predicted by the encoder network: z_i ~ N(μ_i(x), σ_i(x)). In this case, the stochasticity in q(z) comes from both the data-distribution and the randomness of the Gaussian distribution at the output of the encoder” (i.e. the encoder output is concatenated with the parameterization vector)) for a combination of at least one of the steps of: regularizing the embedding space by enforcing the combined code to follow a simple prior (pg. 9 para. 3 recites “in the regularization phase, each of the adversarial networks first updates their discriminative network to tell apart the true samples (generated using the Categorical and Gaussian priors) from the generated samples (the hidden codes computed by the autoencoder)” (i.e. regularizing the input to follow a simple prior)); identifying whether a data item is adversarial or unnatural (pg. 2 para. 1 recites “The discriminator model, D(x), is a neural network that computes the probability that a point x in data space is a sample from the data distribution (positive samples) that we are trying to model, rather than a sample from our generative model (negative samples). Concurrently, the generator uses a function G(z) that maps samples z from the prior p(z) to the data space. G(z) is trained to maximally confuse the discriminator into believing that samples it generates come from the data distribution” (i.e. identifying whether an input data item is adversarial. Examiner’s Note: one of ordinary skill in the art would understand that the negative samples are equivalent to an adversarial input)); and classifying at least those input data items not identified as adversarial or unnatural into one of a plurality of classes (figure 3 and pg. 5 para. 5 – pg. 6 para. 1 recite “Figure 3 demonstrates the training procedure for this semi-supervised approach. We add a one-hot vector to the input of the discriminative network to associate the label with a mode of the distribution. The one-hot vector acts as switch that selects the corresponding decision boundary of the discriminative network given the class label.” Pg. 9 para. 3 recites “in the semi-supervised classification phase, the autoencoder updates q(y|x) to minimize the cross-entropy cost on a labeled mini-batch” (i.e. classifying data items into a plurality of classes)).
Regarding claim 3, Makhzani teaches the method as recited in claim 1, wherein the input data items are one or more input data items that are adversarially generated, or unnatural input data that matches none of the plurality of classes, allowing an additional boolean input where one or both of: unnatural or adversarially generated input data is not included in a training of a regularization procedure, and an identification of each of the one or more input data items as an adversarial or unnatural input admits a training procedure encouraging that input items known to be unnatural or adversarial are correctly identified (figure 3 and pg. 5 para. 5 – pg. 6 para. 1 recite “Figure 3 demonstrates the training procedure for this semi-supervised approach. We add a one-hot vector to the input of the discriminative network to associate the label with a mode of the distribution. The one-hot vector acts as switch that selects the corresponding decision boundary of the discriminative network given the class label. This one-hot vector has an extra class for unlabeled examples. For example, in the case of imposing a mixture of 10 2-D Gaussians (Figure 2b and 4a), the one hot vector contains 11 classes. Each of the first 10 class selects a decision boundary for the corresponding individual mixture component. The extra class in the one-hot vector corresponds to unlabeled training points. When an unlabeled point is presented to the model, the extra class is turned on, to select the decision boundary for the full mixture of Gaussian distribution” (i.e. when the classification of a data item does not match the plurality of classes, a procedure is performed to correctly identify that data item. Examiner’s Note: see the claim interpretation of claim 3 for the broadest reasonable interpretation of what claim 3 requires)).
Regarding claim 5, Makhzani teaches the method as recited in claim 1, wherein the simple prior distribution used for regularization is a multidimensional Gaussian distribution (pg. 4 para. 2 recites “Figure 2a shows the coding space z of the test data resulting from an adversarial autoencoder trained on MNIST digits in which a spherical 2-D Gaussian prior distribution is imposed on the hidden codes z. The learned manifold in Figure 2a exhibits sharp transitions indicating that the coding space is filled and exhibits no ‘holes’” (i.e. the prior distribution used for regularization is a Gaussian distribution)).
Regarding claim 6, Makhzani teaches the method as recited in claim 1, further comprising: identifying adversarial or unnatural input data items by differentiating regions of embedding space that have low adversarial density from regions of embedding space that have high adversarial density (pg. 7 para. 2 recites “We evaluate the performance of the adversarial autoencoder by computing its log-likelihood on the hold out test set. Evaluation of the model using likelihood is not straightforward because we cannot directly compute the probability of an image. Thus, we calculate a lower bound of the true log-likelihood using the methods described in prior work. We fit a Gaussian Parzen window (kernel density estimator) to 10,000 samples generated from the model and compute the likelihood of the test data under this distribution. The free-parameter σ of the Parzen window is selected via cross-validation” (i.e. using a kernel density estimator would differentiate spaces with low density from regions of high density)).
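As an illustration of the technique named in the passage quoted for claim 6, a Gaussian Parzen window (kernel density estimator) can be sketched as follows. This is a minimal sketch only: the function names, the toy samples, and the log-density threshold are hypothetical and do not come from Makhzani or from Applicant's specification. It shows only how a density estimate fitted to samples can separate low-density regions from high-density regions.

```python
import math


def parzen_log_density(point, samples, sigma):
    """Log of a Gaussian Parzen-window density estimate at `point`.

    `samples` is a list of equal-length coordinate tuples; `sigma` is the
    kernel bandwidth (the free parameter mentioned in the quoted passage).
    """
    dim = len(point)
    # Log of the normalizing constant of an isotropic Gaussian kernel.
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * sigma ** 2)
    exponents = [
        -sum((p - s) ** 2 for p, s in zip(point, sample)) / (2.0 * sigma ** 2)
        for sample in samples
    ]
    # Log-sum-exp for numerical stability when points are far from all samples.
    peak = max(exponents)
    log_sum = peak + math.log(sum(math.exp(e - peak) for e in exponents))
    return log_norm + log_sum - math.log(len(samples))


def is_low_density(point, samples, sigma, log_threshold):
    """Hypothetical decision rule: flag points whose log-density is below a threshold."""
    return parzen_log_density(point, samples, sigma) < log_threshold


if __name__ == "__main__":
    toy_samples = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]  # hypothetical data
    print(is_low_density((0.0, 0.0), toy_samples, 0.5, -10.0))
    print(is_low_density((5.0, 5.0), toy_samples, 0.5, -10.0))
```

In this sketch the threshold is supplied by the caller; how such a threshold would be chosen is not addressed by the quoted passage.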
Regarding claim 7, Makhzani teaches the method as recited in claim 1, wherein the reduced dimensionality of the low-dimensional embedding vector is selected from one of <512 and <1024 (pg. 5 para. 2 recites “In adversarial training procedure of our method, a much simpler distribution (e.g., Gaussian as opposed to the data distribution) is imposed in a much lower dimensional space (e.g., 20 as opposed to 1000) which results in a better test-likelihood” (i.e. the dimensionality of the embedding vector is <512 or <1024)).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Makhzani et al (“Adversarial Autoencoders”, herein Makhzani) in view of Zhao et al (“Adversarially Regularized Autoencoders”, herein Zhao).
Regarding claim 4, Makhzani teaches the method as recited in claim 1, and training the encoder to force pre-selected subsets of internal hidden parameters, at different levels of abstraction, to follow other simple distributions (pg. 6 para. 4 recites “This method may be extended to arbitrary distributions with no parametric forms – as demonstrated by mapping the MNIST data set onto a “swiss roll” (a conditional Gaussian distribution whose mean is uniformly distributed along the length of a swiss roll axis). Figure 4c depicts the coding space z and Figure 4d highlights the images generated by walking along the swiss roll axis in the latent space” (i.e. the encoder can use other distributions besides the Gaussian)). 
However, Makhzani does not explicitly teach minimizing a penalized form of Wasserstein distance to train the encoder to produce embedding space vectors such that the embedding space vectors of a subset of the input data items including all natural and non-adversarial items are expected to follow a simple prior distribution.
Zhao teaches minimizing a penalized form of Wasserstein distance to train the encoder to produce embedding space vectors such that the embedding space vectors of a subset of the input data items including all natural and non-adversarial items are expected to follow a simple prior distribution (pg. 1 right column para. 2 recites “we extend the adversarial autoencoder (AAE) (Makhzani et al., 2015) to discrete sequences/structures. Similar to the AAE, our model learns an encoder from an input space to an adversarially regularized continuous latent space. However unlike the AAE which utilizes a fixed prior, we instead learn a parameterized prior as a GAN”. Para. 3 recites “This adversarially regularized (i.e. from Makhzani) autoencoder (ARAE) can further be formalized under the recently-introduced Wasserstein autoencoder (WAE) framework”. Pg. 2 left column para. 5 recites “WGAN training uses the following min-max optimization over generator ϴ and critic w,

[Equation image not reproduced in text: media_image1.png]

where fw : Z → R denotes the critic function, z is obtained from the generator, z = gϴ(s), and P* and Pz are real and generated distributions. If the critic parameters w are restricted to a 1-Lipschitz function set W, this term correspond to minimizing Wasserstein-1 distance W(P*, Pz).” Pg. 2 right column para. 3 recites “the model consists of a discrete autoencoder regularized with a prior distribution,

[Equation image not reproduced in text: media_image2.png]

Here W is the Wasserstein distance between PQ, the distribution from a discrete encoder model (i.e. encφ(x) where x ~ P*), and Pz, a prior distribution. As above, the W function is computed with an embedded critic function which is optimized adversarially to the generator and encoder” (i.e. minimizing a Wasserstein distance such that the encoder produces vectors that follow a simple prior distribution)).
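For reference, the two equations cited in the passages quoted from Zhao are not reproduced in this text. A hedged reconstruction from the symbols defined in the quotations, assuming the standard WGAN min-max form for the first equation and a reconstruction loss plus a weighted Wasserstein term for the second, might read as follows. The symbols \mathcal{L}_{\mathrm{rec}} and \lambda^{(1)} are assumptions that cannot be recovered from the quoted text, so Zhao should be consulted for the exact expressions.

```latex
% Assumed form of the WGAN objective over generator \theta and critic w
\min_{\theta} \max_{w \in \mathcal{W}} \;
  \mathbb{E}_{z \sim P_*}\left[ f_w(z) \right]
  - \mathbb{E}_{\tilde{z} \sim P_z}\left[ f_w(\tilde{z}) \right]

% Assumed form of the autoencoder objective regularized toward the prior P_z
\min_{\phi, \psi} \;
  \mathcal{L}_{\mathrm{rec}}(\phi, \psi) + \lambda^{(1)} \, W(P_Q, P_z)
```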
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the Wasserstein autoencoder from Zhao with the adversarial autoencoder from Makhzani. Zhao explicitly states on page 1, right column paragraph 2 “In this work, we extend the adversarial autoencoder (AAE) (Makhzani et al., 2015) to discrete sequences/structures. Similar to the AAE, our model learns an encoder from an input space to an adversarially regularized continuous latent space.” As such, Zhao does not need to restate the teachings of Makhzani, but one of ordinary skill would understand that the combination is obvious; Zhao uses a Wasserstein autoencoder to improve the performance of the adversarial autoencoder from Makhzani.

Regarding claim 8, Makhzani teaches the method as recited in claim 1. 
However, Makhzani does not explicitly teach wherein the encoder comprises a parameterized function mapping inputs to an embedding layer.
Zhao teaches wherein the encoder comprises a parameterized function mapping inputs to an embedding layer (pg. 2 left column para. 2 recites “Our discrete autoencoder will consist of two parameterized functions: a deterministic encoder function encφ : X → Z with parameters φ that maps from input space to code space, and a conditional decoder pψ(x|z) over structures X with parameters ψ” (i.e. the encoder is a parametrized function mapping inputs to an embedded code space. Examiner’s Note: Zhao pg. 2 left column para. 4 also teaches that generative adversarial networks as a whole are parametrized models; based on this knowledge, one of ordinary skill would also recognize that the adversarial encoder from Makhzani would also use parameterized functions)).
See claim 4 for motivation to combine.
Regarding claim 9, Makhzani teaches the method as recited in claim 1. 
However, Makhzani does not explicitly teach wherein classifying the one or more data items further comprises: applying a parameterized classifier followed by a predictor that has an output of a class label, and a classification loss promoting correct label predictions.
Zhao teaches wherein classifying the one or more data items further comprises: applying a parameterized classifier followed by a predictor that has an output of a class label, and a classification loss promoting correct label predictions (pg. 3 right column para. 2 recites “To adapt ARAE to this setup, we modify the objective to learn to remove attribute distinctions from the prior (i.e. we want the prior to encode all the relevant information except about y). Following similar techniques from other domains, notably in images and video modeling, we introduce a latent space attribute classifier: 

[Equation image not reproduced in text: media_image3.png]

where Lclass(φ , u) is the loss of a classifier pu(y|z) from latent variable to labels (in our experiments we always set λ(2) =1). This requires two more update steps: (2b) training the classifier, and (3b) adversarially training the encoder to this classifier” (i.e. a parameterized classifier that outputs a class label and a classification loss)).
See claim 4 for motivation to combine.

Claims 10-13, 15-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Makhzani et al (“Adversarial Autoencoders”, herein Makhzani) in view of Tran et al (US 11036857 B2, herein Tran).
Regarding claim 10, Makhzani teaches the method as recited in claim 1, wherein the one or more input data items further comprises a labeled input dataset of non-adversarial data (figure 6 and pg. 8 para. 2-3 recite “we first focus on the fully supervised scenarios and discuss an architecture of adversarial autoencoders that can separate the class label information from the image style information. We then extend this architecture to the semi-supervised settings in Section 5. In order to incorporate the label information, we alter the network architecture of Figure 1 to provide a one-hot vector encoding of the label to the decoder” (i.e. supervised learning requires a labeled input dataset)). 
However, Makhzani does not explicitly teach wherein the one or more input data items are augmented by adversarial examples from at least one adversarial attack method.
Tran teaches wherein the one or more input data items are augmented by adversarial examples from at least one adversarial attack method (col. 1 lines 36-41 recite the method includes generating a first adversarial example by modifying an original input in accordance with an attack tactic, wherein the machine learning model accurately classifies the original input but does not accurately classify at least the first adversarial example (i.e. augmenting input data using an adversarial attack method)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using adversarial attack methods from Tran to augment the input data used by the adversarial autoencoder from Makhzani. Tran and Makhzani are both directed to using autoencoders to identify adversarial data (Tran column 7 lines 54-55 recite “Model 136 may be any type of AI model such as an image classifier, auto-encoder, text processor, etc.”); and while Makhzani teaches using a generator to augment data to try and fool the discriminator, Makhzani does not explicitly teach using adversarial attack methods to do so. One of ordinary skill would understand that applying the known technique of adversarial attack methods to improve a similar autoencoder model from Makhzani would yield a predictable result.
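For illustration, augmenting a labeled dataset with adversarial examples can be sketched as below. The cited passage of Tran refers only to “an attack tactic” and does not name one; this sketch swaps in a fast-gradient-sign-style perturbation on a toy logistic-regression model. All function names, weights, and the epsilon value are hypothetical and are not taken from Tran or Makhzani.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm_example(weights, bias, x, label, epsilon):
    """Perturb `x` by `epsilon` along the sign of the loss gradient.

    For cross-entropy loss on a logistic model, dLoss/dx_i = (p - label) * w_i.
    """
    p = predict(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


def augment_with_adversarial(weights, bias, dataset, epsilon):
    """Return the labeled dataset followed by one adversarial copy of each item."""
    adversarial = [(fgsm_example(weights, bias, x, y, epsilon), y) for x, y in dataset]
    return list(dataset) + adversarial


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0          # hypothetical model parameters
    data = [([0.5, 0.2], 1)]         # hypothetical labeled input
    for features, label in augment_with_adversarial(w, b, data, 0.5):
        print(features, label, round(predict(w, b, features), 3))
```

In this toy setting the model classifies the original input correctly but not its perturbed copy, which mirrors the behavior described in the cited passage of Tran.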
Claim 11 is a system claim and its limitation is included in claim 1. The only difference is that claim 11 requires a system (Tran col. 12 lines 5-10 recite the present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention). Therefore, claim 11 is rejected for the same reasons as claim 1.
Claim 12 is a system claim and its limitation is included in claim 2. Claim 12 is rejected for the same reasons as claim 2.
Claim 13 is a system claim and its limitation is included in claim 3. Claim 13 is rejected for the same reasons as claim 3.
Claim 15 is a system claim and its limitation is included in claim 6. Claim 15 is rejected for the same reasons as claim 6.
Claim 16 is a system claim and its limitation is included in claim 7. Claim 16 is rejected for the same reasons as claim 7.
Claim 19 is a non-transitory computer readable storage medium claim and its limitation is included in claim 1. The only difference is that claim 19 requires a non-transitory computer readable storage medium (Tran col. 12 lines 29-35 recite a computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire). Therefore, claim 19 is rejected for the same reasons as claim 1.

Claims 14 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Makhzani et al (“Adversarial Autoencoders”, herein Makhzani) in view of Tran et al (US 11036857 B2, herein Tran), in further view of Zhao et al (“Adversarially Regularized Autoencoders”, herein Zhao).
Regarding claim 14, the combination of Makhzani and Tran teaches the system as recited in claim 11, and training the encoder to force pre-selected subsets of internal hidden parameters, at different levels of abstraction, to follow other simple distributions (Makhzani pg. 6 para. 4 recites “This method may be extended to arbitrary distributions with no parametric forms – as demonstrated by mapping the MNIST data set onto a “swiss roll” (a conditional Gaussian distribution whose mean is uniformly distributed along the length of a swiss roll axis). Figure 4c depicts the coding space z and Figure 4d highlights the images generated by walking along the swiss roll axis in the latent space” (i.e. the encoder can use other distributions besides the Gaussian)). 
However, the combination of Makhzani and Tran does not explicitly teach minimizing a penalized form of Wasserstein distance to train the encoder to produce embedding space vectors such that the embedding space vectors of a subset of the input data items including all natural and non-adversarial items are expected to follow a simple prior distribution.
Zhao teaches minimizing a penalized form of Wasserstein distance to train the encoder to produce embedding space vectors such that the embedding space vectors of a subset of the input data items including all natural and non-adversarial items are expected to follow a simple prior distribution (pg. 1 right column para. 2 recites “we extend the adversarial autoencoder (AAE) (Makhzani et al., 2015) to discrete sequences/structures. Similar to the AAE, our model learns an encoder from an input space to an adversarially regularized continuous latent space. However unlike the AAE which utilizes a fixed prior, we instead learn a parameterized prior as a GAN”. Para. 3 recites “This adversarially regularized (i.e. from Makhzani) autoencoder (ARAE) can further be formalized under the recently-introduced Wasserstein autoencoder (WAE) framework”. Pg. 2 left column para. 5 recites “WGAN training uses the following min-max optimization over generator ϴ and critic w,

[Equation image not reproduced in text: media_image1.png]

where fw : Z → R denotes the critic function, z is obtained from the generator, z = gϴ(s), and P* and Pz are real and generated distributions. If the critic parameters w are restricted to a 1-Lipschitz function set W, this term correspond to minimizing Wasserstein-1 distance W(P*, Pz).” Pg. 2 right column para. 3 recites “the model consists of a discrete autoencoder regularized with a prior distribution,

[Equation image not reproduced in text: media_image2.png]

Here W is the Wasserstein distance between PQ, the distribution from a discrete encoder model (i.e. encφ(x) where x ~ P*), and Pz, a prior distribution. As above, the W function is computed with an embedded critic function which is optimized adversarially to the generator and encoder” (i.e. minimizing a Wasserstein distance such that the encoder produces vectors that follow a simple prior distribution)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the Wasserstein autoencoder from Zhao with the adversarial autoencoder from Makhzani (as modified by Zhao). Zhao explicitly states on page 1, right column, paragraph 2: “In this work, we extend the adversarial autoencoder (AAE) (Makhzani et al., 2015) to discrete sequences/structures. Similar to the AAE, our model learns an encoder from an input space to an adversarially regularized continuous latent space.” As such, Zhao does not need to restate the teachings of Makhzani, but one of ordinary skill would understand that the combination is obvious; Zhao uses a Wasserstein autoencoder to improve the performance of the adversarial autoencoder from Makhzani.
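
For illustration only, the relationship quoted from Zhao between a 1-Lipschitz critic and the Wasserstein-1 distance can be sketched numerically. The sketch below is not taken from Zhao, Makhzani, or the application. It assumes one-dimensional samples of equal size and a hypothetical linear critic f_w(z) = w * z with |w| <= 1, which is one simple 1-Lipschitz family. All function names are hypothetical. Because the linear family is only a subset of all 1-Lipschitz functions, the critic gap it attains is a lower bound on the Wasserstein-1 distance.

```python
from statistics import fmean


def critic_gap(real, generated, w):
    """Empirical E_real[f_w(z)] - E_generated[f_w(z)] for the linear critic f_w(z) = w * z."""
    if abs(w) > 1.0:
        raise ValueError("|w| must be <= 1 so that f_w is 1-Lipschitz")
    return fmean(w * z for z in real) - fmean(w * z for z in generated)


def best_linear_critic_gap(real, generated):
    """Maximize the gap over |w| <= 1; the gap is linear in w, so the optimum is at w = +1 or -1."""
    return max(critic_gap(real, generated, w) for w in (-1.0, 1.0))


def wasserstein1_1d(real, generated):
    """Exact Wasserstein-1 distance between two equal-size 1-D samples (sorted matching)."""
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    return fmean(abs(a - b) for a, b in zip(sorted(real), sorted(generated)))


# Assumed toy data: the generated sample is the real sample shifted by 1.
real = [0.0, 1.0, 2.0]
generated = [1.0, 2.0, 3.0]
lower_bound = best_linear_critic_gap(real, generated)
exact = wasserstein1_1d(real, generated)
```

For a pure shift, as in this toy data, the linear critic already attains the exact distance. For samples that differ in spread rather than location, the linear critic gap can fall strictly below the exact value, which is why richer critic families are used in practice.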

Regarding claim 17, the combination of Makhzani and Tran teaches the system as recited in claim 11. 
However, the combination of Makhzani and Tran does not explicitly teach wherein the encoder comprises a parameterized function mapping inputs to an embedding layer.
Zhao teaches wherein the encoder comprises a parameterized function mapping inputs to an embedding layer (pg. 2 left column para. 2 recites “Our discrete autoencoder will consist of two parameterized functions: a deterministic encoder function encφ : X → Z with parameters φ that maps from input space to code space, and a conditional decoder pψ(x|z) over structures X with parameters ψ” (i.e. the encoder is a parameterized function mapping inputs to an embedded code space. Examiner’s Note: Zhao pg. 2 left column para. 4 also teaches that generative adversarial networks as a whole are parameterized models; based on this knowledge, one of ordinary skill would also recognize that the adversarial encoder from Makhzani would also use parameterized functions)).
See claim 14 for motivation to combine.

Regarding claim 18, the combination of Makhzani and Tran teaches the system as recited in claim 11.
However, the combination of Makhzani and Tran does not explicitly teach wherein classifying the one or more data items further comprises: applying a parameterized classifier followed by a predictor that has an output of a class label, and a classification loss promoting correct label predictions.
Zhao teaches wherein classifying the one or more data items further comprises: applying a parameterized classifier followed by a predictor that has an output of a class label, and a classification loss promoting correct label predictions (pg. 3 right column para. 2 recites “To adapt ARAE to this setup, we modify the objective to learn to remove attribute distinctions from the prior (i.e. we want the prior to encode all the relevant information except about y). Following similar techniques from other domains, notably in images and video modeling, we introduce a latent space attribute classifier: 

[Equation image: media_image3.png]

where Lclass(φ, u) is the loss of a classifier pu(y|z) from latent variable to labels (in our experiments we always set λ(2) = 1). This requires two more update steps: (2b) training the classifier, and (3b) adversarially training the encoder to this classifier” (i.e. a parameterized classifier that outputs a class label and a classification loss)).
	See claim 14 for motivation to combine.
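
For illustration only, the general shape of a parameterized classifier over latent vectors, followed by a predictor that outputs a class label and a classification loss that rewards correct labels, can be sketched as follows. This is not code from Zhao or the application. It assumes a linear-plus-softmax classifier standing in for pu(y|z) and a cross-entropy loss, and every name and value (classify, predict_label, classification_loss, weights, biases, z) is hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(z, weights, biases):
    """Parameterized classifier: one linear score per class over latent vector z, then softmax."""
    logits = [
        sum(w_i * z_i for w_i, z_i in zip(row, z)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def predict_label(probs):
    """Predictor: output the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def classification_loss(probs, true_label):
    """Cross-entropy loss: small when the true label receives high probability."""
    return -math.log(probs[true_label])


# Assumed toy latent vector and classifier parameters (two classes, two latent dimensions).
z = [2.0, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

probs = classify(z, weights, biases)
label = predict_label(probs)
loss_if_true_label_is_0 = classification_loss(probs, 0)
loss_if_true_label_is_1 = classification_loss(probs, 1)
```

With these assumed parameters the predictor selects class 0, and the loss is lower when class 0 is the true label than when class 1 is, which is the sense in which such a loss promotes correct label predictions.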

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20200106805 A1 (Gronat et al) teaches using a Gaussian autoencoder to identify malicious activity in computer data.
US 20200380364 A1 (Sun et al) teaches generating an adversarial probabilistic regularizer using a discriminator of a generative adversarial network in order to solve an optimization problem.
“Stabilizing training of Generative Adversarial Networks through regularization” (Roth et al) teaches a regularization approach with low computational cost in order to yield a stable GAN training procedure.
“Improved Training of Wasserstein GANs” (Gulrajani et al) teaches an improvement to Wasserstein generative adversarial networks that penalizes the norm of gradient of the critic with respect to its input.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571) 272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./
	Examiner, Art Unit 2121


	/Jue Louie/
	Primary Examiner, Art Unit 2121