DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are deemed acceptable for the purpose of examination.
Specification
The specification is deemed acceptable for the purpose of examination.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 14 recites “receiving, by the autoencoder, a second plurality of records that are not used for generating the continuous probability distribution” and “generating, by the autoencoder, a second indication representing whether a specific record within the first plurality of records or the second plurality of records has been used for training”. It is not clear 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, 12-13, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20170230675 A1 to Wierstra, et al. (hereinafter, “Wierstra”), in view of U.S. Pub. No. US 20170169357 A1 to Caspi, et al. (hereinafter, “Caspi”).
As per claim 1, Wierstra teaches:
receive data (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image”)
generate a continuous probability distribution associated with the data; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…” (A Gaussian distribution is a continuous probability distribution))
sample a latent variable from the continuous probability distribution to generate a plurality of samples; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
and generate reconstructed data from the plurality of samples (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308)
	Wierstra fails to explicitly teach:
a memory storing a data structure that comprises a machine learning model, the machine learning model configured to
and at least one programmable processor communicatively coupled with the memory to access the machine learning model, the at least one programmable processor configured to
	compute a reconstruction error by determining a distance between the reconstructed data and the data
and generate, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model
	However, Caspi (Caspi addresses the issue of detecting anomalies in a dataset using autoencoders) teaches:
a memory storing a data structure that comprises a machine learning model, the machine learning model configured to (Caspi, Para. [0161] discloses “The invention further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing one or more methods of the invention.”  (instructions consist of a machine learning model))
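and at least one programmable processor communicatively coupled with the memory to access the machine learning model, the at least one programmable processor configured to (Caspi, Para. [0046] discloses “The term “processing unit” covers any computing unit or electronic unit that may perform tasks based on instructions stored in a memory, such as a computer, a server, a chip, etc.”)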
compute a reconstruction error by determining a distance between the reconstructed data and the data (Caspi, Para. [0146] discloses “According to some embodiments, the reconstruction error is computed for each data. For instance, a reconstruction error is computed for each Output unit i, by computing the difference between the reconstruction of Input i (i.e., the output of Output unit i) and the true Input i. Other formulas may be used to express the reconstruction error, such as statistical values (e.g., entropy, variance, covariance, etc.)” (Reconstruction error computes a distance))
and generate, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” (Anomalies correspond to received data that has been used as training data))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the variational autoencoder system for generating reconstructed data as disclosed by Wierstra to use the reconstruction error computation as disclosed by Caspi. The combination would have been obvious because a 
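For orientation only, the steps of claim 1 as mapped above (a Gaussian over the latent variable, a plurality of samples, reconstruction, a distance-based reconstruction error, and a threshold-based indication in the manner of Caspi, Para. [0158]) can be sketched in Python as follows. This is a minimal sketch, not the implementation of Wierstra, Caspi, or the application; the `encode`/`decode` callables, the averaging over samples, the threshold value, and the function names are hypothetical.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Euclidean distance between a record and its reconstruction."""
    return float(np.linalg.norm(x - x_hat))

def threshold_indication(x, encode, decode, threshold, n_samples=10, rng=None):
    """Hypothetical pipeline: returns (error, flag) for a single record x.

    encode(x) -> (mu, var): mean and variance of a diagonal Gaussian latent.
    decode(z) -> reconstruction of x from one latent sample z.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, var = encode(x)
    # Draw n_samples latent vectors from N(mu, diag(var)).
    z = mu + np.sqrt(var) * rng.standard_normal((n_samples, mu.shape[0]))
    # Reconstruct from each sample and average the distances to x (assumed).
    err = float(np.mean([reconstruction_error(x, decode(z_i)) for z_i in z]))
    # As in Caspi, Para. [0158], the flag is raised when the error exceeds the threshold.
    return err, err > threshold
```

Which side of the threshold is read as "used for training" is the mapping question addressed in the rejection itself; the sketch only reproduces the distance computation and the thresholding.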

	As per claim 3, the combination of Wierstra and Caspi as shown above teaches the system of claim 1, Wierstra further teaches wherein the machine learning model comprises:
	an encoder that performs (Wierstra, Fig. 1 discloses Encoder neural network 110)
	the receiving of the data (Wierstra, Fig. 1 discloses receiving input image 102 into the encoder system 100),
and the generating of the continuous probability distribution (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…”);
and a decoder that performs (Wierstra, Fig. 1 discloses decoder system 150)
the generating of the reconstructed data (Wierstra, Fig. 1 discloses reconstructed image 172 being generated from the decoder system 150)

As per claim 4, the combination of Wierstra and Caspi as shown above teaches the system of claim 3, Wierstra further teaches:
wherein the encoder is a variational autoencoder (Wierstra, Para. [0021] discloses “In particular, the encoder neural network 110 has been trained as the encoder neural network of a variational auto encoder”)

As per claim 5, the combination of Wierstra and Caspi as shown above teaches the system of claim 3, Wierstra further teaches wherein:
the encoder is a first neural network; (Wierstra, Para. [0020] discloses “The encoder neural network 110 is a neural network that has been configured through training to process the input image 102 to generate latent variable data 112 for the input image 102.”)
and the decoder is a second neural network. (Wierstra, Para. [0039] discloses “The decoder system 150 includes a reconstruction subsystem 160 and a generative neural network 170.”)

As per claim 6, the combination of Wierstra and Caspi as shown above teaches the system of claim 3, Wierstra further teaches wherein the machine learning model further comprises:
a storage configured to store the plurality of samples prior to the generation of the reconstructed data. (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)

As per claim 12, the combination of Wierstra and Caspi as shown above teaches the system of claim 1, Caspi further teaches:
wherein the reconstruction error affects a quantification of a leakage of training data used to train the autoencoder (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” (Anomalies correspond to received data that has been used as training data. This indication affects a quantification indicating that training data is within the received data))
Same motivation to combine Wierstra and Caspi as claim 1.

As per claim 13, Wierstra teaches:
receiving, by an autoencoder, ((a first plurality of records)) (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image” (A plurality of records to be taught by Caspi below))
generating, by the autoencoder and based on the first plurality of records, a continuous probability distribution associated with the first plurality of records (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…” (plurality of records to be taught by Caspi below))
sampling, by the autoencoder, a latent variable from the continuous probability distribution (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
generating, by the autoencoder, reconstructed data based on the latent variable, the reconstructed data characterizing a reconstruction of the first plurality of records (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308 (reconstructed data characterizes a reconstruction of input that is based on latent variables))
Wierstra fails to explicitly teach:
a first plurality of records
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder
However, Caspi teaches:
a first plurality of records (Caspi, Para. [0020] discloses “The system being configured to receive a plurality of data of the device”)
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records (Caspi, Para. [0146] discloses “According to some embodiments, the reconstruction error is computed for each data. For instance, a reconstruction error is computed for each Output unit i, by computing the difference between the reconstruction of Input i (i.e., the output of Output unit i) and the true Input i. Other formulas may be used to express the reconstruction error, such as statistical values (e.g., entropy, variance, covariance, etc.)” and Para. [0046] discloses “The term “processing unit” covers any computing unit or electronic unit that may perform tasks based on instructions stored in a memory, such as a computer, a server, a chip, etc. It encompasses a single processor or multiple processors…”)
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” and Para. [0046] discloses “The term “processing unit” covers any computing unit or electronic unit that may perform tasks based on instructions stored in a memory, such as a computer, a server, a chip, etc. It encompasses a single processor or multiple processors…” (Anomalies correspond to received data that has been used as training data))
Same motivation to combine Wierstra and Caspi as claim 1

As per claim 15, Wierstra teaches a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause a system comprising a machine learning model and at least one programmable processor communicatively coupled to the machine learning model to perform operations comprising:
receiving data (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image”)
generating a continuous probability distribution associated with the data; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…”)
sampling a latent variable from the continuous probability distribution to generate a plurality of samples; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
generating reconstructed data from the plurality of samples (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308)
Wierstra fails to explicitly teach:
computing a reconstruction error by determining a distance between the reconstructed data and the data
and generating, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model
However, Caspi teaches:
computing a reconstruction error by determining a distance between the reconstructed data and the data (Caspi, Para. [0146] discloses “According to some embodiments, the reconstruction error is computed for each data. For instance, a reconstruction error is computed for each Output unit i, by computing the difference between the reconstruction of Input i (i.e., the output of Output unit i) and the true Input i. Other formulas may be used to express the reconstruction error, such as statistical values (e.g., entropy, variance, covariance, etc.)”)
and generating, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” (Anomalies correspond to received data that has been used as training data))
Same motivation to combine Wierstra and Caspi as claim 1
As per claim 16, the combination of Wierstra and Caspi as shown above teaches the non-transitory computer-readable medium of claim 15, Wierstra further teaches wherein the machine learning model comprises:
an encoder that performs (Wierstra, Fig. 1 discloses Encoder neural network 110)
	the receiving of the data (Wierstra, Fig. 1 discloses receiving input image 102 into the encoder system 100),
and the generating of the continuous probability distribution (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…”);
and a decoder that performs (Wierstra, Fig. 1 discloses decoder system 150)
the generating of the reconstructed data (Wierstra, Fig. 1 discloses reconstructed image 172 being generated from the decoder system 150)

As per claim 17, the combination of Wierstra and Caspi as shown above teaches the non-transitory computer-readable medium of claim 16, Wierstra further teaches wherein:
the encoder is a variational autoencoder (Wierstra, Para. [0021] discloses “In particular, the encoder neural network 110 has been trained as the encoder neural network of a variational auto encoder”)
the encoder is a first neural network; (Wierstra, Para. [0020] discloses “The encoder neural network 110 is a neural network that has been configured through training to process the input image 102 to generate latent variable data 112 for the input image 102.”)
and the decoder is a second neural network. (Wierstra, Para. [0039] discloses “The decoder system 150 includes a reconstruction subsystem 160 and a generative neural network 170.”)

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of U.S. Pub. No. US 20180101770 A1 to Tanaka, et al. (hereinafter, “Tanaka”)
As per claim 2, the combination of Wierstra and Caspi as shown above teaches the system of claim 1, the combination of Wierstra and Caspi fails to explicitly teach:
wherein the data comprises at least one of text and images
However, Tanaka (Tanaka addresses the issue of developing a generative (i.e. auto encoder) model) teaches:
wherein the data comprises at least one of text and images (Tanaka, Para. [0028] discloses “The train data can be image data, text data, or video data” (If a model is trained on a specific category of data, then data input to the model must also be of the same category of data that is used to train the model otherwise the model will not be able to classify/predict the data))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use data that consists of text and images as disclosed by Tanaka. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the versatility of a .

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of Variational Autoencoders (hereinafter, “VAE”).
As per claim 7, the combination of Wierstra and Caspi as shown above teaches the system of claim 1, the combination of Wierstra and Caspi fails to explicitly teach:
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); 
wherein: 
the data is represented as x; 
mean of the continuous probability distribution is represented as Eμ(x); 
and variance of the continuous probability distribution variance is represented as EΣ(x)
However, VAE (VAE addresses Gaussian distributions in variational autoencoders) teaches:

the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); (VAE discloses the Gaussian distribution being represented as [equation image])
wherein: 
the data is represented as x; (Calculating the mean and variance of data is represented in the equation. To be able to take the mean and variance means that there has to be data available in the first place)
mean of the continuous probability distribution is represented as Eμ(x); (Mean is represented as μ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
and variance of the continuous probability distribution variance is represented as EΣ(x). (Variance represented as Σ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use Gaussian distributions as disclosed by VAE. The combination would have been obvious because a person of ordinary skill in the art would be motivated to be able to sample latent variables from a Gaussian distribution in order to be able to proceed with generating reconstructed data via a decoder in a variational autoencoder.
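For reference, the claim 7 notation and the usual way a variational autoencoder draws latent samples from such a Gaussian can be written as follows. The reparameterized sampling step, the diagonal form of E_Σ(x) (so that the square root and ⊙ act elementwise), and the sample count n are assumptions added for illustration; they are not quoted from VAE or from the claims.

```latex
q(z \mid x) = \mathcal{N}\big(E_\mu(x),\, E_\Sigma(x)\big), \qquad
z_i = E_\mu(x) + E_\Sigma(x)^{1/2} \odot \varepsilon_i, \quad
\varepsilon_i \sim \mathcal{N}(0, I), \quad i = 1, \dots, n
```

Each z_i then has mean E_μ(x) and variance E_Σ(x), so the samples follow the recited Gaussian distribution.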

As per claim 18, the combination of Wierstra and Caspi as shown above teaches the non-transitory computer-readable medium of claim 15, the combination of Wierstra and Caspi fails to explicitly teach:
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); 
wherein: 
the data is represented as x; 
mean of the continuous probability distribution is represented as Eμ(x); 
and variance of the continuous probability distribution variance is represented as EΣ(x)
However, VAE teaches:

the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); (VAE discloses the Gaussian distribution being represented as [equation image])
wherein: 
the data is represented as x; (Calculating the mean and variance of data is represented in the equation. To be able to take the mean and variance means that there has to be data available in the first place)
mean of the continuous probability distribution is represented as Eμ(x); (Mean is represented as μ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
and variance of the continuous probability distribution variance is represented as EΣ(x). (Variance represented as Σ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
Same motivation to combine Wierstra, Caspi and VAE as claim 7

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of JP. Pub. No. JP 2018073258 A to Yamanaka, et al. (hereinafter, “Yamanaka”)
As per claim 8, the combination of Wierstra and Caspi as shown above teaches the system of claim 1, Wierstra further teaches:
the latent variable is one of a plurality of latent variables (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” (the latent variable is one of the plurality of latent variables being sampled))
The combination of Wierstra and Caspi fails to explicitly teach:
and the decoder minimizes a distance measure between a distribution of the reconstructed data of the plurality of latent variables and the continuous probability distribution
However, Yamanaka (Yamanaka addresses the issue of error detection using a variational autoencoder) teaches:
and the decoder minimizes a distance measure between a distribution of the reconstructed data of the plurality of latent variables and the continuous probability distribution (Yamanaka, Para. [0025]-[0026] discloses “In addition, the 2 term of the above equation (1) is called KL divergence. This 2 term represents the distance between the conditional probability distribution q(z|x), which produces a latent variable z from the observed data x, and the previous distribution p(z), which is not dependent on the data x. Thus, minimizing the loss function L, represented by the sum of the 2 distances, is to minimize the 2 distances at the same time, meaning that the probability distribution of the latent variable z which is not dependent on the data x as much as possible is determined” and Para. [0022] discloses “Specifically, the learning unit 15 b learns the input data by optimizing the parameters of the model so as to minimize the loss function L” (The learning unit itself is a variational autoencoder, which consists of an encoder and a decoder and aims to minimize the loss function. Minimizing the loss function means minimizing the distance measure.))
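The KL term that Yamanaka describes in Para. [0025]-[0026] has a well-known closed form when the encoder distribution is a Gaussian with diagonal covariance and the prior p(z) is a standard normal: ½ Σ_j (σ_j² + μ_j² − 1 − ln σ_j²). The sketch below assumes exactly that case; the function name is hypothetical, and it is not asserted to be Yamanaka's own expression.

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) in nats, for a diagonal Gaussian.

    Assumes var > 0 elementwise; uses the standard closed form
    0.5 * sum(var + mu**2 - 1 - log(var)).
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(0.5 * np.sum(var + mu**2 - 1.0 - np.log(var)))
```

The divergence is zero only when the encoder distribution already equals the prior (mu = 0, var = 1), which is why minimizing it pulls the latent distribution toward N(0, I).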
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of Yamanaka, and further in view of Guide to Autoencoders (hereinafter, “Srinivasan”).
As per claim 9, the combination of Wierstra, Caspi, and Yamanaka as shown above teaches the system of claim 8, Yamanaka further teaches:

and the distance measure is a Kullback-Leibler divergence, the Kullback-Leibler divergence being represented as KL(N(Eμ(x), EΣ(x)) ‖ N(0, 1)) (Yamanaka discloses in equation (1) an equivalent Kullback-Leibler divergence [equation image])
The combination of Wierstra, Caspi, and Yamanaka fails to explicitly teach:
the data is represented as x; 
the latent variables are represented as zi (i = 1, ...., n); 
the reconstructed data is represented as D(zi); 

the distance between the reconstructed data and the data is represented as ‖D(zi) − x‖;
the reconstruction error is represented as [equation image]
However, Srinivasan (Srinivasan addresses the issue of the theory regarding variational autoencoders) teaches:
the data is represented as x; (Data represented in the loss function)
the latent variables are represented as zi (i = 1, ...., n); (Latent variables are represented in the loss function)
the reconstructed data is represented as D(zi); (Reconstructed data represented in the loss function)
the distance between the reconstructed data and the data is represented as ‖D(zi) − x‖; (Loss function such as squared error calculates the distance between two points which is indicative of the reconstructed data and the data)
the reconstruction error is represented as [equation image] (Srinivasan discloses an average reconstruction error equation, [equation image], that is equivalent, where L is a loss function such as squared error)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use reconstruction error as disclosed by Srinivasan. The combination would have been obvious because a person of ordinary skill in the art would be motivated to be able to generate reconstruction errors from reconstructed data which will allow one to know how accurately data has been reconstructed.
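Srinivasan's average reconstruction error is reproduced in the record only as an image. Assuming it takes the common form (1/n) Σ_{i=1..n} L(x, D(z_i)), with L a loss such as squared error, it can be sketched as follows; the decoder `decode` and the function names are hypothetical stand-ins, and the averaging form is an assumption rather than a quotation.

```python
import numpy as np
from typing import Callable, Sequence

def squared_error(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Squared Euclidean distance ||x_hat - x||^2."""
    return float(np.sum((x_hat - x) ** 2))

def average_reconstruction_error(
    x: np.ndarray,
    latents: Sequence[np.ndarray],
    decode: Callable[[np.ndarray], np.ndarray],
    loss: Callable[[np.ndarray, np.ndarray], float] = squared_error,
) -> float:
    """(1/n) * sum_i loss(x, D(z_i)) over the n latent samples z_1..z_n."""
    return float(np.mean([loss(x, decode(z)) for z in latents]))
```

Passing a plain Euclidean norm as `loss` instead gives the average of the distances ‖D(zi) − x‖ recited in the claim language.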

As per claim 19, the combination of Wierstra and Caspi as shown above teaches the non-transitory computer-readable medium of claim 15, Yamanaka further teaches:

and the distance measure is a Kullback-Leibler divergence, the Kullback-Leibler divergence being represented as KL(N(Eμ(x), EΣ(x)) ‖ N(0, 1)) (Yamanaka discloses in equation (1) an equivalent Kullback-Leibler divergence [equation image])
The combination of Wierstra, Caspi, and Yamanaka fails to explicitly teach:
the data is represented as x; 
the latent variables are represented as zi (i = 1, ...., n); 
the reconstructed data is represented as D(zi); 

the distance between the reconstructed data and the data is represented as ‖D(zi) − x‖;
the reconstruction error is represented as [equation image]
However, Srinivasan teaches:
the data is represented as x; (Data represented in the loss function)
the latent variables are represented as zi (i = 1, ...., n); (Latent variables are represented in the loss function)
the reconstructed data is represented as D(zi); (Reconstructed data represented in the loss function)
the distance between the reconstructed data and the data is represented as ‖D(zi) − x‖; (Loss function such as squared error calculates the distance between two points which is indicative of the reconstructed data and the data)

the reconstruction error is represented as [equation image] (Srinivasan discloses an average reconstruction error equation, [equation image], that is equivalent, where L is a loss function such as squared error)
Same motivation to combine Wierstra, Caspi, Yamanaka, and Srinivasan as claim 9.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of Yamanaka, and further in view of Srinivasan, and further in view of U.S. Pub. No. US 20180275642 A1 to Tajima, et al. (hereinafter, “Tajima”).
As per claim 10, the combination of Wierstra, Caspi, Yamanaka, and Srinivasan as shown above teaches the system of claim 9, the combination of Wierstra, Caspi, Yamanaka, and Srinivasan fails to explicitly teach:
wherein the representation for the reconstruction error results in about 100% accurate prediction of whether the data has been used for training
However, Tajima (Tajima addresses the issue of anomaly detection using a generative model (i.e. variational autoencoder)) teaches:
wherein the representation for the reconstruction error results in about 100% accurate prediction of whether the data has been used for training (Tajima, Para. [0158] discloses “By the use of the statistical predictive model for the calculation of reconstruction error, the arithmetic device is able to calculate accurately predicted values.” (Accurately predicted values equates to about a 100% accurate prediction))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to indicate that the reconstruction error results in an accurate prediction as disclosed by Tajima. The combination would have been obvious because a person of ordinary skill in the art would be motivated to know that the reconstruction error results in an accurate prediction.

As per claim 11, the combination of Wierstra, Caspi, Yamanaka, Srinivasan, and Tajima as shown above teaches the system of claim 10. Tajima further teaches:
wherein the about 100% in accuracy is 98% or more in accuracy. (Tajima, Para. [0158] discloses “By the use of the statistical predictive model for the calculation of reconstruction error, the arithmetic device is able to calculate accurately predicted values.” (Accurately predicted values equate to a 100% accurate prediction, which is greater than 98% prediction accuracy))
The same motivation to combine Wierstra, Caspi, Yamanaka, Srinivasan, and Tajima applies as for claim 10.
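For illustration only, the accuracy limitation of claims 10-11 can be sketched as a membership prediction made by thresholding each record's reconstruction error, with the resulting accuracy compared against the 98% figure of claim 11. The threshold rule, its direction (the hypothetical member_if_below flag), the sample values, and the function names are assumptions made for this sketch and are not taken from Tajima or the claims.

```python
def predict_membership(errors, threshold, member_if_below=True):
    """Predict, per record, whether it was used for training (True) or not (False)."""
    if member_if_below:
        # Assumed convention: training records are reconstructed well, so low error -> member.
        return [e <= threshold for e in errors]
    # Opposite convention: an error above the threshold is flagged.
    return [e > threshold for e in errors]


def accuracy(predictions, labels):
    """Fraction of records whose predicted membership matches the true label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


errors = [0.01, 0.02, 0.50, 0.40]     # hypothetical per-record reconstruction errors
labels = [True, True, False, False]   # True = record was used for training
acc = accuracy(predict_membership(errors, threshold=0.1), labels)
print(acc, acc >= 0.98)  # claim 11 recites "98% or more in accuracy"
```

With these hypothetical values every record is classified correctly, so the sketch prints an accuracy of 1.0, which is 98% or more.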

Claims 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Caspi, further in view of Generating Samples from Probability Distributions (hereinafter, “MIT”)
As per claim 14, the combination of Wierstra and Caspi as shown above teaches the method of claim 13. Caspi further teaches the method further comprising:
receiving, by the autoencoder, a second plurality of records that are not used for generating the continuous probability distribution (Caspi, Para. [0020] discloses “The system being configured to receive a plurality of data…” (Autoencoder disclosed by Wierstra above))
and generating, by the autoencoder, a second indication representing whether a specific record within the first plurality of records or the second plurality of records has been used for training the autoencoder (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” (Anomalies correspond to received data that has been used as training data. Autoencoder disclosed by Wierstra above))
The combination of Wierstra and Caspi fails to explicitly teach:

a size of the first plurality of records is same as a size of the second plurality of records
and the first plurality of records and the second plurality of records are drawn from a common probability distribution
However, MIT (MIT addresses the issue of generating samples from probability distributions) teaches:
wherein: 
a size of the first plurality of records is same as a size of the second plurality of records (MIT, Whole document (Records are generated from the same common probability distribution, as shown below; thus it is obvious that both pluralities of records are of the same size))
and the first plurality of records and the second plurality of records are drawn from a common probability distribution (MIT, Whole document (The document discusses random sampling of observations/data from a probability distribution))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use the random sampling of data from a probability distribution as disclosed by MIT. The combination would have been obvious because a person of ordinary skill in the art would be motivated to generate data from a common probability distribution, as doing so reduces sampling bias and allows one to make inferences regarding a population of data.
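For illustration only, the limitation of claim 14 (two equally sized pluralities of records drawn from a common probability distribution) can be sketched as below. The normal distribution, its parameters, the seed, the scalar records, and the function name draw_record_sets are assumptions made for this sketch; the choice of a normal distribution is not taken from MIT or the claims.

```python
import random


def draw_record_sets(n, mu=0.0, sigma=1.0, seed=0):
    """Draw two sets of n records each from one common N(mu, sigma^2) distribution."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    # A single pool of 2n independent draws from the common distribution.
    pool = [rng.gauss(mu, sigma) for _ in range(2 * n)]
    # First n records: e.g., the first plurality (used for training);
    # remaining n records: the second plurality (not used for training).
    return pool[:n], pool[n:]


first, second = draw_record_sets(100)
print(len(first), len(second))
```

Splitting one pool of independent draws guarantees that both pluralities have the same size and come from the same distribution.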

As per claim 20, the combination of Wierstra and Caspi as shown above teaches the non-transitory computer-readable medium of claim 15. Caspi further teaches the medium further comprising:
receiving another plurality of records that are not used for generating the continuous probability distribution (Caspi, Para. [0020] discloses “The system being configured to receive a plurality of data…” (Autoencoder disclosed by Wierstra above))
and generating another indication representing whether the records within the first plurality of records or the other plurality of records has been used for training the autoencoder (Caspi, Para. [0158] discloses “According to some embodiments, the system 1 may provide indications that anomalies exist in data when the reconstruction error of a subset of data is above a predetermined value (i.e., a threshold)” (Anomalies may correspond to received data that has been used as training data. Autoencoder disclosed by Wierstra above))
The combination of Wierstra and Caspi fails to explicitly teach:
wherein: 
a size of the first plurality of records is same as a size of the other plurality of records
and the first plurality of records and the other plurality of records are drawn from a common probability distribution
However, MIT teaches:
wherein: 
a size of the first plurality of records is same as a size of the other plurality of records (MIT, Whole document (Records are generated from the same common probability distribution, as shown below; thus it is obvious that both pluralities of records are of the same size))
and the first plurality of records and the other plurality of records are drawn from a common probability distribution (MIT, Whole document (The document discusses random sampling of observations/data from a probability distribution))
The same motivation to combine Wierstra, Caspi, and MIT applies as for claim 14.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

	/H.R.M./ Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123