DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims are pending in this office action: 1-2, 5-13, 15, and 17-19
The following claims are amended: 1, 5-6, 13, 15, and 17
The following claims are new: None
The following claims are cancelled: 3-4, 14, 16, and 20
The following claims are rejected: 1-2, 5-13, 15, and 17-19
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/11/2021 has been entered.
Response to Arguments
Applicant filed amendments on 11/03/2021 to address the 35 U.S.C. 112(b) rejection. In response to the Applicant’s amendments, the 35 U.S.C. 112(b) rejection has been withdrawn.
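Applicant’s arguments filed on 11/03/2021 to address the 35 U.S.C. 103 rejection have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.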
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-6, 12-13, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20170230675 A1 to Wierstra, et al. (hereinafter, “Wierstra”), in view of U.S. Patent No. US 10909419 B2 to Itou, et al. (hereinafter, “Itou”)
As per claim 1, Wierstra teaches:
a memory storing a data structure that comprises a machine learning model, the machine learning model comprising a variational autoencoder, the variational autoencoder including an encoder and a decoder (Wierstra, Para. [0064] discloses “The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.” and Fig. 1 discloses encoder neural network 110 and decoder system 150)
wherein the encoder is configured to: (Wierstra, Fig. 1 discloses encoder neural network 110)
receive data (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image”)
generate a continuous probability distribution associated with the data; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…” (A Gaussian distribution is a continuous probability distribution))
sample at least one latent variable from the continuous probability distribution to generate a plurality of samples; (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
store the plurality of samples to enable retrieval by the decoder (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
wherein the decoder is configured to: (Wierstra, Fig. 1 discloses decoder system 150)
retrieve the stored plurality of samples (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
and generate reconstructed data from the plurality of samples (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308)
and at least one programmable processor communicatively coupled with the memory to access the machine learning model, the at least one programmable processor configured to (Wierstra, Para. [0061] discloses “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.”)
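	Wierstra fails to explicitly teach:
compute a reconstruction error by determining a distance between the reconstructed data and the data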
and generate, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model comprising the variational autoencoder
	However, Itou teaches:
compute a reconstruction error by determining a distance between the reconstructed data and the data (Itou, Col. 1, Lines 48-52 discloses “For this reason, an abnormality (reconstruction error) detected by the auto encoder is determined on the basis of a Euclidean distance from data to a manifold (for example, a distance D1 shown in FIG. 7).”)
and generate, based on the reconstruction error, an indication representing whether a specific record within the received data was used to train the machine learning model comprising the variational autoencoder (Itou, Col. 3, Lines 33-40 discloses “A reconstruction error shows a difference between normal data and reconstruction data generated by compressing and decoding the normal data. It is possible to detect abnormal data by detecting and identifying the reconstruction error. Hereinafter, the phase in which the abnormality detection processing is performed by the abnormality detection device 1A refers to “abnormality detection stage”.” (The indication corresponds to a determination of an abnormality, wherein analysis of the reconstruction error yields insight into whether given data was used to train the machine learning model. Note that Para. [0024] of the Applicant’s disclosure states “The value of the function can quantify a leakage (i.e. quantification of how much data from outside the training set is being used to train the VAE)…”, where the function is directed to a reconstruction error. Thus, one of ordinary skill in the art need only look at the reconstruction error in order to quantify a leakage of training data.))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the variational autoencoder system for generating reconstructed data as disclosed by Wierstra to use the reconstruction error computation as disclosed by Itou. The combination would have been obvious because a person of ordinary skill in the art would be motivated by the goal of “modeling major trends in data and discovering data that should not originally exist” (Itou, Col. 1, Lines 22-23).
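By way of illustration only, the reconstruction error computation relied upon above can be sketched as follows. The sketch assumes that records are numeric vectors and uses a Euclidean distance, as in the cited passage of Itou. The function names and the threshold value are hypothetical and are not taken from Wierstra, Itou, or the Applicant's disclosure.

```python
import math


def reconstruction_error(data, reconstructed):
    """Euclidean distance between an input record and its reconstruction."""
    if len(data) != len(reconstructed):
        raise ValueError("record and reconstruction must have the same length")
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(data, reconstructed)))


def membership_indication(data, reconstructed, threshold=0.5):
    """Hypothetical indication derived from the reconstruction error.

    Returns True when the error is at or below the threshold, i.e. when the
    record is reconstructed well. The threshold is an assumed value chosen
    for this example only.
    """
    return reconstruction_error(data, reconstructed) <= threshold
```

Under these assumptions, a record whose reconstruction lies close to the input yields a small error and a positive indication. How a threshold would be selected in practice is outside the scope of this sketch.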

As per claim 2, the combination of Wierstra and Itou as shown above teaches the system of claim 1, Wierstra further teaches:
wherein the data comprises at least one of text and images (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image”)

As per claim 5, the combination of Wierstra and Itou as shown above teaches the system of claim 1, Wierstra further teaches wherein:
the encoder is a first neural network; (Wierstra, Para. [0020] discloses “The encoder neural network 110 is a neural network that has been configured through training to process the input image 102 to generate latent variable data 112 for the input image 102.”)
and the decoder is a second neural network. (Wierstra, Para. [0039] discloses “The decoder system 150 includes a reconstruction subsystem 160 and a generative neural network 170.”)

As per claim 6, the combination of Wierstra and Itou as shown above teaches the system of claim 1, Wierstra further teaches wherein the machine learning model further comprises:
a storage configured to store the plurality of samples prior to the generation of the reconstructed data. (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)

As per claim 12, the combination of Wierstra and Itou as shown above teaches the system of claim 1, Itou further teaches:
wherein the reconstruction error affects a quantification of a leakage of training data used to train the autoencoder (Itou, Col. 3, Lines 33-40 discloses “A reconstruction error shows a difference between normal data and reconstruction data generated by compressing and decoding the normal data. It is possible to detect abnormal data by detecting and identifying the reconstruction error. Hereinafter, the phase in which the abnormality detection processing is performed by the abnormality detection device 1A refers to “abnormality detection stage”.” (The indication corresponds to a determination of an abnormality, wherein analysis of the reconstruction error yields insight into whether given data was used to train the machine learning model. Note that Para. [0024] of the Applicant’s disclosure states “The value of the function can quantify a leakage (i.e. quantification of how much data from outside the training set is being used to train the VAE)…”, where the function is directed to a reconstruction error. Thus, one of ordinary skill in the art need only look at the reconstruction error in order to quantify a leakage of training data.))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Itou for at least the same reasons as discussed above in claim 1

As per claim 13, Wierstra teaches:
receiving, by an encoder of an autoencoder, a first plurality of records, wherein the autoencoder comprises a variational autoencoder, the variational autoencoder including the encoder and decoder (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image 102” and Fig. 1 discloses encoder neural network 110 and decoder system 150)
generating, by the encoder of the autoencoder and based on the first plurality of records, a continuous probability distribution associated with the first plurality of records (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…”)
sampling, by the encoder of the autoencoder, at least one latent variable from the continuous probability distribution (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
storing, by the encoder of the autoencoder, the plurality of samples to enable retrieval by the decoder (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
retrieving, by the decoder of the autoencoder, the stored plurality of samples (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
generating, by the decoder of the autoencoder, reconstructed data based on the latent variable, the reconstructed data characterizing a reconstruction of the first plurality of records (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308 (reconstructed data characterizes a reconstruction of input that is based on latent variables))
Wierstra fails to explicitly teach:
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder comprising the variational autoencoder
However, Itou teaches:
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records (Itou, Col. 1, Lines 48-52 discloses “For this reason, an abnormality (reconstruction error) detected by the auto encoder is determined on the basis of a Euclidean distance from data to a manifold (for example, a distance D1 shown in FIG. 7).”)
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder comprising the variational autoencoder (Itou, Col. 3, Lines 33-40 discloses “A reconstruction error shows a difference between normal data and reconstruction data generated by compressing and decoding the normal data. It is possible to detect abnormal data by detecting and identifying the reconstruction error. Hereinafter, the phase in which the abnormality detection processing is performed by the abnormality detection device 1A refers to “abnormality detection stage”.” (The indication corresponds to a determination of an abnormality, wherein analysis of the reconstruction error yields insight into whether given data was used to train the machine learning model. Note that Para. [0024] of the Applicant’s disclosure states “The value of the function can quantify a leakage (i.e. quantification of how much data from outside the training set is being used to train the VAE)…”, where the function is directed to a reconstruction error. Thus, one of ordinary skill in the art need only look at the reconstruction error in order to quantify a leakage of training data.))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Itou for at least the same reasons as discussed above in claim 1.

As per claim 15, Wierstra teaches a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause a system comprising a machine learning model and at least one programmable processor communicatively coupled to the machine learning model to perform operations comprising:
receiving, by an encoder of an autoencoder, a first plurality of records, wherein the autoencoder comprises a variational autoencoder, the variational autoencoder including the encoder and decoder (Wierstra, Para. [0016] discloses “The encoder system 100 receives an input image 102” and Fig. 1 discloses encoder neural network 110 and decoder system 150)
generating, by the encoder of the autoencoder and based on the first plurality of records, a continuous probability distribution associated with the first plurality of records (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution…”)
sampling, by the encoder of the autoencoder, at least one latent variable from the continuous probability distribution (Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” and Para. [0030] discloses “To generate the compressed representation 122, the compression subsystem 120 uses as the compression latent variables the latent variables that correspond to a predetermined number of highest levels of the hierarchy and does not use the remaining latent variables that correspond to features that are lower in the hierarchy” (sampling a latent variable from a continuous probability distribution results in a plurality of samples which is representative of the compressed representation))
storing, by the encoder of the autoencoder, the plurality of samples to enable retrieval by the decoder (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
retrieving, by the decoder of the autoencoder, the stored plurality of samples (Wierstra, Para. [0037] discloses “In some implementations, the encoder system 100 and the decoder system 150 are implemented on the same set of one or more computers, i.e., when the compression is being used to reduce the storage size of the image when stored locally by the set of one or more computers. In these implementations, the encoder system 120 stores the compressed representation 122 in a local memory accessible by the one or more computers so that the compressed representation can be accessed by the decoder system 150.”)
generating, by the decoder of the autoencoder, reconstructed data based on the latent variable, the reconstructed data characterizing a reconstruction of the first plurality of records (Wierstra, Fig. 3 discloses receiving a compressed representation 302 and generating a reconstructed image 308 (reconstructed data characterizes a reconstruction of input that is based on latent variables))
Wierstra fails to explicitly teach:
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder comprising the variational autoencoder
However, Itou teaches:
computing, by at least one processor operably coupled to the autoencoder, a reconstruction error by determining a value of a function associated with a distance between the reconstructed data and the first plurality of records (Itou, Col. 1, Lines 48-52 discloses “For this reason, an abnormality (reconstruction error) detected by the auto encoder is determined on the basis of a Euclidean distance from data to a manifold (for example, a distance D1 shown in FIG. 7).”)
and generating, by the at least one processor, a first indication representing whether a specific record of the first plurality of records has been used for training the autoencoder comprising the variational autoencoder (Itou, Col. 3, Lines 33-40 discloses “A reconstruction error shows a difference between normal data and reconstruction data generated by compressing and decoding the normal data. It is possible to detect abnormal data by detecting and identifying the reconstruction error. Hereinafter, the phase in which the abnormality detection processing is performed by the abnormality detection device 1A refers to “abnormality detection stage”.” (The indication corresponds to a determination of an abnormality, wherein analysis of the reconstruction error yields insight into whether given data was used to train the machine learning model. Note that Para. [0024] of the Applicant’s disclosure states “The value of the function can quantify a leakage (i.e. quantification of how much data from outside the training set is being used to train the VAE)…”, where the function is directed to a reconstruction error. Thus, one of ordinary skill in the art need only look at the reconstruction error in order to quantify a leakage of training data.))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Itou for at least the same reasons as discussed above in claim 1

As per claim 17, the combination of Wierstra and Itou as shown above teaches the non-transitory computer-readable medium of claim 15, Wierstra further teaches wherein:
the encoder is a first neural network; (Wierstra, Para. [0020] discloses “The encoder neural network 110 is a neural network that has been configured through training to process the input image 102 to generate latent variable data 112 for the input image 102.”)
and the decoder is a second neural network. (Wierstra, Para. [0039] discloses “The decoder system 150 includes a reconstruction subsystem 160 and a generative neural network 170.”)

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Itou, further in view of “Variational Autoencoders” to VAE (hereinafter, “VAE”)
As per claim 7, the combination of Wierstra and Itou as shown above teaches the system of claim 1, the combination of Wierstra and Itou fails to explicitly teach:
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); 
wherein: 
the data is represented as x; 
mean of the continuous probability distribution is represented as Eμ(x); 
and variance of the continuous probability distribution variance is represented as EΣ(x)
However, VAE teaches:

[Equation image: media_image1.png (excerpt from VAE)]
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); (VAE discloses the Gaussian distribution being represented as shown in media_image1.png)
wherein: 
the data is represented as x; (Calculating the mean and variance of data is represented in the equation. To be able to take the mean and variance means that there has to be data available in the first place)
mean of the continuous probability distribution is represented as Eμ(x); (Mean is represented as μ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
and variance of the continuous probability distribution variance is represented as EΣ(x). (Variance represented as Σ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use Gaussian distributions as disclosed by VAE. The combination would have been obvious because a person of ordinary skill in the art would be motivated to sample latent variables from a Gaussian distribution in order to generate reconstructed data via a decoder, thus enabling a reconstructed output to be produced and additionally allowing a variational autoencoder to quickly generate new samples, thereby improving the speed of a system.
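For illustration only, sampling latent variables from a Gaussian distribution N(Eμ(x), EΣ(x)) can be sketched as below. The sketch assumes a diagonal covariance, so that EΣ(x) is a vector of per-dimension variances, and assumes the mean and variance have already been produced by an encoder. The function name, the seed parameter, and the diagonal assumption are hypothetical and are not taken from Wierstra, VAE, or the Applicant's disclosure.

```python
import math
import random


def sample_latent(mean, variance, n_samples, seed=0):
    """Draw n_samples latent vectors z_i from N(mean, diag(variance)).

    Each coordinate is computed as mean + sqrt(variance) * eps, where eps is
    a standard normal draw. The seed is fixed only to make the example
    repeatable.
    """
    rng = random.Random(seed)
    std = [math.sqrt(v) for v in variance]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        for _ in range(n_samples)
    ]
```

With zero variance every sample equals the mean, and with unit variance the empirical average of many samples approaches the mean, which is the behavior expected of draws from the stated distribution.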

As per claim 18, the combination of Wierstra and Itou as shown above teaches the non-transitory computer-readable medium of claim 15, the combination of Wierstra and Itou fails to explicitly teach:
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); 
wherein: 
the data is represented as x; 
mean of the continuous probability distribution is represented as Eμ(x); 
and variance of the continuous probability distribution variance is represented as EΣ(x)
However, VAE teaches:

[Equation image: media_image1.png (excerpt from VAE)]
the continuous probability distribution is a Gaussian distribution represented as N(Eμ(x), EΣ(x)); (VAE discloses the Gaussian distribution being represented as shown in media_image1.png)
wherein: 
the data is represented as x; (Calculating the mean and variance of data is represented in the equation. To be able to take the mean and variance means that there has to be data available in the first place)
mean of the continuous probability distribution is represented as Eμ(x); (Mean is represented as μ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
and variance of the continuous probability distribution variance is represented as EΣ(x). (Variance represented as Σ. A Gaussian distribution is a normal distribution, which is a type of a continuous probability distribution)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of VAE for at least the same reasons as discussed above in claim 7

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Itou, further in view of JP. Pub. No. JP 2018073258 A to Yamanaka, et al. (hereinafter, “Yamanaka”)
As per claim 8, the combination of Wierstra and Itou as shown above teaches the system of claim 1, Wierstra further teaches:
(Wierstra, Para. [0021] discloses “In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution from which the latent variables are sampled” (the latent variable is one of the plurality of latent variables being sampled))
The combination of Wierstra and Itou fails to explicitly teach:
and the decoder minimizes a distance measure between a distribution of the reconstructed data of the plurality of latent variables and the continuous probability distribution
However, Yamanaka teaches:
and the decoder minimizes a distance measure between a distribution of the reconstructed data of the plurality of latent variables and the continuous probability distribution (Yamanaka, Para [0025]-[0026] discloses “In addition, the 2 term of the above equation (1) is called KL divergence. This 2 term represents the distance between the conditional probability distribution q (z | x), which produces a latent variable z from the observed data x, and the previous distribution p (z), which is not dependent on the data x. Thus, minimizing the loss function L, represented by the sum of the 2 distances, is to minimize the 2 distances at the same time, meaning that the probability distribution of the latent variable z which is not dependent on the data x as much as possible is determined” and Para. [0022] discloses “Specifically, the learning unit 15 b learns the input data by optimizing the parameters of the model so as to minimize the loss function L” (The learning unit itself is a variational autoencoder, which consists of an encoder and a decoder and which aims to minimize the loss function. Minimizing the loss function means minimizing the distance measure.))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to minimize a distance measure as disclosed by Yamanaka. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of a variational autoencoder, as minimizing the distance measure results in an improved output of a decoder.
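As a hedged illustration of the distance measure discussed above, the Kullback-Leibler divergence between a diagonal Gaussian N(μ, σ²) and the standard normal N(0, 1) has the well-known closed form 0.5 * Σ (σ² + μ² - 1 - ln σ²). The sketch below computes that quantity. The function name and the diagonal-covariance assumption are introduced for this example only and do not reproduce equation (1) of Yamanaka.

```python
import math


def kl_to_standard_normal(mean, variance):
    """KL( N(mean, diag(variance)) || N(0, I) ) for a diagonal Gaussian.

    Every variance must be strictly positive, since the closed form takes
    its logarithm.
    """
    if any(v <= 0.0 for v in variance):
        raise ValueError("variances must be strictly positive")
    return 0.5 * sum(
        v + m * m - 1.0 - math.log(v) for m, v in zip(mean, variance)
    )
```

The divergence is zero when the distribution already equals the standard normal and grows as the mean or variance moves away from it, which is why minimizing this term pulls the latent distribution toward the prior.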

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Itou, further in view of Yamanaka, and further in view of “Guide to Autoencoders” to Srinivasan (hereinafter, “Srinivasan”)
As per claim 9, the combination of Wierstra, Itou, and Yamanaka as shown above teaches the system of claim 8, Yamanaka further teaches:

[Equation image: media_image2.png (equation 1 of Yamanaka)]
and the distance measure is a Kullback-Leibler divergence, the Kullback-Leibler divergence being represented as KL(N(Eμ(x), EΣ(x)) || N(0,1)) (Yamanaka discloses in equation 1, shown in media_image2.png, an equivalent Kullback-Leibler divergence)
The combination of Wierstra, Itou, and Yamanaka fails to explicitly teach:
the data is represented as x; 
the latent variables are represented as zi (i = 1, ...., n); 
the reconstructed data is represented as D(zi); 
the distance between the reconstructed data and the data is represented as ||D(zi) - (x)||;
the reconstruction error is represented as [Equation image: media_image3.png]
However, Srinivasan teaches:
the data is represented as x; (Data represented in the loss function)
the latent variables are represented as z_i (i = 1, ..., n); (Latent variables are represented in the loss function)
the reconstructed data is represented as D(z_i); (Reconstructed data represented in the loss function)
the distance between the reconstructed data and the data is represented as ||D(z_i) - (x)||; (A loss function such as squared error calculates the distance between two points, which is indicative of the distance between the reconstructed data and the data)
the reconstruction error is represented as [image: media_image3.png] (Srinivasan discloses an equivalent average reconstruction error equation, [image: media_image4.png], where L is a loss function such as squared error)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to use reconstruction error as disclosed by Srinivasan. The combination would have been obvious because a person of ordinary skill in the art would be motivated to generate reconstruction errors from reconstructed data, which would indicate how accurately data has been reconstructed by a variational autoencoder, thus enabling an objective determination regarding the accuracy.
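For illustration only, an average reconstruction error of the kind discussed above can be sketched as follows. The equations themselves appear only as images in this action, so the form used here, the mean over i of L(D(z_i), x) with L a squared-error loss, is an assumption. The names average_reconstruction_error, squared_error, and toy_decoder are hypothetical and do not come from Srinivasan or the claims.

```python
def squared_error(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b))


def average_reconstruction_error(x, latents, decoder, loss=squared_error):
    """Mean of loss(D(z_i), x) over the latent samples z_1, ..., z_n.

    Assumption: the reconstruction error is a plain average of a
    per-sample loss between the decoded latent D(z_i) and the data x.
    """
    if not latents:
        raise ValueError("at least one latent sample is required")
    return sum(loss(decoder(z), x) for z in latents) / len(latents)


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder D: doubles each entry."""
    return [2.0 * v for v in z]


if __name__ == "__main__":
    x = [2.0, 4.0]
    latents = [[1.0, 2.0], [1.0, 1.0]]  # two latent samples z_1 and z_2
    print(average_reconstruction_error(x, latents, toy_decoder))
```

A smaller value indicates that the decoded latent samples lie closer to the data x, which is the sense in which the error measures how accurately the data has been reconstructed.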
As per claim 19, the combination of Wierstra and Itou as shown above teaches the non-transitory computer-readable medium of claim 15. The combination of Wierstra and Itou fails to explicitly teach:
and the distance measure is a Kullback-Leibler divergence, the Kullback-Leibler divergence being represented as KL(N(E_μ(x), E_σ(x)) || N(0, 1))
However, Yamanaka further teaches:

and the distance measure is a Kullback-Leibler divergence, the Kullback-Leibler divergence being represented as KL(N(E_μ(x), E_σ(x)) || N(0, 1)) (Yamanaka discloses in equation 1 an equivalent Kullback-Leibler divergence: [image: media_image2.png])
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Yamanaka for at least the same reasons as discussed above in claim 8.
The combination of Wierstra and Itou fails to explicitly teach:
the data is represented as x;
the latent variables are represented as z_i (i = 1, ..., n);
the reconstructed data is represented as D(z_i);
the distance between the reconstructed data and the data is represented as ||D(z_i) - (x)||;
the reconstruction error is represented as [image: media_image3.png]
However, Srinivasan teaches:
the data is represented as x; (Data represented in the loss function)
the latent variables are represented as z_i (i = 1, ..., n); (Latent variables are represented in the loss function)
the reconstructed data is represented as D(z_i); (Reconstructed data represented in the loss function)
the distance between the reconstructed data and the data is represented as ||D(z_i) - (x)||; (A loss function such as squared error calculates the distance between two points, which is indicative of the distance between the reconstructed data and the data)
the reconstruction error is represented as [image: media_image3.png] (Srinivasan discloses an equivalent average reconstruction error equation, [image: media_image4.png], where L is a loss function such as squared error)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Srinivasan for at least the same reasons as discussed above in claim 9.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Wierstra, in view of Itou, further in view of Yamanaka, further in view of Srinivasan, and further in view of U.S. Pub. No. US 20180275642 A1 to Tajima, et al. (hereinafter, “Tajima”).
As per claim 10, the combination of Wierstra, Itou, Yamanaka, and Srinivasan as shown above teaches the system of claim 9, the combination of Wierstra, Itou, Yamanaka, and Srinivasan fails to explicitly teach:
wherein the representation for the reconstruction error results in about 100% accurate prediction of whether the data has been used for training
However, Tajima teaches:
wherein the representation for the reconstruction error results in about 100% accurate prediction of whether the data has been used for training (Tajima, Para. [0158] discloses “By the use of the statistical predictive model for the calculation of reconstruction error, the arithmetic device is able to calculate accurately predicted values.” (Accurately predicted values equate to about a 100% accurate prediction))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra as modified to indicate that the reconstruction error results in an accurate prediction as disclosed by Tajima. The combination would have been obvious because a person of ordinary skill in the art would be motivated to know that the reconstruction error results in an accurate prediction.

As per claim 11, the combination of Wierstra, Itou, Yamanaka, Srinivasan, and Tajima as shown above teaches the system of claim 10, Tajima further teaches:
wherein the about 100% in accuracy is 98% or more in accuracy. (Tajima, Para. [0158] discloses “By the use of the statistical predictive model for the calculation of reconstruction error, the arithmetic device is able to calculate accurately predicted values.” (Accurately predicted values equate to a 100% accurate prediction, which is greater than 98% prediction accuracy))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wierstra with the teachings of Tajima for at least the same reasons as discussed above in claim 10.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is (571)272-8833. The examiner can normally be reached M-TR 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/H.R.M./Examiner, Art Unit 2123
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145