DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are pending under this Office action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt et al. (US 11030526 B1) in view of Zhong et al. (US 20200349447 A1), and further in view of Ryu et al. (US 20200134499 A1).
Regarding claim 1, Goodsitt teaches a method for operating a neural network (See Goodsitt: Figs. 2-3, and Col. 8 Lines 63-67 ~ Col. 9 Lines 1-5, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments. In some embodiments, data-management system 102 performs steps of process 200. It should be noted that other components of system 100, including, for example, client device 104 and/or third-party system 108 may perform operations of one or more steps of process 200. Process 250 may include training models according to architecture 300, architecture 302, architecture 304, and/or any other architecture consistent with disclosed embodiments”), the method comprising: 
training the neural network (See Goodsitt: Figs. 2-3, and Col. 8 Lines 62-64, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments”), wherein: 
the neural network comprises a variational autoencoder (See Goodsitt: Figs. 2-3, and Col. 8 Lines 20-31, “As shown in the illustration of FIG. 2A, data-management system 102 may provide respective latent-space data to a child model A and a child model B. A child model may include a GAN model, a neural network model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a random forest model, an autoencoder model, a variational autoencoder model, and/or any other machine learning model. A child model may include a synthetic data model (i.e., a model configured to generate synthetic data). As one of skill in the art will appreciate, step 206 may involve a different number of child models than the two depicted in FIG. 2A”), comprising: 
an encoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured: 
to receive a sample of a first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”), and to produce a mean and a variance of each of: a first latent variable and a second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”), or 
to receive a sample of a second random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 57-61, “At step 256, a parent model may generate latent-space data, consistent with disclosed embodiments. Consistent with the present disclosure, latent-space data may refer to any data output by a parent model, and latent-space data may be in a different format from an intercorrelated dataset”), and to produce a mean and a variance of each of: the second latent variable and a third latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 13-29, “At step 258, data-management system 102 may provide latent-space data to a plurality of child models, consistent with disclosed embodiments. A child model may include a child model trained according to process 200. In the example of FIG. 2B, data-management system 102 may provide first latent-space data corresponding to a first intercorrelated dataset (dark gray box with dots) to child model A, and data-management system 102 may provide second latent-space data corresponding to a second intercorrelated dataset (dark gray box with dots) to child model B. As one of skill in the art will appreciate, step 258 may include providing latent-space data to a different number of child models than the two depicted in FIG. 2B. In some embodiments, the latent-space data provided to one or more child models partially or wholly overlaps (i.e., shares some or all data elements)”); and 
a decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured: 
to receive a sample of the first latent variable and a sample of the second latent variable, and to generate a generated sample of the first random variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), or 
to receive a sample of the second latent variable and a sample of the third latent variable, and to generate a generated sample of the second random variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), 
the training of the neural network comprising training the variational autoencoder with (See Goodsitt: Fig. 1, and Col. 5 Lines 12-28, “Third-party system 108 may provide data to data-management system. For example, third-party system 108 may provide training data to data-management system 102 and/or a machine learning model, consistent with disclosed embodiments. As an example, third-party system 108 may transmit time series data, music data in an audio format, musical composition data, financial data, demographic data, health data, environmental data, education data, governmental data, and/or any other kind of data. In some embodiments, third-party system 108 provides data to data-management system via a subscription, a feed, a socket, or the like. In some embodiments, third-party system 108 sends a request to third-party system to retrieve data. In some embodiments, third-party system 108 sends a request for correlated synthetic data and/or one or more models configured to generate correlated synthetic data to data-management system”): 
a plurality of samples of the first random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”); and 
a plurality of samples of the second random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”), 
the plurality of samples of the first random variable and the plurality of samples of the second random variable being unpaired (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 57-61, “A similarity metric may be based on a correlation, covariance matrix, a variance, a frequency of overlapping values, or other measure of statistical similarity. Training may include hyperparameter tuning. Training may be supervised or unsupervised”; and Claim 13, “generating, using the second parent model, third latent-space data and fourth latent-space data based on second input data, the second input data at least partially overlapping with the first input data”. Note that unsupervised training may be mapped to unpaired data), 
the training of the neural network comprising updating weights in the neural network based on a first loss function (See Goodsitt: Figs. 1-2, and Col. 8 Lines 47-57, “In some embodiments, training of a child model may terminate when a performance criterion (i.e., training criterion) is satisfied. A training criterion may include a number of epochs, a training time, a performance metric (e.g., an estimate of accuracy in reproducing test data), or the like. Data-management system 102 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like. A training criterion may be based on a similarity metric representing a measure of similarity between a synthetic dataset and an original dataset”), the first loss function being based on a measure of deviation from consistency between: 
a conditional generation path from the first random variable to the second random variable, and 
a conditional generation path from the second random variable to the first random variable.
However, Goodsitt fails to explicitly disclose that the first loss function is based on a measure of deviation from consistency between: a conditional generation path from the first random variable to the second random variable, and a conditional generation path from the second random variable to the first random variable.
Zhong teaches a first loss function based on a measure of deviation from consistency (See Zhong: Figs. 5A-B, and [0077], “As already mentioned above, the outputs E(G(Z)) and E(G(E(G(Z))) of the encoder 506 are used by the generator 502 as part of the loss function of the generator 502, as shown in equation (7) and as illustrated by the dashed line 518. As also already mentioned, the encoder 506 is trained to minimize the absolute difference between E(G(Z)) and E(G(E(G(Z))))”) between: 
a conditional generation path from the first random variable to the second random variable (See Zhong: Figs. 5A-B, and [0070], “The encoder 506 receives as input, along a path 512, the generated sample, G(Z). When the encoder 506 receives G(Z), the encoder 506 outputs a value E(G(Z)). E(G(Z)) is the latent space representation of the ambient space representation G(Z) of the noise Z”), and 
a conditional generation path from the second random variable to the first random variable (See Zhong: Figs. 5A-B, and [0071], “E(G(Z)) is fed back into the generator 502, as shown by a path 514. When the generator 502 receive E(G(Z)), the generator 502 outputs G(E(G(Z))), which is the ambient space representation of latent space representation E(G(Z)). G(E(G(Z))) is fed, along a path 516, to the encoder 506, which then outputs the latent space representation E(G(E(G(Z))))”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Goodsitt so that the first loss function is based on a measure of deviation from consistency between: a conditional generation path from the first random variable to the second random variable, and a conditional generation path from the second random variable to the first random variable, as taught by Zhong, in order to enable the image space generator and the latent space encoder to be optimized simultaneously, which results in significantly improved performance (See Zhong: Figs. 5A-B, and [0020], “Implementations and GAN architectures according to this disclosure leverage the Lipschitz continuity condition in the training of GAN architectures according to this disclosure. The Lipschitz continuity condition can be used to introduce an encoder E into a GAN architecture. The encoder introduces, into the GAN architecture, a new latent space regularization. Both the image space generator G and the latent space encoder E can be optimized simultaneously. This results in significantly improved performance”). Goodsitt teaches a method and system that may terminate training of the neural network parent model based on a test correlation metric and may adjust model parameters during training, while Zhong teaches a system and method that may optimize the neural network by minimizing loss functions between outputs of the neural network. Therefore, it would have been obvious to one of ordinary skill in the art to modify Goodsitt by Zhong to train the neural network by minimizing the loss functions. The motivation to modify Goodsitt by Zhong is “Use of known technique to improve similar devices (methods, or products) in the same way”.
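For illustration only, the latent-space consistency loss quoted from Zhong ([0070], [0071], [0077]) can be sketched in Python. The generator G and the encoders E_exact and E_off below are hypothetical scalar functions, not Zhong's networks; they only make concrete the cycle Z → G(Z) → E(G(Z)) → G(E(G(Z))) → E(G(E(G(Z)))) and the absolute difference |E(G(Z)) - E(G(E(G(Z))))| that Zhong's encoder is trained to minimize.

```python
from typing import Callable

Net = Callable[[float], float]


def latent_cycle_loss(z: float, G: Net, E: Net) -> float:
    """Return |E(G(Z)) - E(G(E(G(Z))))| for a single latent value z."""
    e_g_z = E(G(z))          # latent representation of the generated sample G(Z)
    e_g_e_g_z = E(G(e_g_z))  # feed E(G(Z)) back to G (path 514), then re-encode (path 516)
    return abs(e_g_z - e_g_e_g_z)


# Hypothetical toy networks, chosen only for illustration.
def G(z: float) -> float:
    return 2.0 * z


def E_exact(x: float) -> float:
    return x / 2.0   # exact inverse of G: the cycle is consistent


def E_off(x: float) -> float:
    return 0.6 * x   # imperfect inverse of G: the cycle drifts


print(latent_cycle_loss(1.0, G, E_exact))  # 0.0
print(latent_cycle_loss(1.0, G, E_off))    # approximately 0.24
```

The loss is zero only when re-encoding a regenerated sample returns the same latent value, which is the sense in which the two passes through the generator are required to be consistent.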
However, Goodsitt, modified by Zhong, fails to explicitly disclose a conditional generation path from the second random variable to the first random variable.
Ryu teaches a conditional generation path from the second random variable to the first random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given x^(0) the system would conditionally generate Y. If q_{ϕ_x}(u_x, w|x) is already trained, then a sample (u_x^(0), w^(0)) ~ q_{ϕ_x}(u_x, w|x^(0)) can be taken and Y can be generated as Y ~ p_θ(y|u_y^(0), w^(0)) where u_y^(0) ~ p_θ(u_y)”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”. Note that reversing the direction of conditioning, from generating y given x to generating x given y, is not novel).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Goodsitt to have a conditional generation path from the second random variable to the first random variable, as taught by Ryu, so that the model can be easily generalized to a multivariate model (See Ryu: Fig. 4, and [0047], “The first algorithm has no hyperparameter to be tuned, thus it can be easily generalized to a multivariate model, whereas the number of hyperparameters in the second algorithm becomes larger for a multivariate model. However, the second algorithm becomes advantageous under a semi-supervised learning setting as there are only a few number of paired samples and comparably more unpaired samples as it can naturally incorporate the semi-supervised dataset”). Goodsitt teaches a method and system that may terminate training of the neural network parent model based on a test correlation metric and may adjust model parameters during training, while Ryu teaches a system and method that may develop a joint latent variable model for a neural network that is applicable to stochastic inference between multiple random variables. Therefore, it would have been obvious to one of ordinary skill in the art to modify Goodsitt by Ryu to generate the conditional output in order to make the neural network algorithms applicable to stochastic inference. The motivation to modify Goodsitt by Ryu is “Use of known technique to improve similar devices (methods, or products) in the same way”.
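For illustration only, the architecture recited in claim 1 can be summarized as a toy Python sketch. Everything below is a hypothetical assumption rather than the applicant's or any reference's implementation: each random variable and latent variable is a scalar, the encoder and decoder "networks" are linear maps with arbitrary weights, each latent variable's mean stands in for a sample so the example is deterministic, and the consistency measure (squared mismatch of the shared latent variable z2 after a round trip through each conditional generation path) is only one possible choice. Because each term uses only x1 or only x2, the training samples need not be paired.

```python
from typing import NamedTuple


class Gaussian(NamedTuple):
    """Mean and variance of one latent variable, as output by the encoder."""
    mean: float
    var: float


PRIOR = Gaussian(0.0, 1.0)  # assumed standard-normal prior for each latent variable


# Hypothetical encoder: x1 -> (z1, z2) or x2 -> (z2, z3); weights are placeholders.
def encode_x1(x1: float) -> tuple[Gaussian, Gaussian]:
    return Gaussian(0.5 * x1, 0.1), Gaussian(0.8 * x1, 0.1)


def encode_x2(x2: float) -> tuple[Gaussian, Gaussian]:
    return Gaussian(0.4 * x2, 0.1), Gaussian(0.3 * x2, 0.1)


# Hypothetical decoder: (z1, z2) -> x1 or (z2, z3) -> x2.
def decode_x1(z1: float, z2: float) -> float:
    return z1 + 0.5 * z2


def decode_x2(z2: float, z3: float) -> float:
    return 2.0 * z2 + z3


def generate_x2_from_x1(x1: float) -> float:
    """Conditional path x1 -> x2; z3 is taken from its prior (mean used as the sample)."""
    _, q_z2 = encode_x1(x1)
    return decode_x2(q_z2.mean, PRIOR.mean)


def generate_x1_from_x2(x2: float) -> float:
    """Conditional path x2 -> x1; z1 is taken from its prior (mean used as the sample)."""
    q_z2, _ = encode_x2(x2)
    return decode_x1(PRIOR.mean, q_z2.mean)


def consistency_deviation(x1: float, x2: float) -> float:
    """Squared mismatch of the shared latent z2 after each conditional path."""
    z2_a = encode_x1(x1)[1].mean
    z2_a_back = encode_x2(generate_x2_from_x1(x1))[0].mean
    z2_b = encode_x2(x2)[0].mean
    z2_b_back = encode_x1(generate_x1_from_x2(x2))[1].mean
    return (z2_a - z2_a_back) ** 2 + (z2_b - z2_b_back) ** 2


print(consistency_deviation(1.0, 1.0))  # approximately 0.0832
```

In a trainable version, a deviation of this kind would typically be added to the usual reconstruction and regularization terms and minimized by gradient descent over the encoder and decoder weights; the sketch omits that update step.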
Regarding claim 2, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Ryu teaches the method of claim 1, wherein the first loss function includes: 
a first term representing reconstruction loss of the first random variable; 
a second term representing deviations from consistency in the second latent variable; 
a third term representing deviations from consistency in the first latent variable; and 
a fourth term representing deviations from consistency in the third latent variable (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”; and the screenshot of Equation (27) below).

[Screenshot: Ryu, Equation (27)]

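For illustration only, one hypothetical way to assemble the four terms recited in claim 2 for a single (unpaired) sample of the first random variable is sketched below. The term definitions are assumptions, not Ryu's Equation (27) or the applicant's loss: x1 is encoded to (z1, z2), an x2 is generated with z3 drawn from its prior, the generated x2 is re-encoded, x1 is regenerated and re-encoded, and each consistency term is the squared change in the corresponding latent variable. Latent means stand in for samples, and the linear weights are arbitrary placeholders.

```python
from typing import NamedTuple


class LossTerms(NamedTuple):
    reconstruction_x1: float  # first term
    consistency_z2: float     # second term
    consistency_z1: float     # third term
    consistency_z3: float     # fourth term


# Hypothetical linear encoder/decoder means; the weights are placeholders.
def enc_x1(x1: float) -> tuple[float, float]:
    return 0.5 * x1, 0.8 * x1  # means of (z1, z2)


def enc_x2(x2: float) -> tuple[float, float]:
    return 0.4 * x2, 0.3 * x2  # means of (z2, z3)


def dec_x1(z1: float, z2: float) -> float:
    return z1 + 0.5 * z2


def dec_x2(z2: float, z3: float) -> float:
    return 2.0 * z2 + z3


def first_loss_terms(x1: float, z3: float) -> LossTerms:
    """Four assumed loss terms for one x1 sample; z3 is a draw from its prior."""
    z1, z2 = enc_x1(x1)
    recon = (x1 - dec_x1(z1, z2)) ** 2
    x2_gen = dec_x2(z2, z3)      # conditional generation x1 -> x2
    z2_c, z3_c = enc_x2(x2_gen)  # re-encode the generated x2
    x1_gen = dec_x1(z1, z2_c)    # generate x1 back from (z1, re-encoded z2)
    z1_c, _ = enc_x1(x1_gen)     # re-encode the regenerated x1
    return LossTerms(recon, (z2 - z2_c) ** 2, (z1 - z1_c) ** 2, (z3 - z3_c) ** 2)


terms = first_loss_terms(1.0, z3=0.0)
print(terms)
print(sum(terms))  # approximately 0.2741
```

Keeping the four terms as named fields, rather than summing them immediately, makes it straightforward to weight each term separately if desired.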
Regarding claim 3, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Goodsitt and Ryu teach the method of claim 1, further comprising performing conditional generation, by the variational autoencoder, the performing of conditional generation comprising: 
receiving, by the encoder network, a sample of the first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”); 
producing a mean and a variance of each of: the first latent variable and the second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”); 
receiving, by the decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”), a sample of each of: 
a distribution having the produced mean and the produced variance of the first latent variable (See Ryu: Fig. 4, and [0040], “If p_θ(z)p_θ(x|z) is given such that p_θ(x) ≈ p_data(x), then minimizing the objective would ensure that q_{ϕ_x}(z|x) ≈ p_θ(x|z)”), 
a distribution having the produced mean and the produced variance of the second latent variable (See Ryu: Fig. 4, and [0040], “Thus, the full latent variable model p_θ(z)p_θ(x|z)p_θ(y|z) can be trained with the assistance of a variational posterior q_{ϕ_{x,y}}(z|x,y) to ensure that p_θ(x|z) and p_θ(y|z)”), and 
a distribution having the mean and the variance of a prior distribution of the third latent variable (See Ryu: Fig. 4, and [0040], “are well-fitted to the joint distribution p_data(x,y)”); and 
generating, by the decoder network, a generated sample of the second random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given x^(0) the system would conditionally generate Y. If q_{ϕ_x}(u_x, w|x) is already trained, then a sample (u_x^(0), w^(0)) ~ q_{ϕ_x}(u_x, w|x^(0)) can be taken and Y can be generated as Y ~ p_θ(y|u_y^(0), w^(0)) where u_y^(0) ~ p_θ(u_y)”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”).
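For illustration only, the conditional generation steps of claim 3 (and the sampling scheme quoted from Ryu [0050]) can be sketched with the reparameterization trick commonly used in variational autoencoders. The scalar encoder and decoder weights, the fixed encoder variances, and the standard-normal prior for z3 are hypothetical assumptions, and the function names are illustrative only.

```python
import math
import random


def sample_gaussian(mean: float, var: float, rng: random.Random) -> float:
    """Reparameterized draw: mean + sqrt(var) * eps, with eps ~ N(0, 1)."""
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


def encode_x1(x1: float) -> tuple[tuple[float, float], tuple[float, float]]:
    """Hypothetical encoder: (mean, variance) of z1 and of z2 given x1."""
    return (0.5 * x1, 0.1), (0.8 * x1, 0.1)


def decode_x2(z1: float, z2: float, z3: float) -> float:
    """Hypothetical decoder branch for x2; it receives z1 but, here, ignores it."""
    return 2.0 * z2 + z3


PRIOR_Z3 = (0.0, 1.0)  # assumed (mean, variance) of the standard-normal prior of z3


def conditional_generate_x2(x1: float, rng: random.Random) -> float:
    (m1, v1), (m2, v2) = encode_x1(x1)    # encoder produces means and variances
    z1 = sample_gaussian(m1, v1, rng)     # sample of q(z1 | x1)
    z2 = sample_gaussian(m2, v2, rng)     # sample of q(z2 | x1)
    z3 = sample_gaussian(*PRIOR_Z3, rng)  # sample of the prior of z3
    return decode_x2(z1, z2, z3)


rng = random.Random(0)
draws = [conditional_generate_x2(1.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to 2.0 * 0.8 + 0.0 = 1.6 for these weights
```

Drawing z3 from its prior, rather than from an encoder output, is what lets the model generate x2 when only x1 is observed.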
Regarding claim 4, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Goodsitt and Zhong teach the method of claim 1, further comprising performing joint generation, by the variational autoencoder, the performing of joint generation comprising: 
receiving, by the decoder network, a sample of each of (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”): 
the first latent variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), 
the second latent variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), and 
the third latent variable (See Zhong: Figs. 5A-B, and [0069], “The discriminator 504 can receive either the generated sample G(Z) along a path 508 or receive, along a path 510, a target Y”); and 
generating, by the decoder network (See Zhong: Figs. 5A-B, and [0092], “Elements of FIG. 5B having the same numerals as those of FIG. 5A can be exactly as described with respect to FIG. 5A. FIG. 5B includes a discriminator 552 that is similar to the discriminator 506 of FIG. 5A with the exception that the discriminator 506 also receives, along a path 554, the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”): 
a generated sample of the first random variable, based on the first latent variable and the second latent variable (See Zhong: Figs. 5A-B, and [0092], “the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”), 
a generated sample of the second random variable, based on the second latent variable and the third latent variable (See Zhong: Figs. 5A-B, and [0092], “When the discriminator 552 receives G(E(G(Z))), the discriminator 552 outputs a value D(G(E(G(Z)))) indicating whether the discriminator 552 determines the G(E(G(Z))) to be real or generated”).
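For illustration only, joint generation as recited in claim 4 can be sketched as sampling all three latent variables from their priors and decoding both random variables. The standard-normal priors and the linear decoder are hypothetical assumptions. The sketch also shows why the shared second latent variable matters: because both outputs depend on z2, the generated x1 and x2 are correlated (for these assumed weights, the covariance is 0.5 × 2.0 × Var(z2) = 1.0).

```python
import random


def decode(z1: float, z2: float, z3: float) -> tuple[float, float]:
    """Hypothetical decoder: x1 from (z1, z2) and x2 from (z2, z3)."""
    return z1 + 0.5 * z2, 2.0 * z2 + z3


def joint_generate(rng: random.Random) -> tuple[float, float]:
    # Assumed standard-normal priors for the three latent variables.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    z3 = rng.gauss(0.0, 1.0)
    return decode(z1, z2, z3)


def sample_covariance(pairs: list[tuple[float, float]]) -> float:
    """Unbiased sample covariance of the two coordinates."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)


rng = random.Random(0)
pairs = [joint_generate(rng) for _ in range(20000)]
print(sample_covariance(pairs))  # close to the theoretical value 1.0
```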
Regarding claim 6, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 5 as outlined above. Further, Ryu teaches the method of claim 5, wherein the training comprises updating weights in the neural network based on a first loss function, the first loss function including:
a first term representing reconstruction loss of the first random variable; a second term representing deviations from consistency in the second latent variable; a third term representing deviations from consistency in the first latent variable; and a fourth term representing deviations from consistency in the third latent variable (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”; and the screenshot of Equation (27) below).

[Screenshot: Ryu, Equation (27)]

Regarding claim 7, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 6 as outlined above. Further, Ryu teaches the method of claim 6, wherein the first loss function further includes a fifth term based on the discriminative neural network (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”. Note that Equation (27) includes five terms).
Regarding claim 8, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 7 as outlined above. Further, Goodsitt and Ryu teach the method of claim 7, further comprising performing conditional generation, by the variational autoencoder, the performing of conditional generation comprising: 
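For illustration only, one common way for a loss to include a term based on a discriminative neural network is to add a weighted adversarial term, such as the non-saturating generator loss -log D(x_generated), to the other terms. The sketch below assumes that form, a hypothetical logistic discriminator, and an adversarial weight of 1.0; none of these is taken from Ryu's Equation (27) or from the application.

```python
import math
from typing import Callable, Sequence

Discriminator = Callable[[float], float]


def logistic_discriminator(w: float, b: float) -> Discriminator:
    """Hypothetical D(x) = sigmoid(w * x + b): estimated probability that x is real."""
    def D(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-(w * x + b)))
    return D


def fifth_term(x_generated: float, D: Discriminator) -> float:
    """Non-saturating adversarial term: small when D judges the generated sample real."""
    return -math.log(D(x_generated))


def first_loss_with_adversarial_term(four_terms: Sequence[float], x_generated: float,
                                     D: Discriminator, adv_weight: float = 1.0) -> float:
    """Sum of the four other terms plus the weighted discriminator-based fifth term."""
    return sum(four_terms) + adv_weight * fifth_term(x_generated, D)


D = logistic_discriminator(w=0.0, b=0.0)  # an uninformative D outputs 0.5 everywhere
print(first_loss_with_adversarial_term([0.1, 0.2, 0.3, 0.4], 1.6, D))  # approximately 1.0 + ln 2
```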
receiving, by the encoder network, a sample of the first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”); 
producing a mean and a variance of each of: the first latent variable and the second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”); 
receiving, by the decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”), a sample of each of: 
a distribution having the produced mean and the produced variance of the first latent variable (See Ryu: Fig. 4, and [0040], “If p_θ(z)p_θ(x|z) is given such that p_θ(x) ≈ p_data(x), then minimizing the objective would ensure that q_{ϕ_x}(z|x) ≈ p_θ(x|z)”), 
a distribution having the produced mean and the produced variance of the second latent variable (See Ryu: Fig. 4, and [0040], “Thus, the full latent variable model p_θ(z)p_θ(x|z)p_θ(y|z) can be trained with the assistance of a variational posterior q_{ϕ_{x,y}}(z|x,y) to ensure that p_θ(x|z) and p_θ(y|z)”), and 
a distribution having the mean and the variance of a prior distribution of the third latent variable (See Ryu: Fig. 4, and [0040], “are well-fitted to the joint distribution p_data(x,y)”); and 
generating, by the decoder network, a generated sample of the second random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given x^(0) the system would conditionally generate Y. If q_{ϕ_x}(u_x, w|x) is already trained, then a sample (u_x^(0), w^(0)) ~ q_{ϕ_x}(u_x, w|x^(0)) can be taken and Y can be generated as Y ~ p_θ(y|u_y^(0), w^(0)) where u_y^(0) ~ p_θ(u_y)”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”).
Regarding claim 9, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 7 as outlined above. Further, Goodsitt and Zhong teach the method of claim 7, further comprising performing joint generation, by the variational autoencoder, the performing of joint generation comprising: 
receiving, by the decoder network, a sample of each of (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”): 
the first latent variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), 
the second latent variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), and 
the third latent variable (See Zhong: Figs. 5A-B, and [0069], “The discriminator 504 can receive either the generated sample G(Z) along a path 508 or receive, along a path 510, a target Y”); and 
generating, by the decoder network (See Zhong: Figs. 5A-B, and [0092], “Elements of FIG. 5B having the same numerals as those of FIG. 5A can be exactly as described with respect to FIG. 5A. FIG. 5B includes a discriminator 552 that is similar to the discriminator 506 of FIG. 5A with the exception that the discriminator 506 also receives, along a path 554, the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”): 
a generated sample of the first random variable, based on the first latent variable and the second latent variable (See Zhong: Figs. 5A-B, and [0092], “the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”), 
a generated sample of the second random variable, based on the second latent variable and the third latent variable (See Zhong: Figs. 5A-B, and [0092], “When the discriminator 552 receives G(E(G(Z))), the discriminator 552 outputs a value D(G(E(G(Z)))) indicating whether the discriminator 552 determines the G(E(G(Z))) to be real or generated”).
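The joint-generation steps recited above (the decoder receives samples of all three latent variables, generates the first random variable from the first and second latent variables, and generates the second random variable from the second and third latent variables) can likewise be illustrated with a minimal sketch. It assumes each latent variable is drawn from a standard normal prior and uses hypothetical random linear maps W_dec_x and W_dec_y in place of a trained decoder; these are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for x, y, and each latent variable z1, z2, z3.
DIM_X, DIM_Y, DIM_Z = 4, 3, 2

# Hypothetical linear stand-ins for the two decoder paths (assumptions).
W_dec_x = rng.normal(size=(2 * DIM_Z, DIM_X))  # (z1, z2) -> x
W_dec_y = rng.normal(size=(2 * DIM_Z, DIM_Y))  # (z2, z3) -> y


def decode(z1, z2, z3):
    """Generate x from (z1, z2) and y from (z2, z3); z2 is shared by both outputs."""
    x_gen = np.concatenate([z1, z2], axis=-1) @ W_dec_x
    y_gen = np.concatenate([z2, z3], axis=-1) @ W_dec_y
    return x_gen, y_gen


def joint_generate(n):
    """Sample z1, z2, z3 from an assumed N(0, I) prior and decode a pair (x, y)."""
    z1, z2, z3 = (rng.standard_normal((n, DIM_Z)) for _ in range(3))
    return decode(z1, z2, z3)


x_gen, y_gen = joint_generate(5)
print(x_gen.shape, y_gen.shape)  # (5, 4) (5, 3)
```

Because the second latent variable enters both decoder paths, the generated x and y samples are coupled through it, while the first and third latent variables affect only x and only y, respectively.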
Regarding claim 10, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Goodsitt, Zhong, and Ryu teach that a system (See Goodsitt: Figs. 2-3, and Col. 8 Lines 63-67 ~ Col. 9 Lines 1-5, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments. In some embodiments, data-management system 102 performs steps of process 200. It should be noted that other components of system 100, including, for example, client device 104 and/or third-party system 108 may perform operations of one or more steps of process 200. Process 250 may include training models according to architecture 300, architecture 302, architecture 304, and/or any other architecture consistent with disclosed embodiments”), comprising: 
a processing circuit (See Goodsitt: Fig. 1, and Col. 4 Lines 27-30, “Data-management system 102 may include at least one memory and one or more processors configured to perform operations consistent with disclosed embodiments”), and 
a neural network (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 20-27, “As shown in the illustration of FIG. 2A, data-management system 102 may provide respective latent-space data to a child model A and a child model B. A child model may include a GAN model, a neural network model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a random forest model, an autoencoder model, a variational autoencoder model, and/or any other machine learning model”), 
the processing circuit being configured to train the neural network (See Goodsitt: Figs. 2-3, and Col. 8 Lines 62-64, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments”), wherein: 
the neural network comprises a variational autoencoder (See Goodsitt: Figs. 2-3, and Col. 8 Lines 20-31, “As shown in the illustration of FIG. 2A, data-management system 102 may provide respective latent-space data to a child model A and a child model B. A child model may include a GAN model, a neural network model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a random forest model, an autoencoder model, a variational autoencoder model, and/or any other machine learning model. A child model may include a synthetic data model (i.e., a model configured to generate synthetic data). As one of skill in the art will appreciate, step 206 may involve a different number of child models than the two depicted in FIG. 2A”), comprising: 
an encoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured: 
to receive a sample of a first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”), and to produce a mean and a variance of each of: a first latent variable and a second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”), or 
to receive a sample of a second random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 57-61, “At step 256, a parent model may generate latent-space data, consistent with disclosed embodiments. Consistent with the present disclosure, latent-space data may refer to any data output by a parent model, and latent-space data may be in a different format from an intercorrelated dataset”), and to produce a mean and a variance of each of: the second latent variable and a third latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 13-29, “At step 258, data-management system 102 may provide latent-space data to a plurality of child models, consistent with disclosed embodiments. A child model may include a child model trained according to process 200. In the example of FIG. 2B, data-management system 102 may provide first latent-space data corresponding to a first intercorrelated dataset (dark gray box with dots) to child model A, and data-management system 102 may provide second latent-space data corresponding to a second intercorrelated dataset (dark gray box with dots) to child model B. As one of skill in the art will appreciate, step 258 may include providing latent-space data to a different number of child models than the two depicted in FIG. 2B. In some embodiments, the latent-space data provided to one or more child models partially or wholly overlaps (i.e., shares some or all data elements)”); and 
a decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured:
to receive a sample of the first latent variable and a sample of the second latent variable, and to generate a generated sample of the first random variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), or 
to receive a sample of the second latent variable and a sample of the third latent variable, and to generate a generated sample of the second random variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), 
the training of the neural network comprising training the variational autoencoder with (See Goodsitt: Fig. 1, and Col. 5 Lines 12-28, “Third-party system 108 may provide data to data-management system. For example, third-party system 108 may provide training data to data-management system 102 and/or a machine learning model, consistent with disclosed embodiments. As an example, third-party system 108 may transmit time series data, music data in an audio format, musical composition data, financial data, demographic data, health data, environmental data, education data, governmental data, and/or any other kind of data. In some embodiments, third-party system 108 provides data to data-management system via a subscription, a feed, a socket, or the like. In some embodiments, third-party system 108 sends a request to third-party system to retrieve data. In some embodiments, third-party system 108 sends a request for correlated synthetic data and/or one or more models configured to generate correlated synthetic data to data-management system”): 
a plurality of samples of the first random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”); and 
a plurality of samples of the second random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”), 
the plurality of samples of the first random variable and the plurality of samples of the second random variable being unpaired (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 57-61, “A similarity metric may be based on a correlation, covariance matrix, a variance, a frequency of overlapping values, or other measure of statistical similarity. Training may include hyperparameter tuning. Training may be supervised or unsupervised”; and Claim 13, “generating, using the second parent model, third latent-space data and fourth latent-space data based on second input data, the second input data at least partially overlapping with the first input data”. Note that unsupervised training may be mapped to unpaired data), 
the training of the neural network comprising updating weights in the neural network based on a first loss function (See Goodsitt: Figs. 1-2, and Col. 8 Lines 47-57, “In some embodiments, training of a child model may terminate when a performance criterion (i.e., training criterion) is satisfied. A training criterion may include a number of epochs, a training time, a performance metric (e.g., an estimate of accuracy in reproducing test data), or the like. Data-management system 102 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like. A training criterion may be based on a similarity metric representing a measure of similarity between a synthetic dataset and an original dataset”), the first loss function being based on a measure of deviation from consistency between (See Zhong: Figs. 5A-B, and [0077], “As already mentioned above, the outputs E(G(Z)) and E(G(E(G(Z))) of the encoder 506 are used by the generator 502 as part of the loss function of the generator 502, as shown in equation (7) and as illustrated by the dashed line 518. As also already mentioned, the encoder 506 is trained to minimize the absolute difference between E(G(Z)) and E(G(E(G(Z))))”): 
a conditional generation path from the first random variable to the second random variable (See Zhong: Figs. 5A-B, and [0070], “The encoder 506 receives as input, along a path 512, the generated sample, G(Z). When the encoder 506 receives G(Z), the encoder 506 outputs a value E(G(Z)). E(G(Z)) is the latent space representation of the ambient space representation G(Z) of the noise Z”), and 
a conditional generation path from the second random variable to the first random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given $x^{(0)}$ the system would conditionally generate Y. If $q_{\phi_x}(u_x, w|x)$ is already trained, then a sample $(u_x^{(0)}, w^{(0)}) \sim q_{\phi_x}(u_x, w|x^{(0)})$ can be taken and Y can be generated as $Y \sim p_\theta(y|u_y^{(0)}, w^{(0)})$ where $u_y^{(0)} \sim p_\theta(u_y)$”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”. Note that merely reversing the direction of conditional generation, from x-to-y to y-to-x, would have been obvious to a person having ordinary skill in the art).
Regarding claim 11, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 10 as outlined above. Further, Ryu teaches that the system of claim 10, wherein the first loss function includes: 
a first term representing reconstruction loss of the first random variable; 
a second term representing deviations from consistency in the second latent variable; 
a third term representing deviations from consistency in the first latent variable; and 
a fourth term representing deviations from consistency in the third latent variable (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”), and the screenshot of Ryu's Equation (27) below:

[Screenshot of Ryu, Equation (27)]

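To make the four recited terms concrete, one hypothetical way to compute them on an unpaired sample x is to run the x-to-y conditional-generation path, map the generated y back along the y-to-x path, and measure deviations along the round trip. The sketch below is an assumption for illustration only: it is not Ryu's Equation (27) or the applicant's disclosed loss; it uses encoder means instead of random samples, mean squared error as the deviation measure, equal weighting of the terms, and hypothetical linear maps W_enc_x, W_enc_y, W_dec_x, and W_dec_y.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_X, DIM_Y, DIM_Z = 4, 3, 2

# Hypothetical linear stand-ins; the encoders return only latent means here.
W_enc_x = rng.normal(size=(DIM_X, 2 * DIM_Z))  # x -> (mean of z1, mean of z2)
W_enc_y = rng.normal(size=(DIM_Y, 2 * DIM_Z))  # y -> (mean of z2, mean of z3)
W_dec_x = rng.normal(size=(2 * DIM_Z, DIM_X))  # (z1, z2) -> x
W_dec_y = rng.normal(size=(2 * DIM_Z, DIM_Y))  # (z2, z3) -> y


def enc_x(x):
    return np.split(x @ W_enc_x, 2, axis=-1)


def enc_y(y):
    return np.split(y @ W_enc_y, 2, axis=-1)


def dec_x(z1, z2):
    return np.concatenate([z1, z2], axis=-1) @ W_dec_x


def dec_y(z2, z3):
    return np.concatenate([z2, z3], axis=-1) @ W_dec_y


def sq(a, b):
    """Mean squared deviation, used here as the assumed consistency measure."""
    return float(np.mean((a - b) ** 2))


def cycle_consistency_terms(x):
    z1, z2 = enc_x(x)
    z3 = np.zeros_like(z2)               # prior mean of z3 (assumed N(0, I))
    y_gen = dec_y(z2, z3)                # x -> y conditional-generation path
    z2_y, z3_y = enc_y(y_gen)
    x_back = dec_x(z1, z2_y)             # y -> x path, reusing x's own z1
    z1_back, _ = enc_x(x_back)
    return {
        "recon_x": sq(x, x_back),        # first term: reconstruction of x
        "consist_z2": sq(z2, z2_y),      # second term: shared latent z2
        "consist_z1": sq(z1, z1_back),   # third term: x-specific latent z1
        "consist_z3": sq(z3, z3_y),      # fourth term: y-specific latent z3
    }


terms = cycle_consistency_terms(rng.normal(size=(8, DIM_X)))
first_loss = sum(terms.values())  # equal weighting is an assumption
```

Under this reading, a training step would update the encoder and decoder weights to reduce first_loss; only x is needed per step, which is consistent with training on unpaired samples.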
Regarding claim 12, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 10 as outlined above. Further, Goodsitt and Ryu teach that the system of claim 10, wherein the processing circuit is configured to cause the variational autoencoder to perform conditional generation, the performing of conditional generation comprising: 
receiving, by the encoder network, a sample of the first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”); 
producing a mean and a variance of each of: the first latent variable and the second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”); 
receiving, by the decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”), a sample of each of: 
a distribution having the produced mean and the produced variance of the first latent variable (See Ryu: Fig. 4, and [0040], “If $p_\theta(z)p_\theta(x|z)$ is given such that $p_\theta(x) \approx p_{\text{data}}(x)$, then minimizing the objective would ensure that $q_{\phi_x}(z|x) \approx p_\theta(x|z)$”), 
a distribution having the produced mean and the produced variance of the second latent variable (See Ryu: Fig. 4, and [0040], “Thus, the full latent variable model $p_\theta(z)p_\theta(x|z)p_\theta(y|z)$ can be trained with the assistance of a variational posterior $q_{\phi_{x,y}}(z|x,y)$ to ensure that $p_\theta(x|z)$ and $p_\theta(y|z)$”), and 
a distribution having the mean and the variance of a prior distribution of the third latent variable (See Ryu: Fig. 4, and [0040], “are well-fitted to the joint distribution $p_{\text{data}}(x,y)$”); and 
generating, by the decoder network, a generated sample of the second random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given $x^{(0)}$ the system would conditionally generate Y. If $q_{\phi_x}(u_x, w|x)$ is already trained, then a sample $(u_x^{(0)}, w^{(0)}) \sim q_{\phi_x}(u_x, w|x^{(0)})$ can be taken and Y can be generated as $Y \sim p_\theta(y|u_y^{(0)}, w^{(0)})$ where $u_y^{(0)} \sim p_\theta(u_y)$”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”).
Regarding claim 13, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 10 as outlined above. Further, Goodsitt and Zhong teach that the system of claim 10, wherein the processing circuit is configured to cause the variational autoencoder to perform joint generation, the performing of joint generation comprising:
receiving, by the decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”), a sample of each of:
the first latent variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), 
the second latent variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), and 
the third latent variable (See Zhong: Figs. 5A-B, and [0069], “The discriminator 504 can receive either the generated sample G(Z) along a path 508 or receive, along a path 510, a target Y”); and 
generating, by the decoder network (See Zhong: Figs. 5A-B, and [0092], “Elements of FIG. 5B having the same numerals as those of FIG. 5A can be exactly as described with respect to FIG. 5A. FIG. 5B includes a discriminator 552 that is similar to the discriminator 506 of FIG. 5A with the exception that the discriminator 506 also receives, along a path 554, the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”): 
a generated sample of the first random variable, based on the first latent variable and the second latent variable (See Zhong: Figs. 5A-B, and [0092], “the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”),
a generated sample of the second random variable, based on the second latent variable and the third latent variable (See Zhong: Figs. 5A-B, and [0092], “When the discriminator 552 receives G(E(G(Z))), the discriminator 552 outputs a value D(G(E(G(Z)))) indicating whether the discriminator 552 determines the G(E(G(Z))) to be real or generated”).
Regarding claim 15, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 14 as outlined above. Further, Ryu teaches that the system of claim 14, wherein the training comprises updating weights in the neural network based on a first loss function, the first loss function including:
a first term representing reconstruction loss of the first random variable;
a second term representing deviations from consistency in the second latent variable;
a third term representing deviations from consistency in the first latent variable; and
a fourth term representing deviations from consistency in the third latent variable (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”), and the screenshot of Ryu's Equation (27) below:

[Screenshot of Ryu, Equation (27)]

Regarding claim 16, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 15 as outlined above. Further, Ryu teaches that the system of claim 15, wherein the first loss function further includes a term based on the discriminative neural network (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”. Note that Equation (27) contains five terms).
Regarding claim 17, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 16 as outlined above. Further, Goodsitt and Ryu teach that the system of claim 16, wherein the processing circuit is configured to cause the variational autoencoder to perform conditional generation, the performing of conditional generation comprising:
receiving, by the encoder network, a sample of the first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”);
producing a mean and a variance of each of: the first latent variable and the second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”);
receiving, by the decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”), a sample of each of:
a distribution having the produced mean and the produced variance of the first latent variable (See Ryu: Fig. 4, and [0040], “If $p_\theta(z)p_\theta(x|z)$ is given such that $p_\theta(x) \approx p_{\text{data}}(x)$, then minimizing the objective would ensure that $q_{\phi_x}(z|x) \approx p_\theta(x|z)$”), 
a distribution having the produced mean and the produced variance of the second latent variable (See Ryu: Fig. 4, and [0040], “Thus, the full latent variable model $p_\theta(z)p_\theta(x|z)p_\theta(y|z)$ can be trained with the assistance of a variational posterior $q_{\phi_{x,y}}(z|x,y)$ to ensure that $p_\theta(x|z)$ and $p_\theta(y|z)$”), and 
a distribution having the mean and the variance of a prior distribution of the third latent variable (See Ryu: Fig. 4, and [0040], “are well-fitted to the joint distribution $p_{\text{data}}(x,y)$”); and 
generating, by the decoder network, a generated sample of the second random variable (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given $x^{(0)}$ the system would conditionally generate Y. If $q_{\phi_x}(u_x, w|x)$ is already trained, then a sample $(u_x^{(0)}, w^{(0)}) \sim q_{\phi_x}(u_x, w|x^{(0)})$ can be taken and Y can be generated as $Y \sim p_\theta(y|u_y^{(0)}, w^{(0)})$ where $u_y^{(0)} \sim p_\theta(u_y)$”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”).
Regarding claim 18, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 16 as outlined above. Further, Goodsitt and Zhong teach that the system of claim 16, wherein the processing circuit is configured to cause the variational autoencoder to perform joint generation, the performing of joint generation comprising: 
receiving, by the decoder network, a sample of each of (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”): 
the first latent variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), 
the second latent variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), and 
the third latent variable (See Zhong: Figs. 5A-B, and [0069], “The discriminator 504 can receive either the generated sample G(Z) along a path 508 or receive, along a path 510, a target Y”); and 
generating, by the decoder network (See Zhong: Figs. 5A-B, and [0092], “Elements of FIG. 5B having the same numerals as those of FIG. 5A can be exactly as described with respect to FIG. 5A. FIG. 5B includes a discriminator 552 that is similar to the discriminator 506 of FIG. 5A with the exception that the discriminator 506 also receives, along a path 554, the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”):
a generated sample of the first random variable, based on the first latent variable and the second latent variable (See Zhong: Figs. 5A-B, and [0092], “the generator output G(E(G(Z))) and the discriminator 552 uses the objective function of equation (11)”),
a generated sample of the second random variable, based on the second latent variable and the third latent variable (See Zhong: Figs. 5A-B, and [0092], “When the discriminator 552 receives G(E(G(Z))), the discriminator 552 outputs a value D(G(E(G(Z)))) indicating whether the discriminator 552 determines the G(E(G(Z))) to be real or generated”).
Regarding claim 19, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Goodsitt, Zhong, and Ryu teach that a system (See Goodsitt: Figs. 2-3, and Col. 8 Lines 63-67 ~ Col. 9 Lines 1-5, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments. In some embodiments, data-management system 102 performs steps of process 200. It should be noted that other components of system 100, including, for example, client device 104 and/or third-party system 108 may perform operations of one or more steps of process 200. Process 250 may include training models according to architecture 300, architecture 302, architecture 304, and/or any other architecture consistent with disclosed embodiments”), comprising:
means for processing (See Goodsitt: Fig. 1, and Col. 4 Lines 27-30, “Data-management system 102 may include at least one memory and one or more processors configured to perform operations consistent with disclosed embodiments”), and 
a neural network (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 20-27, “As shown in the illustration of FIG. 2A, data-management system 102 may provide respective latent-space data to a child model A and a child model B. A child model may include a GAN model, a neural network model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a random forest model, an autoencoder model, a variational autoencoder model, and/or any other machine learning model”),
the means for processing being configured to train the neural network (See Goodsitt: Figs. 2-3, and Col. 8 Lines 62-64, “FIG. 2B illustrates a method 250 of training a parent model to generate intercorrelated synthetic data, consistent with disclosed embodiments”), wherein: 
the neural network comprises a variational autoencoder (See Goodsitt: Figs. 2-3, and Col. 8 Lines 20-31, “As shown in the illustration of FIG. 2A, data-management system 102 may provide respective latent-space data to a child model A and a child model B. A child model may include a GAN model, a neural network model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a random forest model, an autoencoder model, a variational autoencoder model, and/or any other machine learning model. A child model may include a synthetic data model (i.e., a model configured to generate synthetic data). As one of skill in the art will appreciate, step 206 may involve a different number of child models than the two depicted in FIG. 2A”), comprising: 
an encoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured: 
to receive a sample of a first random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 33-40, “At step 252, data-management system 102 receives a plurality of intercorrelated datasets, consistent with disclosed embodiments. In the example of FIG. 2B, individual datasets of the intercorrelated datasets are represented by boxes at step 252, including a dark-gray box, light-gray box, and a plurality of white boxes. Consistent with the present disclosure, intercorrelated datasets of step 252 may be referred to as training data used to train a parent model”), and to produce a mean and a variance of each of: a first latent variable and a second latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 4-12, “For example, a parent model may generate first latent-space data corresponding to a first intercorrelated dataset and second latent-space data corresponding to a second intercorrelated dataset, etc. In the illustration of FIG. 2B latent-space data corresponding to the plurality of interconnected datasets are represented by the dotted boxes of step 256, including a dark gray dotted and light gray dotted box corresponding to a dark gray box and light gray box depicted in step 252”; and Col. 7 Lines 14-19, “In embodiments consistent with the present disclosure, an intercorrelated dataset may have a data profile including a data schema and/or a statistical profile of a dataset. A statistical profile may include a statistical distribution, a noise factor, a moment (e.g., a mean), a variance, and/or any other statistical metric of a dataset”), or 
to receive a sample of a second random variable (See Goodsitt: Figs. 2-3, and Col. 9 Lines 57-61, “At step 256, a parent model may generate latent-space data, consistent with disclosed embodiments. Consistent with the present disclosure, latent-space data may refer to any data output by a parent model, and latent-space data may be in a different format from an intercorrelated dataset”), and to produce a mean and a variance of each of: the second latent variable and a third latent variable (See Goodsitt: Figs. 2-3, and Col. 10 Lines 13-29, “At step 258, data-management system 102 may provide latent-space data to a plurality of child models, consistent with disclosed embodiments. A child model may include a child model trained according to process 200. In the example of FIG. 2B, data-management system 102 may provide first latent-space data corresponding to a first intercorrelated dataset (dark gray box with dots) to child model A, and data-management system 102 may provide second latent-space data corresponding to a second intercorrelated dataset (dark gray box with dots) to child model B. As one of skill in the art will appreciate, step 258 may include providing latent-space data to a different number of child models than the two depicted in FIG. 2B. In some embodiments, the latent-space data provided to one or more child models partially or wholly overlaps (i.e., shares some or all data elements)”); and 
a decoder network (See Goodsitt: Fig. 4, and Col. 16 Lines 59-65, “For example, data generator 437 may transform words and/or phrases into numbers by applying a lexicon, a parser, and a grammar rule system. In some embodiments, data generator 437 may be configured to receive, train, and/or implement an autoencoder model or components of an autoencoder model (e.g., an encoder model or a decoder model)”) configured: 
to receive a sample of the first latent variable and a sample of the second latent variable, and to generate a generated sample of the first random variable (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 32-38, “At step 208, data-management system 102 may train a plurality of child models to generate synthetic data based on latent-space data, consistent with disclosed embodiments. For example, in the illustration of FIG. 2A, synthetic data are represented by boxes with diagonal shading at step 208, and latent-space data are represented by the many-pointed stars labelled as latent-space data A and latent-space data B”), or 
to receive a sample of the second latent variable and a sample of the third latent variable, and to generate a generated sample of the second random variable (See Goodsitt: Fig. 3, and Col. 12 Lines 4-15, “As an illustrative example of architecture 302, parent model 1 may be configured to generate latent-space-data comprising synthetic price data for a product (i.e., “supply data”). Parent model 2 may be configured to generate latent-space-data comprising synthetic income data associated with a plurality of consumers and social network data associated with the plurality of consumers (i.e., “demand data”). In the example, child models may correspond to the plurality of consumers. Child models may be configured to generate synthetic transaction data associated with their respective consumers based on supply data of parent 1 and demand data of parent 2”), 
the training of the neural network comprising training the variational autoencoder with (See Goodsitt: Fig. 1, and Col. 5 Lines 12-28, “Third-party system 108 may provide data to data-management system. For example, third-party system 108 may provide training data to data-management system 102 and/or a machine learning model, consistent with disclosed embodiments. As an example, third-party system 108 may transmit time series data, music data in an audio format, musical composition data, financial data, demographic data, health data, environmental data, education data, governmental data, and/or any other kind of data. In some embodiments, third-party system 108 provides data to data-management system via a subscription, a feed, a socket, or the like. In some embodiments, third-party system 108 sends a request to third-party system to retrieve data. In some embodiments, third-party system 108 sends a request for correlated synthetic data and/or one or more models configured to generate correlated synthetic data to data-management system”): 
a plurality of samples of the first random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”); and 
a plurality of samples of the second random variable (See Goodsitt: Figs. 2-3, and Col. 6 Lines 25-32, “For example, in some embodiments, process 200 may include child model output that may include a column of data related to states (state data). Another child model output may include a data column related to cities (city data). A parent model may be trained to reproduce correlations between state data and city data. A parent model output may include a vector of floating-point numbers, for example, which may be passed as input to the child models (i.e., latent space data). In the example, the input to the parent model may also be a vector of floating-point numbers”), 
the plurality of samples of the first random variable and the plurality of samples of the second random variable being unpaired (See Goodsitt: Figs. 2A-B, and Col. 8 Lines 57-61, “A similarity metric may be based on a correlation, covariance matrix, a variance, a frequency of overlapping values, or other measure of statistical similarity. Training may include hyperparameter tuning. Training may be supervised or unsupervised”; and Claim 13, “generating, using the second parent model, third latent-space data and fourth latent-space data based on second input data, the second input data at least partially overlapping with the first input data”. Note that unsupervised training may be mapped to unpaired data), 
the training of the neural network comprising updating weights in the neural network based on a first loss function (See Goodsitt: Figs. 1-2, and Col. 8 Lines 47-57, “In some embodiments, training of a child model may terminate when a performance criterion (i.e., training criterion) is satisfied. A training criterion may include a number of epochs, a training time, a performance metric (e.g., an estimate of accuracy in reproducing test data), or the like. Data-management system 102 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like. A training criterion may be based on a similarity metric representing a measure of similarity between a synthetic dataset and an original dataset”), the first loss function being based on a measure of deviation from consistency (See Zhong: Figs. 5A-B, and [0077], “As already mentioned above, the outputs E(G(Z)) and E(G(E(G(Z))) of the encoder 506 are used by the generator 502 as part of the loss function of the generator 502, as shown in equation (7) and as illustrated by the dashed line 518. As also already mentioned, the encoder 506 is trained to minimize the absolute difference between E(G(Z)) and E(G(E(G(Z))))”) between: 
a conditional generation path from the first random variable to the second random variable (See Zhong: Figs. 5A-B, and [0070], “The encoder 506 receives as input, along a path 512, the generated sample, G(Z). When the encoder 506 receives G(Z), the encoder 506 outputs a value E(G(Z)). E(G(Z)) is the latent space representation of the ambient space representation G(Z) of the noise Z”), and 
a conditional generation path (See Ryu: Figs. 10-13, and [0050], “Inference may be performed by conditional generation, such that given x^{(0)} the system would conditionally generate Y. If q_{ϕ_x}(u_x, w | x) is already trained, then a sample (u_x^{(0)}, w^{(0)}) ~ q_{ϕ_x}(u_x, w | x^{(0)}) can be taken and Y can be generated as Y ~ p_θ(y | u_y^{(0)}, w^{(0)}) where u_y^{(0)} ~ p_θ(u_y)”; and claim 6, “The method of claim 4, wherein performing inference comprises conditionally generating the first variable from the second variable”. Note that reversing the direction of conditional generation, from generating y conditioned on x to generating x conditioned on y, is not novel) from the second random variable to the first random variable (See Zhong: Figs. 5A-B, and [0071], “E(G(Z)) is fed back into the generator 502, as shown by a path 514. When the generator 502 receive E(G(Z)), the generator 502 outputs G(E(G(Z))), which is the ambient space representation of latent space representation E(G(Z)). G(E(G(Z))) is fed, along a path 516, to the encoder 506, which then outputs the latent space representation E(G(E(G(Z))))”).
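For illustration only, the shared-latent arrangement recited in claim 19 (an encoder producing a mean and a variance for the first and second latent variables from a sample of the first random variable, or for the second and third latent variables from a sample of the second random variable, and a decoder consuming the shared second latent variable on both sides) and the two conditional generation paths can be sketched in Python as follows. This is a minimal sketch under stated assumptions: scalar variables, fixed affine maps standing in for trained networks, standard-normal priors, and hypothetical function names that do not come from the application or from Goodsitt, Zhong, or Ryu.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Gaussian over one scalar latent, stored as (mean, variance)."""
    mean: float
    var: float

    def sample(self, rng: random.Random) -> float:
        # Reparameterization: z = mean + sqrt(var) * eps, with eps ~ N(0, 1).
        return self.mean + math.sqrt(self.var) * rng.gauss(0.0, 1.0)


def softplus(a: float) -> float:
    # Maps any real number to a strictly positive variance.
    return math.log1p(math.exp(a))


# Hypothetical toy "networks": fixed affine maps in place of trained layers.
def encode_x(x: float) -> tuple[Gaussian, Gaussian]:
    """Sample of the first random variable -> q(z1 | x), q(z2 | x)."""
    return Gaussian(0.5 * x, softplus(-x)), Gaussian(0.8 * x, softplus(0.1 * x))


def encode_y(y: float) -> tuple[Gaussian, Gaussian]:
    """Sample of the second random variable -> q(z2 | y), q(z3 | y)."""
    return Gaussian(0.4 * y, softplus(0.2 * y)), Gaussian(-0.3 * y, softplus(y))


def decode_x(z1: float, z2: float) -> float:
    """(z1, z2) -> generated sample of the first random variable."""
    return 1.5 * z1 + 0.5 * z2


def decode_y(z2: float, z3: float) -> float:
    """(z2, z3) -> generated sample of the second random variable."""
    return 2.0 * z2 - 0.7 * z3


def generate_y_from_x(x: float, rng: random.Random) -> float:
    """Conditional path x -> y: shared z2 from the x-encoder, z3 from the N(0, 1) prior."""
    _, q_z2 = encode_x(x)
    z2 = q_z2.sample(rng)
    z3 = rng.gauss(0.0, 1.0)
    return decode_y(z2, z3)


def generate_x_from_y(y: float, rng: random.Random) -> float:
    """Conditional path y -> x: shared z2 from the y-encoder, z1 from the N(0, 1) prior."""
    q_z2, _ = encode_y(y)
    z2 = q_z2.sample(rng)
    z1 = rng.gauss(0.0, 1.0)
    return decode_x(z1, z2)


rng = random.Random(0)
y_generated = generate_y_from_x(1.0, rng)
x_generated = generate_x_from_y(y_generated, rng)
```

In the sketch, generating y from x draws the shared z2 from the x-side posterior and the unshared z3 from the prior, which parallels the conditional generation described in Ryu [0050].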
Regarding claim 20, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 19 as outlined above. Further, Ryu teaches that the system of claim 19, wherein the first loss function includes: 
a first term representing reconstruction loss of the first random variable; 
a second term representing deviations from consistency in the second latent variable; 
a third term representing deviations from consistency in the first latent variable; and 
a fourth term representing deviations from consistency in the third latent variable (See Ryu: Fig. 7, and [0065], “The reconstruction loss 722 is combined with the regularization loss 708 and the regularization loss for common information extraction 710 to determine the loss function 724 as in Equation (27)”), and the screen shot of Equation (27) reproduced below:

[Image: screen shot of Ryu, Equation (27)]
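The four terms of claim 20 can be illustrated as a weighted sum. In the hypothetical sketch below, each consistency term compares the same latent variable as inferred along two different generation paths, and mean squared error is assumed as the deviation measure; neither the claim language quoted above nor Ryu's Equation (27) is represented as specifying that measure or the unit weights.

```python
from typing import Sequence


def mean_squared_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared difference between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def first_loss(
    x: Sequence[float],
    x_generated: Sequence[float],
    z2_paths: tuple[Sequence[float], Sequence[float]],
    z1_paths: tuple[Sequence[float], Sequence[float]],
    z3_paths: tuple[Sequence[float], Sequence[float]],
    weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),
) -> tuple[float, list[float]]:
    """Hypothetical weighted sum of the four terms listed in claim 20.

    Each *_paths argument holds the same latent variable as inferred along
    two different generation paths (e.g. x -> y versus y -> x); each
    consistency term penalizes disagreement between the two.
    """
    terms = [
        mean_squared_deviation(x, x_generated),  # first term: reconstruction of x
        mean_squared_deviation(*z2_paths),       # second term: consistency in z2
        mean_squared_deviation(*z1_paths),       # third term: consistency in z1
        mean_squared_deviation(*z3_paths),       # fourth term: consistency in z3
    ]
    total = sum(w * t for w, t in zip(weights, terms))
    return total, terms
```

For example, `first_loss([1.0, 2.0], [1.0, 2.0], z2_paths=([1.0], [0.0]), z1_paths=([0.0], [0.0]), z3_paths=([2.0], [0.0]))` returns a total of 5.0 with terms [0.0, 1.0, 0.0, 4.0].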

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt, etc. (US 11030526 B1) in view of Zhong, etc. (US 20200349447 A1), further in view of Ryu, etc. (US 20200134499 A1) and Oono, etc. (US 20190018933 A1).
Regarding claim 5, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 1 as outlined above. Further, Zhong teaches that the method of claim 1, wherein the neural network further comprises a discriminative neural network, and the training of the neural network further comprises updating weights in the discriminative neural network (See Zhong: Fig. 6, and [0104], “Using the second output E(G(Z)) and fourth output E(G(E(G(Z)))) to constrain the training of the generator G can include using the second output E(G(Z)) and fourth output E(G(E(G(Z)))) in a loss function that is used to update weights of the generator G”).
However, Goodsitt, as modified by Zhong and Ryu, fails to explicitly disclose that the weights in the discriminative neural network are updated based on a second loss function, the second loss function comprising an f-divergence.
Oono, however, teaches training based on a second loss function, the second loss function comprising an f-divergence (See Oono: Fig. 6, and [0099], “In various embodiments, the multimodal DBMs or sub-modules thereof described herein are trained using approximate learning methods, for example by using a variational approach. Mean-field inference may be used to estimate data-dependent expectations. Markov Chain Monte Carlo (MCMC) based stochastic approximation procedures may be used to approximate a model's expected statistics. Without being bound by theory, to minimize the distance between an estimated probability distribution and the prior distribution of the ground truth or the distance between an approximate distribution for the hidden units and the posterior, the training method may optimize, e.g. minimize, the Kullback Leibler Divergence (KL-Divergence), often in an iterative process. A variational lower bound for the log likelihood of the model parameters may be maximized by minimizing the KL-Divergence. KL-Divergence between to distributions P1(x) and P2(x) may be denoted by D (P1(x)∥P2(x)) and given by”). Note that the KL-Divergence is itself an f-divergence, namely the one generated by f(t) = t log t.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Goodsitt to update the weights based on a second loss function comprising an f-divergence, as taught by Oono, in order to ensure simple and efficient operation of the computer system (See Oono: Fig. 6, and [0100], “KL-Divergence may be minimized by reducing the difference between the prior distribution and the reconstruction distribution or the difference between the posterior distribution and the modeled approximation thereof, such as by using a variational Bayes EM algorithm. The multimodal DBMs or sub-modules thereof may be cycled through layers, updating the mean-field parameters within each individual layer”). Goodsitt teaches a method and system that may terminate training of the neural-network parent model based on a test correlation metric and may adjust model parameters, while Oono teaches a system and method that may use a loss function with a KL-Divergence term to train the neural network of a multimodal generative model. Therefore, it would have been obvious to one of ordinary skill in the art to modify Goodsitt by Oono to train the neural network with a loss function having a KL-Divergence term in order to make the operation of the computer system more efficient. The motivation to modify Goodsitt by Oono is “Use of known technique to improve similar devices (methods, or products) in the same way”.
Regarding claim 14, Goodsitt, Zhong, and Ryu teach all the features with respect to claim 10 as outlined above. Further, Zhong and Oono teach that the system of claim 10, wherein the neural network further comprises a discriminative neural network, and the training of the neural network further comprises updating weights in the discriminative neural network (See Zhong: Fig. 6, and [0104], “Using the second output E(G(Z)) and fourth output E(G(E(G(Z)))) to constrain the training of the generator G can include using the second output E(G(Z)) and fourth output E(G(E(G(Z)))) in a loss function that is used to update weights of the generator G”) based on a second loss function, the second loss function comprising an f-divergence (See Oono: Fig. 6, and [0099], “In various embodiments, the multimodal DBMs or sub-modules thereof described herein are trained using approximate learning methods, for example by using a variational approach. Mean-field inference may be used to estimate data-dependent expectations. Markov Chain Monte Carlo (MCMC) based stochastic approximation procedures may be used to approximate a model's expected statistics. Without being bound by theory, to minimize the distance between an estimated probability distribution and the prior distribution of the ground truth or the distance between an approximate distribution for the hidden units and the posterior, the training method may optimize, e.g. minimize, the Kullback Leibler Divergence (KL-Divergence), often in an iterative process. A variational lower bound for the log likelihood of the model parameters may be maximized by minimizing the KL-Divergence. KL-Divergence between to distributions P1(x) and P2(x) may be denoted by D (P1(x)∥P2(x)) and given by”).
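The rejections of claims 5 and 14 map Oono's KL-Divergence onto the claimed f-divergence. For discrete distributions P and Q with Q(x) > 0, an f-divergence is D_f(P∥Q) = Σ_x Q(x) f(P(x)/Q(x)) for a convex f with f(1) = 0. The minimal sketch below (helper names hypothetical) checks numerically that the generator f(t) = t log t reproduces the directly computed KL divergence; the total-variation generator is included only to show another member of the same family.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)); assumes every Q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    """f(t) = t * log(t), with the convention 0 * log(0) = 0; yields KL(P || Q)."""
    return t * math.log(t) if t > 0 else 0.0


def total_variation_generator(t: float) -> float:
    """f(t) = |t - 1| / 2; yields the total variation distance."""
    return 0.5 * abs(t - 1.0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Direct definition: sum_x P(x) * log(P(x) / Q(x)), skipping P(x) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p, q = [0.5, 0.5], [0.9, 0.1]
print(math.isclose(f_divergence(p, q, kl_generator), kl_divergence(p, q)))  # True
```

The equivalence holds term by term, since Q(x) · (P(x)/Q(x)) · log(P(x)/Q(x)) = P(x) · log(P(x)/Q(x)).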

Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GORDON G LIU/Primary Examiner, Art Unit 2612