DETAILED ACTION
This action is in response to the submission filed 16 June 2022 for application 16/562,972. Currently claims 1-7 are canceled. Claims 21-26 are new. Claims 8-26 are pending and have been examined.

Election/Restrictions
Applicant’s election without traverse of claims 8-26 in the reply filed on 16 June 2022 is acknowledged.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
Information disclosure statements (IDS) were submitted on 16 March 2021, 10 February 2021, 15 October 2019, and 6 September 2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9-11, 16-18, and 21-26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The term “similar” in claims 9, 11, 16, 18, 22, and 24 is a relative term which renders the claims indefinite. The term “similar” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Claims 10, 17, and 23 are rejected for the same reason because they depend on claims 9, 16, and 22, respectively. Hence, claims 9-11, 16-18, and 22-24 are rejected.
Claim 21 recites the limitation "the memory" in line 2. There is insufficient antecedent basis for this limitation in the claim. Hence, claims 21-26 are rejected for the same reason because claims 22-26 are dependent on claim 21.
Claims 22-26 recite the limitation "The system" in line 1. There is insufficient antecedent basis for this limitation in the claims.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 8, 9, 11-16, 18-22, and 24-26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tolstikhin et al. (AdaGAN: Boosting Generative Models, 2017).

Regarding claim 8
Tolstikhin teaches: A computer-implemented method, comprising: analyzing a plurality of original records to identify a probability distribution function (PDF), wherein a sample space of the PDF comprises the plurality of original records ([Page 6, Paragraph 2] The definition of Df holds for both continuous and discrete probability measures. Note: Thus, the framework accommodates both continuous and discrete probability measures. [Page 17, Section 4.1.2, Paragraph 1] We compute the probability mass of the true data "covered" by the model distribution Pmodel. [Page 17, Section 4.1.2, Paragraph 2] Another metric is the likelihood of the true data under the generated distribution. Note: True data corresponds to original records.);
generating a plurality of new records using the PDF ([Page 2, Section 1.1, Paragraph 3] (d) update our mixture of generators G_t = (1 - β_t) G_{t-1} + β_t G^c_t (notation expressing the mixture of G_{t-1} and G^c_t with probabilities 1 - β_t and β_t). [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. [...] the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. [Page 5, Paragraph 5] In Section 2 we present our main theoretical results regarding optimization of mixture models under general f-divergences. Note: Blue dots sampled from the model mixture of generators Gt correspond to the new records);
creating an augmented dataset that comprises the plurality of new records ([Page 2, Section 1.1, Last Paragraph] On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Blue dots sampled from the model mixture of generators Gt correspond to the plurality of new records. The red dots and blue dots together correspond to an augmented dataset);
and training a machine-learning model using the augmented dataset ([Page 3] Note: Algorithm 1 shows training. [Page 15, Section 3.3] Additionally, we write D ← DGAN(S_N, G) to denote a procedure that returns a discriminator from the GAN algorithm trained on a given set of true data examples S_N and examples sampled from the mixture of generators G. Note: True data examples S_N and examples sampled from the mixture of generators together correspond to the augmented dataset).
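For illustration only, the four steps recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is neither the applicant's implementation nor the method of Tolstikhin: the records are assumed to be one-dimensional numbers, the PDF is assumed to be Gaussian, and the function names (identify_pdf, generate_new_records, create_augmented_dataset, train_model) are hypothetical. The "model" is a one-parameter predictor fitted by gradient descent, chosen only to keep the sketch short.

```python
from statistics import NormalDist


def identify_pdf(original_records):
    """Estimate a PDF from the original records (Gaussian form is an assumption)."""
    return NormalDist.from_samples(original_records)


def generate_new_records(pdf, count, seed=0):
    """Draw `count` new records from the estimated PDF."""
    return pdf.samples(count, seed=seed)


def create_augmented_dataset(new_records, original_records=()):
    """Augmented dataset comprising the new records, optionally with the originals."""
    return list(new_records) + list(original_records)


def train_model(dataset, learning_rate=0.1, epochs=200):
    """Stand-in training step: fit a constant predictor by minimizing squared error."""
    weight = 0.0
    for _ in range(epochs):
        gradient = sum(2.0 * (weight - x) for x in dataset) / len(dataset)
        weight -= learning_rate * gradient
    return weight


original = [4.8, 5.1, 5.0, 4.9, 5.2]
pdf = identify_pdf(original)
new = generate_new_records(pdf, count=20)
augmented = create_augmented_dataset(new, original)
model = train_model(augmented)
```

Passing the original records to create_augmented_dataset mirrors the additional limitation of claim 13; omitting them yields a dataset of new records only.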
Regarding claim 9
Tolstikhin teaches: The computer-implemented method of claim 8, wherein analyzing the plurality of original records to identify the probability distribution function further comprises: training a generator machine-learning model to create a new record that is similar to individual ones of the plurality of original records ([Page 3] Note: Algorithm 1 shows training the generators (G). [Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X.  [Page 5, Section: 2.1 Preliminaries and notations, Last Paragraph] So the problem of generative density estimation becomes a problem of finding a function G such that Pmodel looks like Pd in the sense that samples from Pmodel and from Pd look similar);
training a discriminator machine-learning model to distinguish between the new record and the individual ones of the plurality of original records ([Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X. [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg);
and identifying the probability distribution function in response to the new record created by the generator machine-learning model being mistaken by the discriminator machine-learning model at a predefined rate ([Page 5, Section 2.1, Paragraph 2] Generative Density Estimation In the generative approach to density estimation, instead of building a probabilistic model of the data directly, one builds a function G : Z → X that transforms a fixed probability distribution P_Z (often called the noise distribution) over a latent space Z into a distribution over X. Hence Pmodel is the pushforward of P_Z, i.e. Pmodel(A) = P_Z(G^{-1}(A)). [Page 5, Paragraph 5] In Section 2.5 we show that if the GAN optimization at each step is perfect, the process converges to the true data distribution at exponential rate (or even in a finite number of steps, for which we provide a necessary and sufficient condition). [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg. It is known that for an arbitrary f-divergence, there exists a corresponding function h (see [7]) such that the values of the optimal discriminator D are related to the density ratio in the following way. Note: G corresponds to generator and D corresponds to Discriminator).
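For illustration only, one possible reading of the "mistaken ... at a predefined rate" limitation is a check on the fraction of generated records that the discriminator labels as real. The sketch below assumes that reading; the names fooling_rate and pdf_identified, the boolean discriminator interface, and the default threshold of 0.5 are hypothetical and do not come from the claims or from Tolstikhin.

```python
def fooling_rate(discriminator, generated_records):
    """Fraction of generated records that the discriminator labels as real."""
    fooled = sum(1 for record in generated_records if discriminator(record))
    return fooled / len(generated_records)


def pdf_identified(discriminator, generated_records, predefined_rate=0.5):
    """Treat the PDF as identified once the fooling rate reaches the predefined rate."""
    return fooling_rate(discriminator, generated_records) >= predefined_rate


# Hypothetical discriminator: calls a record "real" when it is positive.
def toy_discriminator(record):
    return record > 0


generated = [-1.0, 1.0, 2.0, 3.0]
rate = fooling_rate(toy_discriminator, generated)
```

Whether the rate is tested as a lower bound, as here, or as a target within a tolerance is a design choice that the claim language leaves open.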
	
Regarding claim 11
Tolstikhin teaches: The computer-implemented method of claim 9, wherein the generator machine-learning model is one of a plurality of generator machine-learning models and the method further comprises ([Page 2, Section 1.1, Paragraph 1] Motivated by the problem of missing modes, in this work we propose to use multiple generative models combined into a mixture. These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components):
training each of the plurality of generator machine-learning models to create the new record that is similar to individual ones of the plurality of original records ([Page 2, Section 1.1, Paragraph 1] These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components. [Page 2, Section 1.1, Paragraph 3] mixture of generators Gt-1.  [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Red dots correspond to original records and blue dots correspond to new record);
selecting the generator machine-learning model from the plurality of generator machine learning models based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model, a generator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a discriminator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a different rank associated with each generator machine-learning model and the discriminator machine-learning model, or at least one result of a Kolmogorov-Smirnov (KS) test that includes a first probability distribution function associated with the plurality of original records and a second probability distribution function associated with the plurality of new records ([Page 16, Section 4.1.1, Paragraph 2] The best model out of T runs of GAN, that is: run T GAN instances independently, then take the run that performs best on a validation set. Note: The best model out of T runs of GAN corresponds to selecting the generator machine-learning model based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model);
and identifying the probability distribution function further occurs in response to selecting the generator machine-learning model from the plurality of generator machine learning models ([Page 18, Last Paragraph] Table 1: Performance of the different algorithms on varying number of mixtures of Gaussians. The reported score is the coverage C, probability mass of Pd covered by the 5th percentile of Pg defined in Section 4.1.2. Note: Column 1 corresponds to selecting the generator machine-learning model and the other columns correspond to identifying the probability distribution).
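For illustration only, the Kolmogorov-Smirnov alternative recited in claim 11 can be sketched as a two-sample KS statistic computed between the original records and the records produced by each candidate generator, with the smallest statistic winning. This sketches that one claimed alternative, not the "best of T runs" selection quoted from Tolstikhin. One-dimensional numeric records are assumed, and the names ks_statistic and select_generator are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The supremum of the gap is attained at one of the observed points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def select_generator(original_records, candidate_outputs):
    """Index of the candidate whose generated records best match the originals."""
    scores = [ks_statistic(original_records, out) for out in candidate_outputs]
    return min(range(len(scores)), key=scores.__getitem__)


original = [1, 2, 3, 4]
candidates = [[10, 11, 12], [1, 2, 3, 5]]
best = select_generator(original, candidates)
```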

Regarding claim 12
Tolstikhin teaches: The computer-implemented method of claim 8, wherein generating the plurality of new records using the probability distribution function further comprises randomly selecting a predefined number of points in the sample space defined by the probability distribution function ([Page 2, Section 1.1, Paragraph 3] For sampling from the resulting model we first define a generator G^c_i, by sampling the index i from a multinomial distribution with parameters α_1, ..., α_T, and then we return G^c_i(Z), where Z ∼ P_Z is a standard latent noise variable used in the GAN literature. [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. On the right images, the color corresponds to the weights of training points, following the reweighting scheme proposed in this work. The top row corresponds to the first iteration of AdaGAN, and the bottom row to the second iteration).
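For illustration only, the sampling procedure quoted above (pick a component generator at random according to the mixture weights, then push a latent noise draw through it) can be sketched as follows, together with the weight update applied when a new component joins the mixture. This is a sketch under assumptions, not the reference's code: the generators are plain Python callables, the latent noise is a standard normal draw, and the names sample_from_mixture, update_weights, weights, and beta are hypothetical.

```python
import random


def update_weights(weights, beta):
    """Scale existing mixture weights by (1 - beta) and give the new component beta."""
    return [(1.0 - beta) * w for w in weights] + [beta]


def sample_from_mixture(generators, weights, count, seed=0):
    """Draw a predefined number of points from a weighted mixture of generators."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        # Choose one component generator with probability given by its weight.
        (generator,) = rng.choices(generators, weights=weights, k=1)
        latent = rng.gauss(0.0, 1.0)  # latent noise variable (assumed standard normal)
        points.append(generator(latent))
    return points


# Hypothetical component generators that tag their outputs for inspection.
def first_generator(z):
    return ("first", z - 5.0)


def second_generator(z):
    return ("second", z + 5.0)


weights = update_weights([1.0], beta=0.25)
points = sample_from_mixture([first_generator, second_generator], weights, count=10)
```

Fixing the seed makes the draw reproducible; the count argument plays the role of the "predefined number of points" in the claim.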

Regarding claim 13
Tolstikhin teaches: The computer-implemented method of claim 8, further comprising adding the plurality of original records to the augmented dataset ([Page 15, Section 3.3] a given set of true data examples SN and examples sampled from the mixture of generators G. Note: Set of true data examples correspond to original records. Examples sampled from the mixture of generators correspond to the augmented dataset).

Regarding claim 14
Tolstikhin teaches: The computer-implemented method of claim 8, wherein the machine learning model comprises a neural network ([Page 6, Below equation (2)] where D and G are two functions represented by neural networks).

Regarding claim 15
Tolstikhin teaches: A system, comprising: a computing device comprising a processor and a memory; and machine-readable instructions stored in the memory that, when executed by the processor, cause the computing device to at least ([Page 15, Section 4] Code available online):
analyze a plurality of original records to identify a probability distribution function (PDF), wherein a sample space of the PDF comprises the plurality of original records ([Page 6, Paragraph 2] The definition of Df holds for both continuous and discrete probability measures. Note: Thus, the framework accommodates both continuous and discrete probability measures. [Page 17, Section 4.1.2, Paragraph 1] We compute the probability mass of the true data "covered" by the model distribution Pmodel. [Page 17, Section 4.1.2, Paragraph 2] Another metric is the likelihood of the true data under the generated distribution. Note: True data corresponds to original records);
generate a plurality of new records using the PDF ([Page 2, Section 1.1, Paragraph 3] (d) update our mixture of generators G_t = (1 - β_t) G_{t-1} + β_t G^c_t (notation expressing the mixture of G_{t-1} and G^c_t with probabilities 1 - β_t and β_t). [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. [...] the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. Note: Blue dots sampled from the model mixture of generators Gt correspond to the new records. [Page 5, Paragraph 5] In Section 2 we present our main theoretical results regarding optimization of mixture models under general f-divergences);
create an augmented dataset that comprises the plurality of new records ([Page 2, Section 1.1, Last Paragraph] On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Blue dots sampled from the model mixture of generators Gt correspond to the plurality of new records. The red dots and blue dots together correspond to an augmented dataset);
and train a machine-learning model using the augmented dataset ([Page 3] Note: Algorithm 1 shows training. [Page 15, Section 3.3] Additionally, we write D ← DGAN(S_N, G) to denote a procedure that returns a discriminator from the GAN algorithm trained on a given set of true data examples S_N and examples sampled from the mixture of generators G. Note: True data examples S_N and examples sampled from the mixture of generators together correspond to the augmented dataset).

Regarding claim 16
Tolstikhin teaches: The system of claim 15, wherein the machine-readable instructions that cause the computing device to analyze the plurality of original records to identify the probability distribution function further cause the computing device to at least: train a generator machine-learning model to create a new record that is similar to individual ones of the plurality of original records ([Page 3] Note: Algorithm 1 shows training the generators (G). [Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X. [Page 5, Section: 2.1 Preliminaries and notations, Last Paragraph] So the problem of generative density estimation becomes a problem of finding a function G such that Pmodel looks like Pd in the sense that samples from Pmodel and from Pd look similar);
train a discriminator machine-learning model to distinguish between the new record and the individual ones of the plurality of original records ([Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X. [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg);
and identify the probability distribution function in response to the new record created by the generator machine-learning model being mistaken by the discriminator machine-learning model at a predefined rate ([Page 5, Section 2.1, Paragraph 2] Generative Density Estimation In the generative approach to density estimation, instead of building a probabilistic model of the data directly, one builds a function G : Z → X that transforms a fixed probability distribution P_Z (often called the noise distribution) over a latent space Z into a distribution over X. Hence Pmodel is the pushforward of P_Z, i.e. Pmodel(A) = P_Z(G^{-1}(A)). [Page 5, Paragraph 5] In Section 2.5 we show that if the GAN optimization at each step is perfect, the process converges to the true data distribution at exponential rate (or even in a finite number of steps, for which we provide a necessary and sufficient condition). [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg. It is known that for an arbitrary f-divergence, there exists a corresponding function h (see [7]) such that the values of the optimal discriminator D are related to the density ratio in the following way. Note: G corresponds to generator and D corresponds to Discriminator).

Regarding claim 18
Tolstikhin teaches: The system of claim 16, wherein the generator machine-learning model is one of a plurality of generator machine-learning models and the machine-readable instructions further cause the computing device to at least ([Page 2, Section 1.1, Paragraph 1] Motivated by the problem of missing modes, in this work we propose to use multiple generative models combined into a mixture. These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components):
train each of the plurality of generator machine-learning models to create the new record that is similar to individual ones of the plurality of original records ([Page 2, Section 1.1, Paragraph 1] These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components. [Page 2, Section 1.1, Paragraph 3] mixture of generators Gt-1.  [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Red dots correspond to original records and blue dots correspond to new record);
select the generator machine-learning model from the plurality of generator machine learning models based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model, a generator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a discriminator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a different rank associated with each generator machine-learning model and the discriminator machine-learning model, or at least one result of a Kolmogorov-Smirnov (KS) test that includes a first probability distribution function associated with the plurality of original records and a second probability distribution function associated with the plurality of new records ([Page 16, Section 4.1.1, Paragraph 2] The best model out of T runs of GAN, that is: run T GAN instances independently, then take the run that performs best on a validation set. Note: The best model out of T runs of GAN corresponds to selecting the generator machine-learning model based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model);
and identification of the probability distribution function further occurs in response to selecting the generator machine-learning model from the plurality of generator machine learning models ([Page 18, Last Paragraph] Table 1: Performance of the different algorithms on varying number of mixtures of Gaussians. The reported score is the coverage C, probability mass of Pd covered by the 5th percentile of Pg defined in Section 4.1.2. Note: Column 1 corresponds to selecting the generator machine-learning model and the other columns correspond to identifying the probability distribution).

Regarding claim 19
Tolstikhin teaches: The system of claim 15, wherein the machine-readable instructions that cause the computing device to generate the plurality of new records using the probability distribution function further cause the computing device to randomly select a predefined number of points in the sample space defined by the probability distribution function ([Page 2, Section 1.1, Paragraph 3] For sampling from the resulting model we first define a generator G^c_i, by sampling the index i from a multinomial distribution with parameters α_1, ..., α_T, and then we return G^c_i(Z), where Z ∼ P_Z is a standard latent noise variable used in the GAN literature. [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. On the right images, the color corresponds to the weights of training points, following the reweighting scheme proposed in this work. The top row corresponds to the first iteration of AdaGAN, and the bottom row to the second iteration).



Regarding claim 20
Tolstikhin teaches: The system of claim 15, wherein the machine-readable instructions, when executed by the processor, further cause the computing device to at least add the plurality of original records to the augmented dataset ([Page 15, Section 3.3] a given set of true data examples SN and examples sampled from the mixture of generators G. Note: Set of true data examples correspond to original records. Examples sampled from the mixture of generators correspond to the augmented dataset).

Regarding claim 21
Tolstikhin teaches: A non-transitory, computer-readable medium comprising machine-readable instructions stored in the memory that, when executed by a processor of a computing device, cause the computing device to at least ([Page 15, Section 4] Code available online):
analyze a plurality of original records to identify a probability distribution function (PDF), wherein a sample space of the PDF comprises the plurality of original records ([Page 6, Paragraph 2] The definition of Df holds for both continuous and discrete probability measures. Note: Thus, the framework accommodates both continuous and discrete probability measures. [Page 17, Section 4.1.2, Paragraph 1] We compute the probability mass of the true data "covered" by the model distribution Pmodel. [Page 17, Section 4.1.2, Paragraph 2] Another metric is the likelihood of the true data under the generated distribution. Note: True data corresponds to original records);
generate a plurality of new records using the PDF ([Page 2, Section 1.1, Paragraph 3] (d) update our mixture of generators G_t = (1 - β_t) G_{t-1} + β_t G^c_t (notation expressing the mixture of G_{t-1} and G^c_t with probabilities 1 - β_t and β_t). [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. [...] the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. Note: Blue dots sampled from the model mixture of generators Gt correspond to the new records. [Page 5, Paragraph 5] In Section 2 we present our main theoretical results regarding optimization of mixture models under general f-divergences);
create an augmented dataset that comprises the plurality of new records ([Page 2, Section 1.1, Last Paragraph] On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Blue dots sampled from the model mixture of generators Gt correspond to the plurality of new records. The red dots and blue dots together correspond to an augmented dataset);
and train a machine-learning model using the augmented dataset ([Page 3] Note: Algorithm 1 shows training. [Page 15, Section 3.3] Additionally, we write D ← DGAN(S_N, G) to denote a procedure that returns a discriminator from the GAN algorithm trained on a given set of true data examples S_N and examples sampled from the mixture of generators G. Note: True data examples S_N and examples sampled from the mixture of generators together correspond to the augmented dataset).

Regarding claim 22
Tolstikhin teaches: The system of claim 21, wherein the machine-readable instructions that cause the computing device to analyze the plurality of original records to identify the probability distribution function further cause the computing device to at least: train a generator machine-learning model to create a new record that is similar to individual ones of the plurality of original records ([Page 3] Note: Algorithm 1 shows training the generators (G). [Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X. [Page 5, Section: 2.1 Preliminaries and notations, Last Paragraph] So the problem of generative density estimation becomes a problem of finding a function G such that Pmodel looks like Pd in the sense that samples from Pmodel and from Pd look similar);
train a discriminator machine-learning model to distinguish between the new record and the individual ones of the plurality of original records ([Page 5, Section: 2.1 Preliminaries and notations, Paragraph 1] In this work we will write Pd and Pmodel to denote a real data distribution and our approximate model distribution, respectively, both defined over the data space X. [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg);
and identify the probability distribution function in response to the new record created by the generator machine-learning model being mistaken by the discriminator machine-learning model at a predefined rate ([Page 5, Section 2.1, Paragraph 2] Generative Density Estimation In the generative approach to density estimation, instead of building a probabilistic model of the data directly, one builds a function G : Z → X that transforms a fixed probability distribution P_Z (often called the noise distribution) over a latent space Z into a distribution over X. Hence Pmodel is the pushforward of P_Z, i.e. Pmodel(A) = P_Z(G^{-1}(A)). [Page 5, Paragraph 5] In Section 2.5 we show that if the GAN optimization at each step is perfect, the process converges to the true data distribution at exponential rate (or even in a finite number of steps, for which we provide a necessary and sufficient condition). [Page 7, Below equation (7)] where we denoted Pg := P^t_model. [Page 13, Section 3, Paragraph 3] Indeed, we can train a discriminator D to distinguish between samples from Pd and Pg. It is known that for an arbitrary f-divergence, there exists a corresponding function h (see [7]) such that the values of the optimal discriminator D are related to the density ratio in the following way. Note: G corresponds to generator and D corresponds to Discriminator).

Regarding claim 24
Tolstikhin teaches: The system of claim 22, wherein the generator machine-learning model is one of a plurality of generator machine-learning models and the machine-readable instructions further cause the computing device to at least ([Page 2, Section 1.1, Paragraph 1] Motivated by the problem of missing modes, in this work we propose to use multiple generative models combined into a mixture. These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components):
train each of the plurality of generator machine-learning models to create the new record that is similar to individual ones of the plurality of original records ([Page 2, Section 1.1, Paragraph 1] These generative models are trained iteratively by adding, at each step, another model to the mixture that should hopefully cover the areas of the space not covered by the previous mixture components. [Page 2, Section 1.1, Paragraph 3] mixture of generators Gt-1.  [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. Note: Red dots correspond to original records and blue dots correspond to new record);
select the generator machine-learning model from the plurality of generator machine learning models based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model, a generator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a discriminator loss rank associated with each generator machine-learning model and the discriminator machine-learning model, a different rank associated with each generator machine-learning model and the discriminator machine-learning model, or at least one result of a Kolmogorov-Smirnov (KS) test that includes a first probability distribution function associated with the plurality of original records and a second probability distribution function associated with the plurality of new records ([Page 16, Section 4.1.1, Paragraph 2] The best model out of T runs of GAN, that is: run T GAN instances independently, then take the run that performs best on a validation set. Note: The best model out of T runs of GAN corresponds to selecting the generator machine-learning model based at least in part on: a run length associated with each generator machine-learning model and the discriminator machine-learning model);
and identification of the probability distribution function further occurs in response to selecting the generator machine-learning model from the plurality of generator machine learning models ([Page 18, Last Paragraph] Table 1: Performance of the different algorithms on varying number of mixtures of Gaussians. The reported score is the coverage C, probability mass of Pd covered by the 5th percentile of Pg defined in Section 4.1.2. Note: Column 1 corresponds to selecting the generator machine-learning model and the other columns correspond to identifying the probability distribution).
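Note: For illustration only, the Kolmogorov-Smirnov alternative recited in the limitation above can be sketched as follows. This is a minimal sketch and is not taken from Tolstikhin or from the application. It assumes one-dimensional numeric records, compares empirical cumulative distributions (the quantity a two-sample KS test operates on), and assumes that the candidate with the smallest KS statistic is the one selected. The names ks_statistic and select_generator are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(original, generated):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(original), sorted(generated)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The largest gap is always attained at one of the observed values,
    # so it is enough to evaluate both empirical CDFs at those values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def select_generator(original, candidates):
    """Return the name of the candidate whose samples best match the originals.

    `candidates` maps a hypothetical generator name to the records it produced.
    Choosing the smallest statistic is an assumption made for this sketch.
    """
    return min(candidates, key=lambda name: ks_statistic(original, candidates[name]))
```

A statistic of 0 indicates identical empirical distributions and a statistic of 1 indicates samples that do not overlap at all.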

Regarding claim 25
Tolstikhin teaches: The system of claim 21, wherein the machine-readable instructions that cause the computing device to generate the plurality of new records using the probability distribution function further cause the computing device to randomly select a predefined number of points in the sample space defined by the probability distribution function ([Page 2, Section 1.1, Paragraph 3] For sampling from the resulting model we first define a generator Gci, by sampling the index i from a multinomial distribution with parameters α1, ..., αT, and then we return Gci(Z), where Z ~ PZ is a standard latent noise variable used in the GAN literature. [Page 2, Section 1.1, Last Paragraph] The effect of the described procedure is illustrated in a toy example in Figure 1. On the left images, the red dots are the training (true data) points, the blue dots are points sampled from the model mixture of generators Gt. The background colour gives the density of the distribution corresponding to Gt, non zero around the generated points, (almost) zero everywhere else. On the right images, the color corresponds to the weights of training points, following the reweighting scheme proposed in this work. The top row corresponds to the first iteration of AdaGAN, and the bottom row to the second iteration).
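Note: For illustration only, the sampling procedure in the passage cited above (draw a component index according to the mixture weights, then pass latent noise through the chosen generator) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Tolstikhin. The generators are assumed to be plain callables of a scalar, the latent noise is assumed to be standard normal, and the name sample_from_mixture and its parameters are hypothetical.

```python
import random


def sample_from_mixture(generators, weights, n_points, rng=None):
    """Draw a predefined number of points from a weighted mixture of generators."""
    if len(generators) != len(weights):
        raise ValueError("one weight is required per generator")
    rng = rng or random.Random()
    points = []
    for _ in range(n_points):
        # Pick one component generator at random, in proportion to its weight.
        generator = rng.choices(generators, weights=weights, k=1)[0]
        # Latent noise; a scalar standard normal is assumed for this sketch.
        z = rng.gauss(0.0, 1.0)
        points.append(generator(z))
    return points
```

Passing a seeded random.Random instance as rng makes the draw reproducible, and n_points plays the role of the predefined number of points.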

Regarding claim 26
Tolstikhin teaches: The system of claim 21, wherein the machine-readable instructions, when executed by the processor, further cause the computing device to at least add the plurality of original records to the augmented dataset ([Page 15, Section 3.3] a given set of true data examples SN and examples sampled from the mixture of generators G. Note: The set of true data examples corresponds to the original records. The examples sampled from the mixture of generators correspond to the augmented dataset).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 10, 17, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Tolstikhin et al (AdaGAN: Boosting Generative Models, 2017) in view of Goodfellow (Generative adversarial nets, 2014).
Regarding claim 10
Tolstikhin teaches: The computer-implemented method of claim 9 (as shown above).
However, Tolstikhin does not explicitly disclose: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records.
Goodfellow teaches, in an analogous system: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records ([Page 4, Paragraph 1, Figure 1] The discriminator is unable to differentiate between the two distributions, i.e. D(x) = 1/2. Note: 1/2 corresponds to fifty percent).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of Tolstikhin to incorporate the teachings of Goodfellow wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records. One would have been motivated to make this modification because doing so would give the benefit of training by simultaneously updating the discriminative distribution (D, blue, dashed line) so that it discriminates between samples from the data generating distribution (black, dotted line) px from those of the generative distribution pg (G) (green, solid line), as taught by Goodfellow [Page 4, Paragraph 1, Figure 1].

Regarding claim 17
Tolstikhin teaches: The system of claim 16 (as shown above).
However, Tolstikhin does not explicitly disclose: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records.
Goodfellow teaches, in an analogous system: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records ([Page 4, Paragraph 1, Figure 1] The discriminator is unable to differentiate between the two distributions, i.e. D(x) = 1/2. Note: 1/2 corresponds to fifty percent).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Tolstikhin to incorporate the teachings of Goodfellow wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records. One would have been motivated to make this modification because doing so would give the benefit of training by simultaneously updating the discriminative distribution (D, blue, dashed line) so that it discriminates between samples from the data generating distribution (black, dotted line) px from those of the generative distribution pg (G) (green, solid line), as taught by Goodfellow [Page 4, Paragraph 1, Figure 1].

Regarding claim 23
Tolstikhin teaches: The system of claim 22 (as shown above).
However, Tolstikhin does not explicitly disclose: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records.
Goodfellow teaches, in an analogous system: wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records ([Page 4, Paragraph 1, Figure 1] The discriminator is unable to differentiate between the two distributions, i.e. D(x) = 1/2. Note: 1/2 corresponds to fifty percent).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Tolstikhin to incorporate the teachings of Goodfellow wherein the predefined rate is approximately fifty percent of comparisons performed by the discriminator between the new record and the plurality of original records. One would have been motivated to make this modification because doing so would give the benefit of training by simultaneously updating the discriminative distribution (D, blue, dashed line) so that it discriminates between samples from the data generating distribution (black, dotted line) px from those of the generative distribution pg (G) (green, solid line), as taught by Goodfellow [Page 4, Paragraph 1, Figure 1].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Creswell et al (2018) discloses Generative Adversarial Networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/C.R.J./Examiner, Art Unit 2128
/ALAN CHEN/Primary Examiner, Art Unit 2125