DETAILED ACTION
This action is in response to the claims filed 03/27/2020 for application 16/831,971. Claims 1-18 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/26/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 5, 8-11, 14, 17, and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 1 recites, in part, leveraging… to create a synthetic message dataset that substantially mimics real medical data, and lowering re-identification risk associated with the synthetic message dataset based on presence disclosure assessment, wherein the synthetic message dataset is compared to the real medical data using hamming distance thresholds. These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. The limitations of: 
leveraging… to create a synthetic message dataset that substantially mimics real medical data can be considered to be an evaluation in the human mind,
lowering re-identification risk associated with the synthetic message dataset based on presence disclosure assessment, wherein the synthetic message dataset is compared to the real medical data using hamming distance thresholds, can be considered to be an evaluation in the human mind,
Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “two adversarial neural networks”. These elements that are recited are only generally linked to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing two adversarial neural networks to perform the claimed steps amount to no more than generally linking the elements to the judicial exception. The claim is not patent eligible.

Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: further comprising generating a positive train dataset using the two adversarial networks. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does recite the additional elements of “two adversarial networks”, however they do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the same reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.

Regarding claim 5, the rejection of claim 1 is further incorporated, and further, the claim recites: further comprising generating a negative train dataset using the two adversarial networks. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does recite the additional elements of “two adversarial networks”, however they do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the same reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.

Regarding claim 8, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the lowering the re-identification risk associated with the synthetic message dataset involves a presence disclosure test that compares synthetic data records with real data records using the hamming distance thresholds. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exceptions into a practical application, nor to significantly more than the judicial exceptions. The claim is not patent eligible.

Regarding claim 9, the rejection of claim 1 is further incorporated, and further, the claim recites: further comprising determining a degree of the re-identification risk based on the hamming distance thresholds. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exceptions into a practical application, nor to significantly more than the judicial exceptions. The claim is not patent eligible.

Regarding claim 10, 
Step 1 Analysis: Claim 10 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 10 recites, in part, generating a first message dataset from the real medical data, generating a second message dataset from the real medical data, generating a synthetic message dataset having at least a portion of the medical data based on the first message dataset and the second message dataset, the synthetic message dataset having synthetic medical data being substantially indistinguishable from the real medical data; and lowering a re-identification risk associated with the synthetic message dataset based on a match between a synthetic data record in the synthetic medical data and a real data record in the real medical data. These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. The limitations of: 
generating a first message dataset from the real medical data, can be considered to be an evaluation in the human mind
generating a second message dataset from the real medical data, can be considered to be an evaluation in the human mind
generating a synthetic message dataset having at least a portion of the medical data based on the first message dataset and the second message dataset, the synthetic message dataset having synthetic medical data being substantially indistinguishable from the real medical data can be considered to be an evaluation in the human mind and 
lowering a re-identification risk associated with the synthetic message dataset based on a match between a synthetic data record in the synthetic medical data and a real data record in the real medical data can be considered to be an evaluation in the human mind.
Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “first neural network and second neural network”. These elements that are recited are only generally linked to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a first and second neural network to perform the claimed steps amount to no more than generally linking the elements to the judicial exception. The claim is not patent eligible.

Regarding claim 11, the rejection of claim 10 is further incorporated, and further, the claim recites: wherein generating the first message comprises generating a positive train dataset of the first message dataset for training the first neural network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 10 above.
The claim does recite the additional element of “training the first neural network”, however it does not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the same reasons set forth in connection with the rejection of claim 10 above. The claim is not patent eligible.

Regarding claim 14, the rejection of claim 10 is further incorporated, and further, the claim recites: wherein generating the second message comprises generating a negative train dataset of the second message dataset for training the second neural network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 10 above.
The claim does recite the additional element of “training the second neural network”, however it does not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the same reasons set forth in connection with the rejection of claim 10 above. The claim is not patent eligible.

Regarding claim 17, the rejection of claim 10 is further incorporated, and further, the claim recites: wherein lowering the re-identification risk associated with the synthetic message dataset comprises using a hamming distance between the synthetic data record and the real data record. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 10, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exceptions into a practical application, nor to significantly more than the judicial exceptions. The claim is not patent eligible.

Regarding claim 18, the rejection of claim 10 is further incorporated, and further, the claim recites: further comprising determining a degree of the re-identification risk based on the hamming distance. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 10, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exceptions into a practical application, nor to significantly more than the judicial exceptions. The claim is not patent eligible.

Claims 3, 4, 6, 7, 12, 13, 15 and 16 recite additional elements or steps that amount to a practical application of the abstract idea or significantly more than the exception and would be eligible if incorporated into the respective parent independent claim.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 10-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Beaulieu-Jones et al. ("Privacy-preserving generative deep neural networks support clinical data sharing", cited by Applicant in the IDS filed 08/26/2020, hereinafter "Beaulieu-Jones").


Regarding claim 10, Beaulieu-Jones teaches A method of processing real medical data, comprising: 
generating a first message dataset from the real medical data using a first neural network (“A differentially private AC-GAN was trained using the 6,000 training set providing a differentially private training simulated training set (labeled private).” [pg. 10, Transfer Learning Task; private training dataset corresponds to a “first message dataset”]); 
generating a second message dataset from the real medical data using a second neural network (“A vanilla AC-GAN was trained using the 6,000 participant training set providing a simulated training set (labeled non-private).” [pg. 10, Transfer Learning Task; non-private training dataset corresponds to a “second message dataset”]); 
generating a synthetic message dataset having at least a portion of the medical data based on the first message dataset and the second message dataset (“Each classifier was then trained on the real, non-private and private training sets and evaluated on the same, real test set of participants. This allows for a comparison of classification performance between models trained on the real data, synthetic data and private synthetic data.” [pg. 10, Transfer Learning Task]), the synthetic message dataset having synthetic medical data being substantially indistinguishable from the real medical data (“We trained the discriminator to differentiate real and simulated data from a dataset containing both groups. We repeated this process until the generator created synthetic participants that were difficult to discriminate from real ones.” [pg. 2, §Results, ¶2]); and 
lowering a re-identification risk associated with the synthetic message dataset based on a match between a synthetic data record in the synthetic medical data and a real data record in the real medical data (“The practice of generating data under differential privacy with deep neural networks offers a technical solution for those who wish to share data to the challenge of patient privacy. This technical work complements ongoing efforts to change the data sharing culture of clinical research.” [pg. 9, ¶3; See further: “We developed an approach to train auxiliary classifier generative adversarial networks (AC-GANs) in a differentially private manner to enable privacy preserving data sharing. Generative adversarial networks offer the ability to simulate realistic-looking data that closely matches the distribution of the source data. AC-GANs add the ability to generate labeled samples. By training AC-GANs under the differential privacy framework we generated realistic samples that can be used for initial analysis while guaranteeing a specified level of participant privacy.”]).

Regarding claim 11, Beaulieu-Jones teaches The method of claim 10, wherein generating the first message comprises generating a positive train dataset of the first message dataset for training the first neural network (“This was done by splitting the 6,502 into a training set of 6,000 participants (labeled real)… A differentially private AC-GAN was trained using the 6,000 training set providing a differentially private training simulated training set (labeled private).” [pg. 10, Transfer Learning Task; private training dataset corresponds to a “positive train dataset”]).

Regarding claim 12, Beaulieu-Jones teaches The method of claim 11, wherein generating the first message comprises generating a positive holdout dataset of the first message dataset that excludes the positive train dataset (“This was done by splitting the 6,502 into a training set of 6,000 participants (labeled real) and a test set of 502 participants” [pg. 10, Transfer Learning Task; The test set of real data corresponds to a “positive holdout dataset”. See also, pg. 2, ¶3: “comparing machine learning predictors constructed on real vs. simulated data. We find that the model learns realistic distributions and that models constructed from the simulated data successfully classify participants in a held-out portion of the underlying real dataset.”]).
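
For illustration only, the split quoted above (6,502 participants divided into a 6,000-participant training set and a 502-participant test set that excludes the training set) can be sketched as follows. The function name `split_train_holdout`, the use of a seeded random shuffle, and the index-based representation are assumptions made for this sketch; the quoted passage does not state how the reference performed the split.

```python
import random
from typing import List, Tuple


def split_train_holdout(
    n_participants: int, n_train: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return disjoint (train, holdout) lists of participant indices.

    Hypothetical helper: the random shuffle and the seed are assumptions,
    not details taken from the cited reference.
    """
    if not 0 <= n_train <= n_participants:
        raise ValueError("n_train must be between 0 and n_participants")
    indices = list(range(n_participants))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # The holdout set is everything that was not placed in the training set.
    return indices[:n_train], indices[n_train:]


# Sizes taken from the quoted passage: 6,502 participants, 6,000 for training.
train_idx, holdout_idx = split_train_holdout(6502, 6000)
```

Because the two lists are slices of one shuffled index list, no participant can appear in both, which is the property the "excludes the positive train dataset" language turns on.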

Regarding claim 13, Beaulieu-Jones teaches The method of claim 11, wherein generating the synthetic message dataset comprises generating a positive model based on the positive train dataset (“Each classifier was then trained on the real, non-private and private training sets and evaluated on the same, real test set of participants. This allows for a comparison of classification performance between models trained on the real data, synthetic data and private synthetic data. We evaluated both accuracy as well as the correlation between important features (random forest) and model coefficients (logistic regression and support vector machine).” [pg. 10, Transfer Learning Task; classifier trained on the real test set corresponds to a “positive” model. The same, real test set of participants would include the “positive holdout dataset”]).

Regarding claim 14, Beaulieu-Jones teaches The method of claim 10, wherein generating the second message comprises generating a negative train dataset of the second message dataset for training the second neural network. (“A vanilla AC-GAN was trained using the 6,000 participant training set providing a simulated training set (labeled non-private).” [pg. 10, Transfer Learning Task; non-private training dataset corresponds to a “negative train dataset”])

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 8, 9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Beaulieu-Jones in view of Choi et al. ("Generating Multi-label Discrete Patient Records using Generative Adversarial Networks", cited by Applicant in the IDS filed 08/26/2020, hereinafter "Choi").

Regarding claim 1, Beaulieu-Jones teaches A method of generating synthetic medical data for enabling machine learning research (note: The machine learning research is not given any patentable weight because it appears to be an intended use of the claimed invention recited in the preamble. Please see MPEP 2111.02(II)), comprising: 
leveraging two adversarial neural networks that compete with each other (“An AC-GAN (Figure IA in the Data Supplement) is made up of 2 neural networks competing with each other.” [pg. 2, §Results, ¶2]) to create a synthetic message dataset that substantially mimics real medical data (“We trained the discriminator to differentiate real and simulated data from a dataset containing both groups. We repeated this process until the generator created synthetic participants that were difficult to discriminate from real ones.” [pg. 2, §Results, ¶2]); and 
lowering re-identification risk associated with the synthetic message dataset based on presence disclosure assessment (“The practice of generating data under differential privacy with deep neural networks offers a technical solution for those who wish to share data to the challenge of patient privacy. This technical work complements ongoing efforts to change the data sharing culture of clinical research.” [pg. 9, ¶3; See further: “We developed an approach to train auxiliary classifier generative adversarial networks (AC-GANs) in a differentially private manner to enable privacy preserving data sharing. Generative adversarial networks offer the ability to simulate realistic-looking data that closely matches the distribution of the source data. AC-GANs add the ability to generate labeled samples. By training AC-GANs under the differential privacy framework we generated realistic samples that can be used for initial analysis while guaranteeing a specified level of participant privacy.”]), 
Although Beaulieu-Jones teaches lowering re-identification risk associated with the synthetic message dataset, the reference fails to explicitly teach wherein the synthetic message dataset is compared to the real medical data using hamming distance thresholds.
Choi also teaches lowering re-identification risk associated with the synthetic message dataset based on presence disclosure assessment (“Presence disclosure occurs when an attacker can determine that medGAN was trained with a dataset including the record from patient x. Presence disclosure for medGAN happens when a powerful attacker, one who already possesses the complete records of a set of patients P, can determine whether anyone from P are in the training set by observing the generated patient records. More recently, for machine learned models, this has been referred to as an membership inference attack (Shokri et al., 2017). the knowledge gained by the attacker may be limited, if the dataset is well balanced in its clinical concepts” [pg. 6, Presence disclosure])
wherein the synthetic message dataset is compared to the real medical data using hamming distance thresholds (“We assume the attacker has complete knowledge on those 2r records. Then for each record, we calculate its hamming distance to each sample from the synthetic dataset S ∈ {0, 1}^{N×|C|}. If there is at least one synthetic sample within a certain distance, we treat that as its claimed match. Now, since we sample from both R and T, the match could be a true positive (i.e., attacker correctly claims their targeted record is in the GAN training set), false positive (i.e., attacker incorrectly claims their targeted record is in the GAN training set), true negative (i.e., attacker correctly claims their targeted record is not in the GAN training set” [pg. 20, top para]).
Beaulieu-Jones and Choi are both in the same field of endeavor of training GANs using synthetic medical data. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by implementing hamming distance thresholds to compare real and synthetic datasets as taught by Choi. One would have been motivated to make this modification in order to protect patient privacy data. [pg. 1, § Introduction, ¶1, Choi]
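
For illustration only, the comparison described in the quoted Choi passage (computing the hamming distance from a known record to each synthetic sample and treating any sample within a certain distance as a claimed match) can be sketched as follows. The function names `hamming_distance` and `claims_match`, the list-of-lists representation of binary records, and the toy values are assumptions made for this sketch and do not appear in either reference.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def claims_match(
    record: Sequence[int],
    synthetic: Sequence[Sequence[int]],
    threshold: int,
) -> bool:
    """Return True if at least one synthetic sample lies within `threshold`.

    Hypothetical helper mirroring the "claimed match" rule in the quoted text.
    """
    return any(hamming_distance(record, s) <= threshold for s in synthetic)


# Toy data (assumed): two synthetic samples and one record known to the attacker.
synthetic_samples = [[1, 0, 1, 0], [0, 0, 1, 1]]
known_record = [1, 0, 1, 1]

match_at_0 = claims_match(known_record, synthetic_samples, threshold=0)
match_at_1 = claims_match(known_record, synthetic_samples, threshold=1)
```

In this toy case the known record differs from each synthetic sample in exactly one position, so no match is claimed at a threshold of 0 and a match is claimed at a threshold of 1.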

	Regarding claim 2, Beaulieu-Jones/Choi teaches The method of claim 1, where Beaulieu-Jones teaches further comprising generating a positive train dataset using the two adversarial networks (“This was done by splitting the 6,502 into a training set of 6,000 participants (labeled real)… A differentially private AC-GAN was trained using the 6,000 training set providing a differentially private training simulated training set (labeled private).” [pg. 10, Transfer Learning Task; private training dataset corresponds to a “positive train dataset”]).
	Regarding claim 3, Beaulieu-Jones/Choi teaches The method of claim 2, where Beaulieu-Jones teaches further comprising generating a positive holdout dataset using the two adversarial networks, wherein the positive holdout dataset excludes the positive train dataset (“This was done by splitting the 6,502 into a training set of 6,000 participants (labeled real) and a test set of 502 participants” [pg. 10, Transfer Learning Task; The test set of real data corresponds to a “positive holdout dataset”. See also, pg. 2, ¶3: “comparing machine learning predictors constructed on real vs. simulated data. We find that the model learns realistic distributions and that models constructed from the simulated data successfully classify participants in a held-out portion of the underlying real dataset.”]).

Regarding claim 4, Beaulieu-Jones/Choi teaches The method of claim 3, where Beaulieu-Jones teaches wherein the synthetic message dataset is used to train a classification model, wherein the classification model is tested using the positive holdout dataset (“Each classifier was then trained on the real, non-private and private training sets and evaluated on the same, real test set of participants. This allows for a comparison of classification performance between models trained on the real data, synthetic data and private synthetic data. We evaluated both accuracy as well as the correlation between important features (random forest) and model coefficients (logistic regression and support vector machine).” [pg. 10, Transfer Learning Task; classifier corresponds to a classification model. The same, real test set of participants would include the “positive holdout dataset”]).

Regarding claim 5, Beaulieu-Jones/Choi teaches The method of claim 1, where Beaulieu-Jones teaches further comprising generating a negative train dataset using the two adversarial networks (“A vanilla AC-GAN was trained using the 6,000 participant training set providing a simulated training set (labeled non-private).” [pg. 10, Transfer Learning Task; non-private training dataset corresponds to a “negative train dataset”]).

Regarding claim 8, Beaulieu-Jones/Choi teaches The method of claim 1, where Choi further teaches wherein the lowering the re-identification risk associated with the synthetic message dataset involves a presence disclosure test that compares synthetic data records with real data records using the hamming distance thresholds (“Figures 11a and 11b depict the sensitivity (i.e. recall) and the precision of the presence disclosure test when varying the number of real patient the attacker knows. In this setting, x% sensitivity means the attacker has successfully discovered that x% of the records that he/she already knows were used to train medGAN. Similarly, x% precision means, when an attacker claims that a certain number of patients were used for training medGAN, only x% of them were actually used. Figure 11a shows that with low threshold of hamming distance (e.g. hamming distance of 0) attacker can only discover 10% percent of the known patients to attacker were used to train medGAN. Figure 11b shows that, the precision is mostly 50% except when the number of known patients are small. This indicates that the attacker’s knowledge is basically useless for presence disclosure attack unless the attacker is focusing on a small number of patients (less than a hundred), in which case the precision is approximately 80%.” [pg. 20, § Impact of attacker’s knowledge]).
Beaulieu-Jones and Choi are both in the same field of endeavor of training GANs using synthetic medical data. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by implementing hamming distance thresholds to compare real and synthetic datasets as taught by Choi. One would have been motivated to make this modification in order to protect patient privacy data. [pg. 1, § Introduction, ¶1, Choi]
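
For illustration only, the sensitivity and precision figures discussed in the quoted Choi passage can be computed from the claimed matches as sketched below. The function name `presence_disclosure_metrics`, the convention of returning 0.0 when a denominator is zero, and the toy records are assumptions made for this sketch; the sketch does not reproduce the values reported in Choi's Figures 11a and 11b.

```python
from typing import Sequence, Tuple

Record = Sequence[int]


def hamming_distance(a: Record, b: Record) -> int:
    """Count the positions at which two equal-length binary records differ."""
    return sum(x != y for x, y in zip(a, b))


def presence_disclosure_metrics(
    known_in_training: Sequence[Record],
    known_not_in_training: Sequence[Record],
    synthetic: Sequence[Record],
    threshold: int,
) -> Tuple[float, float]:
    """Return (sensitivity, precision) of the attacker's claimed matches.

    A record is a claimed match when some synthetic sample is within
    `threshold`. Returning 0.0 for an empty denominator is an assumed
    convention, not something stated in the reference.
    """

    def claimed(record: Record) -> bool:
        return any(hamming_distance(record, s) <= threshold for s in synthetic)

    true_pos = sum(claimed(r) for r in known_in_training)
    false_neg = len(known_in_training) - true_pos
    false_pos = sum(claimed(r) for r in known_not_in_training)

    sensitivity = true_pos / (true_pos + false_neg) if known_in_training else 0.0
    total_claims = true_pos + false_pos
    precision = true_pos / total_claims if total_claims else 0.0
    return sensitivity, precision


# Toy data (assumed): records the attacker knows, half of which were used for training.
synthetic_samples = [[1, 1, 0, 0], [0, 0, 1, 1]]
in_training = [[1, 1, 0, 0], [1, 0, 1, 0]]
not_in_training = [[0, 0, 1, 0], [1, 1, 1, 1]]

metrics_at_0 = presence_disclosure_metrics(in_training, not_in_training, synthetic_samples, 0)
metrics_at_1 = presence_disclosure_metrics(in_training, not_in_training, synthetic_samples, 1)
```

With these toy records, raising the threshold from 0 to 1 leaves sensitivity unchanged but admits a false positive, which lowers precision. This is the kind of threshold-dependent trade-off the quoted passage describes.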

Regarding claim 9, Beaulieu-Jones/Choi teaches The method of claim 8, where Choi further teaches further comprising determining a degree of the re-identification risk based on the hamming distance thresholds (“Figure 11a shows that with low threshold of hamming distance (e.g. hamming distance of 0) attacker can only discover 10% percent of the known patients to attacker were used to train medGAN. Figure 11b shows that, the precision is mostly 50% except when the number of known patients are small. This indicates that the attacker’s knowledge is basically useless for presence disclosure attack unless the attacker is focusing on a small number of patients (less than a hundred), in which case the precision is approximately 80%.” [pg. 20, § Impact of attacker’s knowledge]).
Beaulieu-Jones and Choi are both in the same field of endeavor of training GANs using synthetic medical data. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by implementing hamming distance thresholds to compare real and synthetic datasets as taught by Choi. One would have been motivated to make this modification in order to protect patient privacy data. [pg. 1, § Introduction, ¶1, Choi]

Regarding claim 17, Beaulieu-Jones teaches The method of claim 10, however fails to explicitly teach wherein lowering the re-identification risk associated with the synthetic message dataset comprises using a hamming distance between the synthetic data record and the real data record.
Choi teaches wherein lowering the re-identification risk associated with the synthetic message dataset comprises using a hamming distance between the synthetic data record and the real data record (“We assume the attacker has complete knowledge on those 2r records. Then for each record, we calculate its hamming distance to each sample from the synthetic dataset S ∈ {0, 1}^{N×|C|}. If there is at least one synthetic sample within a certain distance, we treat that as its claimed match. Now, since we sample from both R and T, the match could be a true positive (i.e., attacker correctly claims their targeted record is in the GAN training set), false positive (i.e., attacker incorrectly claims their targeted record is in the GAN training set), true negative (i.e., attacker correctly claims their targeted record is not in the GAN training set” [pg. 20, top para]).
Beaulieu-Jones and Choi are both in the same field of endeavor of training GANs using synthetic medical data. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by implementing hamming distance thresholds to compare real and synthetic datasets as taught by Choi. One would have been motivated to make this modification in order to protect patient privacy data. [pg. 1, § Introduction, ¶1, Choi]
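For illustration only, the hamming distance matching described in the passage quoted from Choi can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Choi, Beaulieu-Jones, or the application: the function names (hamming_distances, claimed_matches, precision_recall) and the toy binary records are hypothetical, and records are assumed to be equal-length binary vectors.

```python
import numpy as np


def hamming_distances(record, synthetic):
    """Hamming distance from one binary record to every row of `synthetic`."""
    record = np.asarray(record)
    synthetic = np.asarray(synthetic)
    # Count the positions where each synthetic row differs from the record.
    return np.count_nonzero(synthetic != record, axis=1)


def claimed_matches(known_records, synthetic, threshold):
    """Flag each known record that has at least one synthetic sample
    within `threshold` hamming distance (the attacker's "claimed match")."""
    return np.array(
        [hamming_distances(r, synthetic).min() <= threshold for r in known_records],
        dtype=bool,
    )


def precision_recall(claims, in_training):
    """Precision and recall of the claims against true training membership."""
    claims = np.asarray(claims, dtype=bool)
    in_training = np.asarray(in_training, dtype=bool)
    tp = int(np.sum(claims & in_training))
    fp = int(np.sum(claims & ~in_training))
    fn = int(np.sum(~claims & in_training))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 2 synthetic records, 3 records known to the attacker.
synthetic = np.array([[1, 0, 1, 0], [0, 0, 0, 0]])
known = np.array([[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]])
in_training = np.array([True, False, True])

for threshold in (0, 1, 2):
    claims = claimed_matches(known, synthetic, threshold)
    print(threshold, precision_recall(claims, in_training))
```

In this sketch, raising the threshold lets more known records be claimed as matches, so recall tends to rise while precision may fall. That trade-off across thresholds is the kind of measurement the quoted passage relies on.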

Regarding claim 18, Beaulieu-Jones/Choi teaches The method of claim 17, further comprising determining a degree of the re-identification risk based on the hamming distance (“Figure 11a shows that with low threshold of hamming distance (e.g. hamming distance of 0) attacker can only discover 10% percent of the known patients to attacker were used to train medGAN. Figure 11b shows that, the precision is mostly 50% except when the number of known patients are small. This indicates that the attacker’s knowledge is basically useless for presence disclosure attack unless the attacker is focusing on a small number of patients (less than a hundred), in which case the precision is approximately 80%.” [pg. 20, § Impact of attacker’s knowledge]).
Beaulieu-Jones and Choi are both in the same field of endeavor of training GANs using synthetic medical data. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by implementing hamming distance thresholds to compare real and synthetic datasets as taught by Choi. One would have been motivated to make this modification in order to protect patient privacy data. [pg. 1, § Introduction, ¶1, Choi]

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Beaulieu-Jones in view of Choi and further in view of Kliger et al. ("Novelty Detection with GAN", hereinafter "Kliger").

Regarding claim 6, Beaulieu-Jones/Choi teaches The method of claim 4, but fails to explicitly teach further comprising generating a negative holdout dataset using the two adversarial networks, wherein the negative holdout dataset excludes the negative train dataset.
Kliger teaches further comprising generating a negative holdout dataset using the two adversarial networks, wherein the negative holdout dataset excludes the negative train dataset (“In the holdout experiment, we train an SSL-GAN model and multi-class classifier with the same architecture as discriminator for each of the ten digits. Every model is trained on nine out of the ten classes. During testing, we compute the novelty scores for all of the classes, including the holdout class which is considered to be novel (negative). For the holdout class testing is performed on both the train and test examples in order to balance the number of nominal and novel examples” [pg. 7, ¶2; “novel examples” corresponds to a “negative holdout dataset”]).
Beaulieu-Jones, Choi, and Kliger are all in the same field of endeavor of training GANs. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. Kliger discloses novelty detection using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’/Choi’s teachings by including a negative holdout dataset as taught by Kliger. One would have been motivated to make this modification in order to train a classifier to detect unknown inputs. [Abstract, Kliger]
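For illustration only, the holdout-class arrangement described in the passage quoted from Kliger (training on all classes but one and treating the held-out class as novel at test time) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Kliger or the application: the function name holdout_split and the toy arrays are hypothetical, and no GAN or classifier is trained here.

```python
import numpy as np


def holdout_split(features, labels, holdout_class):
    """Split a labelled dataset so that `holdout_class` is excluded from
    training and kept aside as the novel (negative) examples."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    is_novel = labels == holdout_class
    train_features = features[~is_novel]
    train_labels = labels[~is_novel]
    novel_features = features[is_novel]
    return train_features, train_labels, novel_features


# Hypothetical toy data: six 2-feature examples across three classes.
features = np.arange(12).reshape(6, 2)
labels = np.array([0, 1, 2, 1, 0, 2])

train_x, train_y, novel_x = holdout_split(features, labels, holdout_class=2)
print(train_y)   # labels remaining for training; class 2 is absent
print(novel_x)   # examples of the held-out class, used only at test time
```

In this sketch the held-out examples never appear in the training split, which mirrors the quoted statement that the holdout class is considered novel during testing.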

Regarding claim 7, Beaulieu-Jones/Choi/Kliger teaches The method of claim 6, wherein the synthetic message dataset is used to train a classification model, wherein the classification model is tested using the negative holdout datasets (“In the holdout experiment, we train an SSL-GAN model and multi-class classifier with the same architecture as discriminator for each of the ten digits. Every model is trained on nine out of the ten classes. During testing, we compute the novelty scores for all of the classes, including the holdout class which is considered to be novel (negative). For the holdout class testing is performed on both the train and test examples in order to balance the number of nominal and novel examples” [pg. 7, ¶2; “novel examples” corresponds to a “negative holdout dataset”]).
Beaulieu-Jones, Choi, and Kliger are all in the same field of endeavor of training GANs. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Choi discloses generating discrete patient records using GANs. Kliger discloses novelty detection using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’/Choi’s teachings by including a negative holdout dataset as taught by Kliger. One would have been motivated to make this modification in order to train a classifier to detect unknown inputs. [Abstract, Kliger]

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Beaulieu-Jones in view of Kliger.

Regarding claim 15, Beaulieu-Jones teaches The method of claim 14, but fails to explicitly teach wherein generating the second message comprises generating a negative holdout dataset of the first message dataset that excludes the negative train dataset.
Kliger teaches wherein generating the second message comprises generating a negative holdout dataset of the first message dataset that excludes the negative train dataset (“In the holdout experiment, we train an SSL-GAN model and multi-class classifier with the same architecture as discriminator for each of the ten digits. Every model is trained on nine out of the ten classes. During testing, we compute the novelty scores for all of the classes, including the holdout class which is considered to be novel (negative). For the holdout class testing is performed on both the train and test examples in order to balance the number of nominal and novel examples” [pg. 7, ¶2; “novel examples” corresponds to a “negative holdout dataset”]).
Beaulieu-Jones and Kliger are both in the same field of endeavor of training GANs. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Kliger discloses novelty detection using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by including a negative holdout dataset as taught by Kliger. One would have been motivated to make this modification in order to train a classifier to detect unknown inputs. [Abstract, Kliger]

Regarding claim 16, Beaulieu-Jones teaches The method of claim 14, but fails to explicitly teach wherein generating the synthetic message dataset comprises generating a negative model based on the negative train dataset.
Kliger teaches wherein generating the synthetic message dataset comprises generating a negative model based on the negative train dataset (“In the holdout experiment, we train an SSL-GAN model and multi-class classifier with the same architecture as discriminator for each of the ten digits. Every model is trained on nine out of the ten classes. During testing, we compute the novelty scores for all of the classes, including the holdout class which is considered to be novel (negative). For the holdout class testing is performed on both the train and test examples in order to balance the number of nominal and novel examples” [pg. 7, ¶2; “novel examples” corresponds to a “negative holdout dataset”. The multi-class classifier trained on these novel examples corresponds to a “negative model”]).
Beaulieu-Jones and Kliger are both in the same field of endeavor of training GANs. Beaulieu-Jones discloses a method of training GANs to support clinical data sharing. Kliger discloses novelty detection using GANs. It would have been obvious to one of ordinary skill in the art to modify Beaulieu-Jones’ teachings by including a negative holdout dataset as taught by Kliger. One would have been motivated to make this modification in order to train a classifier to detect unknown inputs. [Abstract, Kliger]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Shokri et al. (“Membership Inference Attacks Against Machine Learning Models”) discloses membership inference attacks, similar to the presence disclosure assessment.
Hou et al. (“Generative Adversarial Positive-Unlabelled Learning”) discloses training with positive/negative examples with GANs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122  

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122