DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/06/2020 is being considered by the examiner.
Priority
Applicant’s claim for the benefit of a prior-filed U.S. Provisional Application No. 62/694,968, filed July 6, 2018 is acknowledged.
 
Drawings
The drawings were received on 10/04/2018. These drawings are acceptable.
Response to Arguments
Applicant’s arguments, pgs. 11-16 of applicant remarks filed 09/01/2021, have been fully considered. The examiner’s response follows.

Applicant’s arguments, pgs. 14-17 of applicant remarks, with respect to the 35 USC § 103 rejections, have been fully considered and were unpersuasive. 
Applicant argues that the cited prior art fails to disclose the amended claim features (remarks, pgs. 13-14):
Szeto fails to disclose or suggest, for example, "normalizing, by a dataset generator, the reference dataset, the normalizing comprising; identifying categorical data within the reference dataset and converting categorical data to numerical values." Instead, Szeto teaches normalizing data by arranging zip codes into a histogram to form a zip code probability distribution. Szeto at [0087]. Szeto further teaches normalizing data when calculating a similarity score. Id. at [0095]-[0096]. In normalizing data when calculating a similarity score, Szeto describes analyzing "differences among the model parameters" and then calculating "the sum of the differences or the sum of the squares of the differences" and then "normaliz[ing] or "weigh[ing]" the differences "so that each difference contributes equally or according to their importance." Id. Constructing a histogram of zip codes and calculating a weighted difference of a value that can be added or squared are not "normalizing ... the reference dataset, the normalizing comprising; identifying categorical data within the reference dataset, and converting categorical data to numerical values" as recited by amended claim 1 because arranging zip codes and calculating a weighted difference involve manipulation of existing numerical values.

The relevant claim limitations appear to be “normalizing, by a dataset generator, the reference dataset, normalizing comprising: identifying categorical data within the reference dataset, and converting categorical data to numerical values;”
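For illustration only, the quoted limitation can be pictured with the following minimal Python sketch. It is not taken from the application or from any cited reference. The record layout, the rule that any column holding a non-numeric value is treated as categorical, and the names `identify_categorical` and `encode_categorical` are all assumptions made here.

```python
def identify_categorical(rows):
    """Return column names whose values are not all numeric (assumed rule)."""
    columns = list(rows[0].keys())
    return [
        c for c in columns
        if not all(
            isinstance(r[c], (int, float)) and not isinstance(r[c], bool)
            for r in rows
        )
    ]


def encode_categorical(rows, categorical):
    """Replace each categorical value with an integer code.

    Codes are assigned in sorted order of the distinct values; this is one
    arbitrary but reproducible convention, chosen for the sketch.
    """
    codebooks = {}
    for c in categorical:
        distinct = sorted({r[c] for r in rows})
        codebooks[c] = {value: code for code, value in enumerate(distinct)}
    encoded = [
        {c: (codebooks[c][v] if c in codebooks else v) for c, v in r.items()}
        for r in rows
    ]
    return encoded, codebooks


# Hypothetical reference dataset with one numeric and one categorical column.
rows = [
    {"age": 34, "state": "VA"},
    {"age": 51, "state": "MD"},
    {"age": 29, "state": "VA"},
]
categorical = identify_categorical(rows)
encoded, codebooks = encode_categorical(rows, categorical)
```

In this sketch the numeric column passes through unchanged and only the column identified as categorical is converted.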

As noted in the Office action, Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter Szeto) teaches:
in 0018: … More specifically, the modeling engine is able to receive model instructions from one or more remote computing devices over a network. The model instructions can be considered as one or more command that instruct the modeling engine to use at least some of the local private data in order to create a trained actual model according to an implementation of a machine learning algorithm (e.g., sup­port vector machine, neural network, decision tree, random forest, deep learning neural network, etc.). The modeling engine creates the trained actual model as a function of the local private data [claimed identification of categorical data] (i.e., a selected or filtered training data set) after any required preprocessing requirements, if any, have been met ( e.g., filtering, validating, normalizing [claimed normalizing of reference training dataset], etc.) …The modeling engine uses the private data distribu­tions to generate a set of proxy data, which can be consid­ered synthetic data or Monte Carlo data having the same general data distribution characteristics as the local private data [claimed identifying categorical data within the reference dataset and converting categorical data to numerical values represented as Monte Carlo data], while also lacking the actual private or restricted features of the local, private data. In some cases, Monte Carlo simulations generate deterministic sets of proxy data, by using a seed for a pseudo random number generator. A source for truly random seeds includes those provided by random.org (see URL www.random.org). Private or restricted features of the local private data include, but are not limited to, social security numbers, patient names, addresses or any other personally identifying information [claimed identified categorical data in a reference training data set], especially information protected under the HIPAA Act…)

In other words, Szeto teaches identifying confidential information, which is considered categorical data, within the local private data protected under HIPAA regulations. The identified data is represented in the training data as converted data, namely Monte Carlo proxy data, which is a numerical representation of the characteristics of the private data. Because the claimed categorical private data can thus be converted to numerical values that represent similar characteristics of the original data set, Szeto teaches the claim limitation as claimed. See the full rejection.
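The seeded Monte Carlo sampling described in the quoted passage of Szeto can be pictured with the following minimal Python sketch. It assumes the private data is a single column of zip codes arranged into a probability distribution. The function names and sample values are hypothetical and are not taken from Szeto.

```python
import random
from collections import Counter


def build_distribution(values):
    """Arrange observed values into a probability distribution (a histogram)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def sample_proxy(distribution, n, seed):
    """Draw n proxy values from the distribution using a seeded generator.

    Using the same seed reproduces the same proxy set, which is the sense in
    which the sampling is deterministic.
    """
    rng = random.Random(seed)
    categories = list(distribution)
    weights = [distribution[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


# Hypothetical private column; only its distribution is used for sampling.
private_zip_codes = ["22102", "22102", "20001", "22102", "20001", "21201"]
dist = build_distribution(private_zip_codes)
proxy_a = sample_proxy(dist, 1000, seed=42)
proxy_b = sample_proxy(dist, 1000, seed=42)
```

The proxy set follows the same general distribution as the input column without copying any individual record.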

Second, applicant argues:
Szeto's discussions of a "similarity score" fail to teach or suggest "receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model." Szeto discusses training a data model with proxy data and calculating a "model similarity score." Szeto at [103]-[0104]. The model similarity score identifies whether the "accuracy of the predictions" of the private data and the proxy data is "sufficiently high." Id. at 104. Calculating whether "accuracy of the predictions" of the private data and the proxy data is "sufficiently high" is not a "a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model."


 Szeto teaches, as noted in the current office action:
 in [0050]:  As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the performance [claimed evaluation of similarity metric] of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model [claimed evaluation of the similarity metric meets claimed similarity metric]; And in 0104: …If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1 %, or closer) [claimed receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model], then the trained proxy model could be considered similar to the trained actual model… Further, if the similarity score fails to satisfy simi­larity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560. )

In other words, the learning model uses a similarity criterion: it determines an accuracy percentage reflecting the difference between the proxy data and an output of the data model, and the trained proxy model is considered similar when that difference in accuracy falls within the predetermined percentage range. This corresponds to the claimed similarity criterion. New art has also been provided to teach the use of generative neural networks that learn using converted categorical data; see the full rejection in the current Office action.
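As a purely illustrative reading of the "within 10%, 5%, 1%" language in Szeto [0104], a threshold check of this kind might look like the Python sketch below. Treating the percentage as an absolute difference between two accuracy values is an assumption made here, and the function name and numbers are hypothetical.

```python
def meets_similarity_criterion(actual_accuracy, proxy_accuracy, max_difference):
    """Return True when the two accuracies differ by no more than the
    predetermined amount (assumed here to be an absolute difference)."""
    return abs(actual_accuracy - proxy_accuracy) <= max_difference


# Hypothetical accuracies of a trained actual model and a trained proxy model.
within_five_percent = meets_similarity_criterion(0.90, 0.87, 0.05)
within_one_percent = meets_similarity_criterion(0.90, 0.87, 0.01)
```

With these sample values the 5% criterion is satisfied and the 1% criterion is not, in which case the quoted passage indicates that the generation operations would be repeated.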
Lastly, applicant argues that dependent claims 6-7, 18-19, and 8-10 and independent claims 11 and 20 are allowable in light of the deficiencies noted above in the independent claim limitations. The examiner respectfully disagrees and notes that applicant’s arguments are unpersuasive. Furthermore, as analyzed by the examiner above, none of the deficiencies alleged in the arguments were found.
For at least these reasons, the rejection(s) under 35 U.S.C. § 103 are maintained.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5, 11-12, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter ‘Szeto’) in view of Wu et al. (US Pub No. 2019/0122120, hereinafter ‘Wu’) and in further view of Choi et al. (NPL: “Generating Multi-label Discrete Patient Records using Generative Adversarial Networks”, hereinafter ‘Choi’).

Regarding independent claim 1 limitations, Szeto teaches a cloud computing system for generating data models, comprising:
at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the cloud computing system to perform operations comprising: (Szeto teaches a cloud computing system to perform machine learning and information processing operations in a computing server-based environment including one or more processors for executing system functions, in 0018: One aspect of the inventive subject matter includes a distributed machine learning system...; and 0027: …One of ordinary skill in the art should appreciate that the computing devices comprise one or more processors configured to execute software instructions that are stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, FPGA, PLA, PLD, solid state drive, RAM, flash, ROM, external drive, memory stick, etc.)…)
provisioning, by a model optimizer, computing resources with a data model; (Szeto teaches provisioning computing resources to generate one or more machine learning models as provisioned by the implementation of a machine learning algorithm, in [0018]-[0019]: One aspect of the inventive subject matter includes a distributed machine learning system. In some embodiments, the distributed machine learning system has a plurality of private data servers, possibly operating as peers in a distributed computing environment… One embodiment of a method includes a private data server receiving model instructions [claimed provisioning process] to create a trained actual model based on at least some local, private data. The model instructions [claimed model optimizer], for example, can include a request to build the trained actual model from an implementation of a machine learning algorithm… and in [0078] and [0081], which are considered computing instructions executed by the model optimizer, as the instructions are executed in a computing server-based environment, in [0028].)
retrieving, by the model optimizer, a reference dataset; (claimed reference data, as actual data, in 0090-0091: … For example, the modeling engine can use a genetic algorithm to alter the values of proxy data 360 until a suitable similar trained proxy model emerges using the similarity score as a fitness function, or using differences between the actual data's covariance matrix and proxy data's 360 covariance matrix to ensure proxy data 360 retains the same or similar shape as the actual data [reference data]… FIG. 4 illustrates possible techniques for calculat­ing similarity score 490 between two trained models; trained actual model 440 and trained proxy model 470 in this example….)
normalizing, by a dataset generator, the reference dataset, normalizing comprising: identifying categorical data within the reference dataset,  (in 0018: … More specifically, the modeling engine is able to receive model instructions from one or more remote computing devices over a network. The model instructions can be considered as one or more command that instruct the modeling engine to use at least some of the local private data in order to create a trained actual model according to an implementation of a machine learning algorithm (e.g., sup­port vector machine, neural network, decision tree, random forest, deep learning neural network, etc.). The modeling engine creates the trained actual model as a function of the local private data [claimed identification of categorical data] (i.e., a selected or filtered training data set) after any required preprocessing requirements, if any, have been met ( e.g., filtering, validating, normalizing [claimed normalizing of reference training dataset], etc.)) 
and converting categorical data to numerical values; (in 0018: …The modeling engine uses the private data distribu­tions to generate a set of proxy data, which can be consid­ered synthetic data or Monte Carlo data having the same general data distribution characteristics as the local private data [claimed identifying categorical data within the reference dataset and converting categorical data to numerical values represented as Monte Carlo data], while also lacking the actual private or restricted features of the local, private data. In some cases, Monte Carlo simulations generate deterministic sets of proxy data, by using a seed for a pseudo random number generator. A source for truly random seeds includes those provided by random.org (see URL www.random.org). Private or restricted features of the local private data include, but are not limited to, social security numbers, patient names, addresses or any other personally identifying information [claimed identified categorical data in a reference training data set], especially information protected under the HIPAA Act…)
 receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model; ( in [0050]:  As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the performance [claimed evaluation of similarity metric] of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model [claimed evaluation of the similarity metric meets claimed similarity metric]; And in 0104: …If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1 %, or closer) [claimed receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model], then the trained proxy model could be considered similar to the trained actual model… Further, if the similarity score fails to satisfy simi­larity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560. )
generating, by the dataset generator, a synthetic dataset for training the data model; (Szeto teaches generating synthetic dataset as proxy data from actual data used to train as part of the machine learning algorithm, in 0051-0052: …a private data server 124 may receive proxy related information (including for example proxy data 260, proxy data distributions 362, proxy model parameters 475, other proxy related data combined with seeds, etc.) from a peer private data server (a different private data server 124)… the information (e.g., machine learning models [includes claimed data model] including trained proxy models, trained actual models, private data distributions, synthetic/ proxy data distributions [claimed generated synthetic dataset for training the claimed data model], actual model parameters, proxy model parameters, similarity scores or any other information generated as part of the machine learning process, etc.)….; And for training the data model as a model trained using the machine learning algorithm and generated using the claimed training data set, in 0051-0052; And that make machine learning base inferences regarding the actual data as predictions, in [0108]-[0111], by a dataset generator as the instructions executed in the computing environment for performing the recited functions, in [0028].)
training, by the computing resources, the data model using the synthetic dataset,  (Szeto teaches generating synthetic dataset as proxy data from actual data that is used to train as part of the machine learning process, in [0051]-[0052], for training the data model as an updated global model, in [0048] & [0050], by the computing resources of the server based environment, in [0042], using instructions executed in the computing environment for performing the recited functions, in [0028].)
the training comprising: generating an output dataset using the data model; generating, based on a comparison of the output data and the normalized reference data, similarity metric of the data model,-2-Application No. 16/151,407Attorney Docket No. 05793.3765-00000 (in 0090: …Still another point of interest is that proxy data [claimed generated output dataset] 360 can be generated iteratively until it has the desired characteristics; acceptable proxy data distribu­tion [claimed the normalized reference data] 362 characteristics, acceptable similar models, or other factors [claimed similarity metric based on claimed comparison]. For example, the modeling engine can use a genetic algorithm to alter the values of proxy data 360 until a suitable similar trained proxy model emerges using the similarity score as a fitness function, or using differences between the actual data's covariance matrix and proxy data's 360 covariance matrix to ensure proxy data 360 retains the same or similar shape as the actual data [claimed comparison to claimed normalized reference data]…)
generating a prediction metric of the data model, evaluating the similarity metric against a similarity criterion, evaluating the prediction metric against a prediction criterion, and updating the data model based on the evaluations of the similarity metric and prediction metric; (Szeto teaches updating the machine learning algorithm as part of the iterative process by evaluating a prediction metric, namely the prediction accuracy, that is updated and evaluated for each iteration, to update the data model by training the proxy and actual model parameters as depicted in Fig. 6, in [0050]: As proxy data 260 is generated [claimed evaluation of claimed similarity metric against claimed similarity criterion] and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy [claimed prediction metric] is improved [claimed prediction criterion]), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the performance [claimed evaluation processing using the prediction metric of the model] of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model…; And in 0102-0104: … Each sample of the proxy data can be compared to samples from the training data to identify if proxy samples are too similar to original actual samples [claimed evaluation of claimed similarity metric against claimed similarity criterion]… In addition to using the proxy and actual model parameters, the modeling engine can also use other factors available in calculating the similarity score. 
Example additional factors can include accuracies of the model, cross fold validation, accuracy gain, sensitivities, specificities, distributions of the pairwise com­parisons (e.g., average value, distributions about zero, etc.)… If the accuracy of the predictions [claimed generated prediction metric] from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1 %, or closer) [claimed generated prediction metric evaluated against claimed prediction criterion], then the trained proxy model could be considered similar to the trained actual model…)
repeating the training and the refining until the similarity criterion and prediction criterion is met by the similarity metric and the prediction metric; [claimed evaluation of the similarity metric meets claimed similarity metric]. (Szeto teaches updating the machine learning algorithm as part of the iterative process for updating the global model by evaluating a prediction metric, namely the prediction accuracy, that is updated and evaluated for each iteration, to update the data model by training the proxy and actual model parameters as depicted in Fig. 6, in [0050]: As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model [claimed repeating and refining process for generating updated models when claimed criteria are met]. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved) [claimed evaluation processing using the prediction metric of the model], new parameters may be provided to the private data servers via the updated model instructions 230. 
At the private data server 124, the performance [claimed evaluation of similarity metric] of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model [claimed evaluation of the similarity metric meets claimed similarity metric]; And in 0104: …If the accuracy of the predictions [claimed generated prediction metric] from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1 %, or closer) [claimed generated prediction metric evaluated against claimed prediction criterion], then the trained proxy model could be considered similar to the trained actual model… Further, if the similarity score fails to satisfy simi­larity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560 [claimed repeating process]. )
 in response to meeting the similarity criterion and prediction criterion  by the similarity metric and the prediction metric, storing, by the model optimizer in a model storage, (in 0050: … If the updated global model is an improvement ( e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230…; including model meta data when training including claimed meta data and the trained data model, in 0045: …The new knowledge [claimed response process for storing claimed model data and meta data] can then be aggregated into a trained global model via global modeling engine 136. Examples of knowl­edge include (see, e.g., FIG. 2) but are not limited to proxy data 260, trained actual models 240, trained proxy models 270, proxy model parameters, model similarity scores, or other types of data that have been de-identified. In some embodiments, the global model server 130 analyzes sets of proxy related information (including for example proxy data 260, proxy data distributions 362, proxy model parameters 475, other proxy related data combined with seeds, etc.) to determine whether the proxy related information from one of private data server 124 has the same shape and/or overall properties as the proxy related data from another private data server 124, prior to combining such information) 
receiving production data from a data source by a production instance; and (Szeto teaches receiving new data, considered production data, to incorporate into the machine learning models, in [0048]; using a transmission protocol, in [0114].)
processing the production data using the data model; (Szeto teaches that new data can be processed by the global model as it becomes available, in [0107].)
Examiner notes that the claimed engine recited to perform the claimed functions are disclosed as computer instructions executed by a processor(s), in Szeto 0027: … Further, the disclosed technologies can be embodied as a computer program prod­uct that includes a tangible, non-transitory computer read­able medium storing the software instructions executable by a processor to perform the disclosed steps or operations associated with implementations of computer-based algo­rithms, processes, methods, or other instructions. In some embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms,…
While Szeto teaches the use of machine learning algorithms for iteratively training machine learning models in a distributed server computing environment, as discussed above, Szeto does not expressly teach the use of a generative adversarial model, which is within the scope of applicant’s recited claim limitations. Wu, however, teaches the claim 1 limitations:
training, …, the data model (Generative adversarial network (GAN)) using the synthetic dataset (Fig. 1: Xgen), the training comprising: generating an output dataset (Fig. 1: Xtrain-Xunl) using the data model; generating, based on a comparison of the output data and the reference data (Fig. 1: Xtrain-Xlab), a similarity metric (discriminator output value) of the data model, (Wu teaches training a data model, namely a set of machine learning models forming a generative adversarial network (GAN) model, using computing resources, in [0015]; where the GAN data model is trained using synthetic data in combination with labelled training data; and the similarity metric, in 0025: … the discriminator is expected to discern real samples from generated samples by giving the output of 1 or 0 respectively. In the GAN training process, the generator and discriminator are used to generate samples and classify them respectively to improve the performance of each other in an adversarial manner…; And the data model, in [0027]: This disclosure describes a self-training method and system for semi-supervised GANs. In example embodiments, a first neural network (generally referred to hereinafter as a generator) is used to generate synthetic data (referred to herein as generated samples). A second neural network (generally referred to hereinafter as a discriminator) is configured to receive as inputs a set of training data (referred to hereinafter as a training dataset). The training dataset includes a set of labelled training data (referred to hereinafter as a labelled training dataset) comprising labelled training data (referred to hereinafter as labelled samples), a larger set of unlabeled training data (referred to herein as an unlabeled training dataset) comprising unlabeled training data (referred to hereinafter as unlabeled samples), and the generated samples…

[Figure: media_image1.png]

)
generating a prediction metric (performance of the generator and discriminator to each other) of the data model, evaluating the similarity metric against a similarity criterion (discerning the output class as 0 or 1 by the discriminator criterion), evaluating the prediction metric against a prediction criterion (performance metric using generator and discriminator to each other), (in 0025-0029: Typically, generative adversarial networks (GANs) include two separate deep neural networks: a first neural network (generally referred to in the art as a generator) and a second neural network (generally referred to in the art as a discriminator)… the discriminator is expected to discern real samples from generated samples by giving the output of 1 or 0 respectively. In the GAN training process, the generator and discriminator are used to generate samples and classify them respectively to improve the performance of each other in an adversarial manner…; And the claimed performance metric in 0046: …the generator G 102 and discriminator D 104 are trained using the current training dataset Xtrain(J) until a validation error [claimed prediction metric] for a validation dataset stops decreasing [claimed prediction criterion], as shown by blocks 210 to 221. In particular an adversarial game is played for improving the discrimination and classification performance by the discriminator D 104 and the data generation performance by generator G 102 simultaneously)
and updating the data model based on the evaluations of the similarity metric and prediction metric (as depicted in Fig. 1); repeating the training until the similarity criterion and prediction criterion are met by the similarity metric and the prediction metric (as depicted in Fig. 1); (in 0046: In example embodiments, the training phase 208 is an iterative phase during which the generator G 102 and discriminator D 104 are trained using the current training dataset Xtrain(J) until a validation error for a validation dataset stops decreasing, as shown by blocks 210 to 221. In particular an adversarial game is played for improving the discrimination and classification performance by the dis­criminator D [claimed evaluation of similarity metric based on evaluation claimed criterion] 104 and the data generation performance by generator G 102 [claimed evaluation of prediction metric based on evaluation claimed criterion] simultaneously…)
in response to meeting the similarity criterion and prediction criterion by the similarity metric and the prediction metric, storing, by the model optimizer in a model storage, the data model and metadata of the data model, wherein the metadata of the data model comprises at least the similarity metric and the prediction metric; (claimed storing process as the self training/semi-supervised outcomes, in 0048-0049: As indicated in block 221, a determination is then made whether the validation error on a predetermined vali­dation dataset has stopped decreasing. As known in the art, the validation dataset is a predetermined dataset that is used to determine when training of the discriminator D 104 has reached a level where the validation error reaches its mini­mal level. If the validation error is still decreasing then the GAN 100 has still not been optimally trained using the current training dataset Xtrain(J)· Thus, if the validation error has not yet stopped decreasing, the training phase 208 enters another iteration using the same training dataset Xtrain(J) with an additional set of generated samples xgen' and the actions described above in respect of blocks 210 to 221 are repeated… When the validation testing of block 221 [claimed storing meta data by claimed model optimizer] indicates that the error on the validation dataset has stopped decreas­ing [claimed process in response to meeting claimed criterion] in respect the current training dataset Xtrain(J), an assumption is made that the GAN 100 has been optimally trained in respect of the current training dataset Xtrain(J) and the current training phase 208 is concluded…)
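The stopping rule in the quoted paragraphs of Wu, iterating until the validation error stops decreasing, can be pictured with the Python sketch below. It is an illustration only and is not Wu's implementation. The callbacks `train_step` and `validation_error` are hypothetical stand-ins for one adversarial update of generator G and discriminator D and for an evaluation on the validation dataset, and treating a single non-decreasing reading as "stopped decreasing" is a simplifying assumption.

```python
def train_until_validation_stalls(train_step, validation_error, max_iterations=100):
    """Repeat training until the validation error no longer decreases.

    Returns the lowest validation error seen and the full error history.
    """
    best = float("inf")
    history = []
    for _ in range(max_iterations):
        train_step()                 # one training iteration (hypothetical callback)
        error = validation_error()   # error on the validation dataset (hypothetical)
        history.append(error)
        if error >= best:            # error has stopped decreasing: end the phase
            break
        best = error
    return best, history


# Toy stand-ins: a fixed sequence of validation errors and a step counter.
errors = iter([0.40, 0.31, 0.25, 0.26, 0.20])
steps = []
best, history = train_until_validation_stalls(
    lambda: steps.append(1),
    lambda: next(errors),
)
```

In the toy run the loop ends at the fourth iteration, when the error rises from 0.25 to 0.26, and the fifth value is never read.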
The Szeto and Wu references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose of developing information processing methods in automated computing environments.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for iterative training of machine learning models using generative adversarial network models as disclosed by Wu with the method of training an inference model from synthetic and actual data sets using machine learning algorithms and models as disclosed by Szeto.

	While the combination of Szeto and Wu discloses the process for information retrieval and data processing using machine learning models, Szeto and Wu do not expressly teach the use of converted categorical variables as inputs for learning using a GAN machine learning model.
	Choi does expressly teach the use of converted categorical variables as inputs for learning using a GAN machine learning model. (Pg. 3, Sec 3.1: We assume there are |C| discrete variables (e.g., diagnosis, medication or procedure codes) in the EHR data that can be expressed as a fixed-size vector x ∈ Z+|C|, where the value of the ith dimension indicates the number of occurrences (i.e., counts) of the i-th variable in the patient record. In addition to the count variables, a visit can also be represented as a binary vector x ∈ {0, 1}|C|, where the ith dimension indicates the absence or occurrence of the ith variable in the patient record. It should be noted that we can also represent demographic information, such as age and gender, as count and binary variables [claimed normalizing, …, the reference dataset, normalizing comprising: identifying categorical data within the reference dataset and converting categorical data to numerical values], respectively…; And in Pgs. 4-5, Sec. 3.3: In this work, we apply the autoencoder to learn the salient features of discrete variables that can be applied to decode the continuous output of G. This allows the gradient flow from D to the decoder Dec to enable the end-to-end fine-tuning. As depicted by Figure 1, an autoencoder consists of an encoder Enc(x; θenc) that compresses the input x ∈ Z+|C| to Enc(x) ∈ Rh, and a decoder Dec(Enc(x); θdec) that decompresses Enc(x) to Dec(Enc(x)) as the reconstruction of the original input x… For binary variables, we use tanh activation for Enc and the sigmoid activation for Dec. With the pre-trained autoencoder, we can allow GAN to generate distributed representation of patient records (i.e., the output of the encoder Enc) [claimed normalizing, …, the reference dataset, normalizing comprising: identifying categorical data within the reference dataset and converting categorical data to numerical values], rather than generating patient records directly. Then the pre-trained decoder Dec can pick up the right signals from G(z) to convert it to the patient record Dec(G(z)). The discriminator D is trained to determine whether the given input is a synthetic sample Dec(G(z)) or a real sample x…)
	Additionally, Choi teaches claim 1 limitation: receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized dataset and an output dataset of the data model. (claimed predetermined difference as the reconstruction error, as the predetermined difference to compare over the learning process, for receiving and evaluating the reconstruction error to a minimized value, as claimed similarity criterion including difference, in Pg. 4 Sec. 3.3: [image media_image2.png: excerpt from Choi, Pg. 4, Sec. 3.3])
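For illustration only, the conversion of categorical codes into the count and binary vectors described in the quoted passage of Choi, Sec. 3.1, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `encode_record`, the example vocabulary, and the code strings are hypothetical and are not taken from Choi or from the claims.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def encode_record(
    codes: Sequence[str], vocabulary: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Convert the categorical codes of one record into numerical vectors.

    Returns a count vector (occurrences of each vocabulary entry) and a
    binary vector (1 if the entry occurs at least once, otherwise 0).
    """
    counts = Counter(codes)
    count_vec = [counts.get(code, 0) for code in vocabulary]
    binary_vec = [1 if n > 0 else 0 for n in count_vec]
    return count_vec, binary_vec


# Hypothetical vocabulary of |C| = 3 discrete variables and one record.
vocabulary = ["dx:A", "dx:B", "rx:C"]
count_vec, binary_vec = encode_record(["dx:A", "rx:C", "dx:A"], vocabulary)
# count_vec  -> [2, 0, 1]
# binary_vec -> [1, 0, 1]
```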

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for iterative training of machine learning models using generative adversarial network models and encoded categorical variables as disclosed by Choi with the method of training an inference model from synthetic and actual data sets using machine learning algorithms and models as collectively disclosed by Szeto and Wu.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Szeto, Wu, and Choi in order to develop machine learning mechanisms for generating and using binary representations of categorical variables for developing training data that help preserve the privacy of patient records (Choi, Sec. 3.6 & Abstract). Doing so will help develop a mapping between the generated data and the training data of specific patients that is not explicit, which helps to better preserve the privacy of patients (Choi, Sec. 3.6).

Regarding claim 2, the rejection of claim 1 is incorporated and Szeto in combination with Wu in view of Choi further teaches the cloud computing system of claim 1,
wherein the data model is provisioned in response to a model generation request received by the model optimizer from an interface, wherein the generation request comprises at least data describing a type of the data model to be generated. (Szeto teaches the request from a user interface server that includes information regarding the type of data classification model as a system input for the machine learning system, in [0108]-[0111], and information about the learning model type as input instructions for the type to be trained, in [0018]-[0019].)

Regarding claim 3, the rejection of claim 1 is incorporated and Szeto in combination with Wu in view of Choi further teaches the cloud computing system of claim 1,
wherein the operations further comprise: extracting, by the model optimizer, from the metadata of the data model, the similarity metric and the prediction metric, evaluating, by the model optimizer, the similarity metric of the data model; evaluating, by the model optimizer, the prediction metric of the data model; and (in 0050: … If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230…; including model metadata when training, including the claimed metadata and the trained data model, in 0045: …The new knowledge can then be aggregated [claimed retrieving of metadata of the model] into a trained global model via global modeling engine 136. Examples of knowledge include (see, e.g., FIG. 2) but are not limited to proxy data 260, trained actual models 240, trained proxy models 270, proxy model parameters, model similarity scores, or other types of data that have been de-identified. In some embodiments, the global model server 130 analyzes sets of proxy related information (including for example proxy data 260, proxy data distributions 362, proxy model parameters 475, other proxy related data combined with seeds, etc.) to determine whether the proxy related information from one of private data server 124 has the same shape and/or overall properties as the proxy related data from another private data server 124, prior to combining such information [claimed retrieving of metadata of the model])
determining, by a model curator, that the data model satisfies governance criteria. (Szeto teaches the determining the global model satisfies the criteria of being an improvement that is considered governance criteria by the global model server, considered a model curator, in [0050]; Szeto also teaches the global model satisfies a governance criterion for aggregating new knowledge by the global modeling engine considered a model curator, in [0045].)

Regarding claim 4, the rejection of claim 1 is incorporated and Szeto in combination with Wu in view of Choi further teaches the cloud computing system of claim 1,
wherein the similarity metric comprises at least one of a statistical correlation score, data similarity score, or data quality score, (Szeto teaches storing a similarity score (similarity metric) and requirements for training machine learning models using the modeling engine that comprises a data similarity score as the model similarity score, in [0078]-[0079]: The similarity between trained proxy model 270 and trained actual model 240 can be measured through various techniques by modeling engine 226 calculating model similarity score 280 as a function of proxy model parameters 275 and actual model parameters 245….)  
and the prediction metric includes at least one of a prediction accuracy validation, a prediction accuracy cross validation, a regression validation, a regression cross validation, or a principal component analysis validation. (Szeto teaches the use of the prediction metric including predictive accuracy and the prediction accuracy validation, in [0050]: … If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230…)

Regarding claim 5, the rejection of claim 1 is incorporated and Szeto in combination with Wu in view of Choi further teaches the cloud computing system of claim 1,
wherein generating the synthetic dataset for training the data model comprises: retrieving a synthetic dataset model from the model storage; (Szeto teaches generating the synthetic dataset as proxy data from actual data using the trained proxy models that are retrieved, as part of the machine learning process, from the model storage of the data server, in [0051]-[0052].)
retrieving a training dataset from a database; and (Szeto teaches the retrieving a training dataset from a database as the private database to learn salient features, in [0074].)
generating the synthetic dataset using the synthetic dataset model and the training dataset. (Szeto teaches generating the synthetic dataset using the synthetic data proxy data model, in [0078], using the training dataset from the private database, in [0078] and the synthetic dataset  as the seed proxy data set and the proxy data samples used with the retrieved actual data training dataset, in [0074]-[0075].)
Regarding independent claim 11 limitations, Szeto in combination with Wu teaches a method for generating data models, comprising:
receiving, by a model optimizer from an interface, a data model generation request, wherein the generation request comprises at least data describing a type of the data model to be generated; (Szeto teaches receiving, by a model optimizer as computing instructions executed by the model optimizer as the instructions executed in a computing server-based environment, in [0028], a request from a user interface server that includes information regarding the type of data classification model as a system input for the machine learning system, in [0108]-[0111], and information about the learning model type as input instructions for the type to be trained, in [0018]-[0019].)
The remaining claim limitations are similar to the claim limitations in claim 1, and the limitations are rejected under the same rationale as the claim 1 limitations.

Regarding claim 12, the rejection of claim 11 is incorporated and Szeto in combination with Wu in view of Choi further teaches the method of claim 11,
the method further comprising, determining, by a model curator, that the data model satisfies governance criteria, before processing the production data using the data model. (Szeto teaches evaluating the accuracy of the data model information associated with the model parameters, in [0079], for training the global model, in [0045], before using the trained global model to process the production data, as new data that can be processed using the global model as the production data becomes available, in [0107].)

Regarding claim 16, the rejection of claim 11 is incorporated and Szeto in combination with Wu in view of Choi further teaches the method of claim 11,
wherein generating the synthetic dataset for training the data model comprises: retrieving a synthetic dataset model from the model storage; (Szeto teaches generating the synthetic dataset as proxy data from actual data using the trained proxy models that are retrieved, as part of the machine learning process, from the model storage of the data server, in [0051]-[0052].)
retrieving a training dataset from a database; and (Szeto teaches the retrieving a training dataset from a database as the private database to learn salient features, in [0074].)
generating the synthetic dataset using the synthetic dataset model and the training dataset. (Szeto teaches generating the synthetic dataset using the synthetic data proxy data model, in [0078], using the training dataset from the private database, in [0078] and the synthetic dataset  as the seed proxy data set and the proxy data samples used with the retrieved actual data training dataset, in [0074]-[0075].)

Regarding claim 17, the rejection of claim 16 is incorporated and Szeto in combination with Wu, in view of Choi further teaches the method of claim 16,
wherein: the synthetic dataset model comprises a class-specific model corresponding to a data class; and (Szeto teaches the synthetic dataset model as the proxy model from the class-specific model corresponding to the type (that is class) of the learning algorithm on a set of the class-specific data as depicted in Fig. 6)
generating the synthetic dataset using the synthetic dataset model and the training dataset comprises: determining a sensitive portion of the training dataset belongs to the data class; wherein the sensitive portion comprises personal information; (Szeto teaches identifying the sensitive portion of the training data of the actual data during the de-identification process of personal information for patient information in compliance with HIPAA standards, in [0077]-[0078].)
generating a synthetic portion using the class-specific model; and (Szeto teaches generating the synthetic dataset using the synthetic data proxy data model, in [0078], that is the class-specific model, as depicted in Fig. 6.)
replacing the sensitive portion of the training dataset with the synthetic portion. (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, [0074]-[0075], as part of the de-identification processes in compliance with HIPAA standards to produce proxy data including information from the actual data, considered a process for replacing the sensitive portion with the synthetic portion to de-identify the actual data, in [0077]-[0078].)

Claims 6-7 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter ‘Szeto’), in view of Wu et al. (US Pub No. 2019/0122120, hereinafter ‘Wu’), in further view of Choi et al. (NPL: “Generating Multi-label Discrete Patient Records using Generative Adversarial Networks”, hereinafter ‘Choi’), and in further view of Dernoncourt et al. (NPL: “De-identification of patient notes with recurrent neural networks”, hereinafter ‘Dern’).

Regarding claim 6, the rejection of claim 5 is incorporated and Szeto in combination with Wu in view of Choi further teaches the cloud computing system of claim 5,
wherein generating the synthetic dataset using the synthetic dataset model and the training dataset comprises: identifying a sensitive portion of the training dataset using a …, wherein the sensitive portion comprises personal information. (Szeto teaches identifying the sensitive portion of the training data of the actual data during the de-identification process of personal information for patient information in compliance with HIPAA standards, in [0077]-[0078].)
Szeto, Wu, and Choi do not expressly teach claim 6 limitation:
identifying a sensitive portion of the training dataset using a recurrent neural network…
Dern teaches claim 6 limitation:
identifying a sensitive portion of the training dataset using a recurrent neural network… (Dern teaches a de-identification system using artificial neural networks (ANN), in pg. 598: Col. 1: Last Para., where the ANN are recurrent neural networks (RNN), in pg. 598: Col. 2: Last Para.)
The Szeto, Wu, Choi, and Dern references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods in automated computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method for de-identification of personal data using recurrent neural networks as disclosed by Dern with the method of information processing in machine learning computing environments as collectively disclosed by Szeto, Wu, and Choi.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Dern, Szeto, and Wu in order to provide a reliable automated de-identification system using artificial neural networks that requires no handcrafted features or rules (Dern, Abstract). Doing so will improve the performance of de-identification systems based on artificial neural networks (Dern, Abstract).

Regarding claim 7, the rejection of claim 6 is incorporated and Szeto in combination with Wu in view of Choi and Dern further teaches the cloud computing system of claim 6,
wherein the operations further comprise: receiving a data sequence, wherein the data sequence comprises at least one of an account number, a social security numbers, a name, or an address; (Szeto teaches de-identification processes in compliance with HIPAA standards, in [0077], including social security numbers, name, addresses or any other identifying information protected under the HIPAA Act, in [0018].)
receiving a context sequence, wherein the context sequence comprises snippets of data drawn from a text database; (Szeto teaches training data including text data as part of the actual data captured, considered a context sequence comprise of text data from the database server, in [0069].)
generating a training sequence by inserting the data sequence into the context sequence; (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, [0074]-[0075], including the text context sequence captured in the data server as part of the actual data context sequence, in [0069].)
Szeto, Wu, and Choi do not expressly teach claim 7 limitations:
generating a label sequence indicating a position of the inserted data sequence in the training sequence, wherein the label sequence comprises at least two characters identifying different types of data; and 
training the recurrent neural network using the training sequence and the label sequence.
Dern teaches claim 7 limitations:
generating a label sequence indicating a position of the inserted data sequence in the training sequence, wherein the label sequence comprises at least two characters identifying different types of data; and training the recurrent neural network using the training sequence and the label sequence (Dern teaches generating a label sequence to indicate the position, with an index associated with the data type identified, as part of the embedded tokenizing processes for the input sequence, in pg. 599: Col. 1: 1st and 2nd paras., as depicted in Fig. 1, used to train the ANN using the token embeddings, in pg. 598: Col. 1: 1st full para.)
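For illustration only, the claim 7 limitations of inserting a data sequence into a context sequence and generating a label sequence that marks the position of the inserted data can be sketched as follows. This is a minimal sketch and is not drawn from Dern or from the specification: the function name `insert_with_labels`, the label characters "O" (context) and "N" (name), and the example tokens are all hypothetical.

```python
from typing import List, Sequence, Tuple


def insert_with_labels(
    context: Sequence[str],
    data: Sequence[str],
    position: int,
    data_label: str,
    context_label: str = "O",
) -> Tuple[List[str], List[str]]:
    """Insert data tokens into context tokens and label every position.

    The returned label sequence has one character per token of the training
    sequence, so it indicates where the inserted data sequence sits.
    """
    if not 0 <= position <= len(context):
        raise ValueError("position must lie within the context sequence")
    training = list(context[:position]) + list(data) + list(context[position:])
    labels = (
        [context_label] * position
        + [data_label] * len(data)
        + [context_label] * (len(context) - position)
    )
    return training, labels


# Hypothetical context snippet and name tokens.
training, labels = insert_with_labels(
    ["Patient", "seen", "on", "Monday"], ["Jane", "Doe"], 1, "N"
)
# training -> ["Patient", "Jane", "Doe", "seen", "on", "Monday"]
# labels   -> ["O", "N", "N", "O", "O", "O"]
```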

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method of tokenizing personal data using recurrent neural networks as disclosed by Dern with the method of information processing in machine learning computing environments as collectively disclosed by Szeto, Wu, and Choi.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Dern, Szeto, and Wu in order to provide a reliable automated de-identification system using artificial neural networks that requires no handcrafted features or rules (Dern, Abstract). Doing so will improve the performance of de-identification systems based on artificial neural networks (Dern, Abstract).

Regarding claim 18, the rejection of claim 16 is incorporated and Szeto in combination with Wu in view of Choi further teaches the method of claim 16,
wherein: the synthetic dataset model comprises a class and subclass-specific model corresponding to a data class and a subclass of the data class; (Szeto teaches the synthetic dataset model as the proxy model from the class-specific model corresponding to the type (that is class) of the learning algorithm on a set of the class-specific data as depicted in Fig. 6, having a subclass of the data class as the type/class of the sensitive data class information to be de-identified in compliance with HIPAA standards, in [0077], including social security numbers, name, addresses or any other identifying information protected under the HIPAA Act, in [0018].)
generating the synthetic dataset using the synthetic dataset model and the training dataset comprises: determining a sensitive portion of the training dataset belongs to the (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, [0074]-[0075], as part of the de-identification processes in compliance with HIPAA standards to produce proxy data including information from the actual data, considered a process for replacing the sensitive portion with the synthetic portion to de-identify the actual data, considered determining the data set belongs to a sensitive portion class and selecting the subclass to be de-identified, in [0077]-[0078], using the proxy model for generating the synthetic data by preserving the selected knowledge as synthetic proxy data during the de-identification process, in [0076]-[0079].)
generating a synthetic portion using the class and subclass-specific model; and replacing the sensitive portion of the training dataset with the synthetic portion. (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, [0074]-[0075], as part of the de-identification processes in compliance with HIPAA standards to produce proxy data including information from the actual data, considered a process for replacing the sensitive portion with the synthetic portion to de-identify the actual data, in [0077]-[0078]; using the proxy model, considered the class and subclass-specific model, for generating the synthetic data by preserving the selected knowledge as synthetic proxy data during the de-identification process, in [0076]-[0079].)
Szeto teaches synthetic dataset model as the proxy model from the class-specific model corresponding to the type (that is class) of the learning algorithm on a set of the class-specific data as depicted in Fig. 6 using machine learning algorithms/techniques for training the proxy model by a type of machine learning model, in Fig. 5., using the training dataset from a database, in [0078].
(Dern teaches the tokenizing processing for inserting differing subclasses of data sequences using label classes and notations of sequence position using indexes associated with the data type identified (e.g. different data class y labels and scalar and vector class sequence class types) as part of the embedded tokenizing processes for the input sequence, in pg. 599: Col. 1: 1st and 2nd paras., as depicted in Fig. 1, using the trained ANN as the class and subclass-specific model, in pg. 599: Sec. Bidirectional LSTM, using the token embeddings, in pg. 598: Col. 1: 1st full para.)
The Szeto, Wu, Choi, and Dern references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods in automated computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method of tokenizing personal data using recurrent neural networks as disclosed by Dern with the method of information processing in machine learning computing environments as collectively disclosed by Szeto, Wu, and Choi.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Dern, Szeto, and Wu in order to provide a reliable automated de-identification system using artificial neural networks that requires no handcrafted features or rules (Dern, Abstract). Doing so will improve the performance of de-identification systems based on artificial neural networks (Dern, Abstract).

Regarding claim 19, the rejection of claim 18 is incorporated and Szeto in combination with Wu in view of Choi and Dern further teaches the method of claim 18,
wherein the subclass is selected: … (Szeto teaches a process for replacing sensitive portion with the synthetic portion to de-identify the actual data, considered determining the data set belongs to a sensitive portion class and selecting the subclass to be de-identified, in [0077]-[0078], using the proxy model for generating the synthetic data by preserving the selected knowledge as synthetic proxy data during the de-identification process, in [0076]-[0079].)
Szeto does not expressly teach claim 19 limitation:
wherein the subclass is selected: according to a univariate distribution, or using a recurrent neural network.
Wu teaches claim 19 limitation: 
wherein the subclass is selected: according to a univariate distribution, or using a recurrent neural network. (Wu teaches selecting, by the discriminator, a data subclass yi from the generated training samples x, which are generated according to the uniform or normal noise distribution considered the univariate distribution, for generating data for the selected subclass yi using the P(yi|x) univariate distribution, in [0030]: The generator G(z) 102 is configured to map a random noise vector z that has been drawn from a uniform or normal noise distribution pz(z) to produce generated samples xgen that simulate real samples. The generated data samples xgen are added to a dataset Xgen of generated samples that are stored in the data bank 108. Data bank 108 also includes a training dataset Xtrain. Training dataset Xtrain includes a labelled dataset Xlab that includes labelled training samples xlab and an unlabeled dataset Xunl that includes unlabeled training samples xunl. … The discriminator D 104 is also configured to perform a classification function to determine probabilities for the different class labels y1 to yK that can be applied to an unlabeled sample (which can be a generated sample xgen or an unlabeled training sample xunl). In the example of FIG. 1, discriminator D 104 is also configured to distinguish between K possible label classes. Each of the ith component of the K-dimensional output of the discriminator D(x) 104 in FIG. 1 represents a confidence score that a sample x (which can be a generated sample xgen, an unlabeled sample xunl, or a labelled sample xlab) belongs to class yi. Discriminator D 104 is also configured to generate a posterior probability value P(yi|x) for predicting that a possible label yi is the correct label for the sample x (which can be a generated sample xgen, an unlabeled sample xunl, or a labelled sample xlab).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Szeto and Wu for the same reasons disclosed above.
Additionally, Dern expressly teaches claim 19 limitation:
wherein the subclass is selected: according to a univariate distribution, or using a recurrent neural network. (Dern teaches a de-identification process using a recurrent neural network, in Abstract; where the process selects a subclass as input sequences of variable sizes and types as depicted in Fig. 1, in pg. 599: 1st Col.: 1st & 2nd paras. & Sec. Bidirectional LSTM.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Szeto, Wu, Choi, and Dern for the same reasons disclosed above.
 
Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter ‘Szeto’) in view of Wu et al. (US Pub No. 2019/0122120, hereinafter ‘Wu’), in further view of Choi et al. (NPL: “Generating Multi-label Discrete Patient Records using Generative Adversarial Networks”, hereinafter ‘Choi’), and in further view of Dernoncourt et al. (NPL: .

Regarding claim 8, the rejection of claim 7 is incorporated and Szeto in combination with Wu in view of Choi and Dern further teaches the cloud computing system of claim 7,
wherein: the training sequence includes inserted data sequences; (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, [0074]-[0075], as part of the de-identification processes in compliance with HIPAA standards to produce proxy data including information from the actual data, in [0077]-[0078].)
Szeto, Wu, and Choi do not expressly teach claim 8 limitation:
and the label sequence indicates at least one of differing classes among the inserted data sequences and differing subclasses among the inserted data sequences. 
Dern teaches claim 8 limitation:
and the label sequence indicates at least one of differing classes among the inserted data sequences and differing subclasses among the inserted data sequences. (Dern teaches the tokenizing processing for inserting differing subclasses of data sequences using label classes and notations of sequence position using indexes associated with the data type identified (e.g. different data class y labels and scalar and vector class sequence class types) as part of the embedded tokenizing processes for the input sequence, in pg. 599: Col. 1: 1st and 2nd paras., as depicted in Fig. 1, used to train the ANN using the token embeddings, in pg. 598: Col. 1: 1st full para.)


In addition Jia teaches claim 8 limitation:
and the label sequence indicates at least one of differing classes among the inserted data sequences and differing subclasses among the inserted data sequences. (Jia teaches tokenizing data sequences using feature generations and classes as feature labels associated with the data sequence as depicted in Fig. 1, to extract features as part of the embedding process, in pg. S45: Col. 1: 1st full para.)
 The Szeto, Wu, Choi, Dern, and Jia references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods in automated computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method of tokenizing personal data using recurrent neural networks as disclosed by Jia with the method of information processing in machine learning computing environments as collectively disclosed by Szeto, Wu, Choi, and Dern.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Szeto, Wu, Dern, and Jia in order to provide a reliable automated de-identification system using an artificial neural network model to prevent re-identification (Jia, Sec. 1: Introduction: 2nd full para.). Doing so will help reduce the time and expense associated with the de-identification (Jia, Sec. Intro).

	
Regarding claim 9, the rejection of claim 8 is incorporated. Szeto, Wu, and Choi do not expressly teach claim 9 limitations:
wherein training the recurrent neural network using the training sequence and the label sequence comprises: estimating a label by applying a subset of the training sequence to the recurrent neural network; 
comparing the estimated label to an actual label in the label sequence, the actual label corresponding to the subset; and 
updating the recurrent neural network according to a loss function based on a result of the comparison.
Dern teaches claim 9 limitations:
wherein training the recurrent neural network using the training sequence and the label sequence comprises: estimating a label by applying a subset of the training sequence to the recurrent neural network; (Dern teaches estimating a label as a score that is computed from the sequence subsets depicted in Fig. 1, in pg. 600: Col. 2: 1st and 2nd Full paras, by applying a subset of training sequences as inputted tokens as a probability to the ANN, in pg. 599: Col. 1: 1st and 2nd paras.; where the ANN are recurrent neural networks (RNN), in pg. 598:Col. 2: Last Para.)
comparing the estimated label to an actual label in the label sequence, the actual label corresponding to the subset; and (Dern teaches comparing the estimated label score to an actual label, noted as the probability yi associated with the subset sequence, by choosing the actual labels that maximize the score using an objective function to perform the comparison, in pg. 600: Col. 2: 1st and 2nd Full paras.)
updating the recurrent neural network according to a loss function based on a result of the comparison. (Dern teaches the comparison for determining the predicted probability parameters associated with the training of the RNN as depicted in Fig. 1, in pg. 600: Sec. Label sequence optimization layer: 1st – last paragraphs, including the last para.; where the training to update the model parameters, as the parameters of the RNN, is done according to a loss function using stochastic gradient descent, in pg. 601: Sec. Training and hyperparameters: 1st para.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Szeto, Wu, and Dern for the same reasons disclosed above.
Additionally Jia teaches claim 9 limitation:
updating the recurrent neural network according to a loss function based on a result of the comparison. (Jia teaches training the RNN considered updating, selecting the correct label by maxing a loss function as a result of the comparison using the objective function to score the classification labels, in pgs. S46-S47: Sec.: Label decoding: 1st – last paragraphs. Including last para.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Szeto, Wu, Choi, Dern, and Jia for the same reasons disclosed above.
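For illustration only, the three recited training steps (estimating a label, comparing it to the actual label, and updating according to a loss function) can be sketched as below. This is a minimal sketch under stated assumptions, not the method of Dern, Jia, or the applicant: the scalar-state recurrent cell, the toy sequences `xs` and `ys`, the learning rate, and the use of finite-difference gradients in place of backpropagation through time are all hypothetical choices made to keep the example short.

```python
import math

def forward(params, xs):
    """Run a scalar-state recurrent cell over xs; return one estimated label probability per position."""
    w_x, w_h, b_h, w_o, b_o = params
    h, probs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b_h)                    # recurrent state update
        probs.append(1.0 / (1.0 + math.exp(-(w_o * h + b_o))))    # estimated label
    return probs

def loss(params, xs, ys):
    """Compare estimated labels with actual labels (mean binary cross-entropy)."""
    total = 0.0
    for p, y in zip(forward(params, xs), ys):
        p = min(max(p, 1e-12), 1.0 - 1e-12)                       # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def sgd_step(params, xs, ys, lr=0.3, eps=1e-5):
    """Update parameters along the negative gradient, estimated by central differences."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * eps))
    return [p - lr * g for p, g in zip(params, grads)]

xs = [0, 1, 0, 0, 1, 1, 0, 1]          # hypothetical training sequence (one feature per token)
ys = [0, 1, 0, 0, 1, 1, 0, 1]          # hypothetical label sequence, aligned by position
params = [0.1, 0.1, 0.1, 0.1, 0.1]     # w_x, w_h, b_h, w_o, b_o
initial_loss = loss(params, xs, ys)
for _ in range(200):
    params = sgd_step(params, xs, ys)
final_loss = loss(params, xs, ys)
```

Comparing `initial_loss` with `final_loss` shows whether the updates reduced the disagreement between estimated and actual labels on this toy data.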

Regarding claim 10, the rejection of claim 9 is incorporated and Szeto, Wu, and Choi do not expressly teach claim 10 limitation.
Dern teaches claim 10 limitation:
wherein the actual label corresponds to an element of the subset occupying the same position in the training sequence as the actual label occupies in the label sequence. (Dern teaches the actual label corresponds to an element of the subset occupying the same position as the index i, as depicted in Fig. 1, in pg. 600: Sec. Label sequence optimization layer: 1st –last paragraphs.)
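For illustration only, the positional correspondence recited in claim 10 can be sketched as below. This is a minimal sketch, assuming a sliding window as the "subset" and taking the last element of each window as the element whose label is used; the function name, the window width, and the example tokens and labels are hypothetical and do not come from the cited references.

```python
def windows_with_labels(training_sequence, label_sequence, width):
    """Pair each subset of the training sequence with the label at the matching position."""
    if len(training_sequence) != len(label_sequence):
        raise ValueError("training and label sequences must have equal length")
    examples = []
    for end in range(width, len(training_sequence) + 1):
        subset = training_sequence[end - width:end]
        index = end - 1                         # position of the subset's last element
        examples.append((subset, index, label_sequence[index]))
    return examples

tokens = ["Patient", "John", "Smith", "seen", "on", "3/4"]
labels = ["O", "NAME", "NAME", "O", "O", "DATE"]
examples = windows_with_labels(tokens, labels, width=3)
```

Here the first example pairs the subset `["Patient", "John", "Smith"]` with the label at index 2 of the label sequence.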

Claims 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter Szeto) in view of Wu et al. (US Pub No. 2019/0122120, hereinafter ‘Wu’), and in further view of Choi et al. (NPL: “Generating Multi-label Discrete Patient Records using Generative Adversarial Networks”, hereinafter ‘Choi’), and in further view of Bloom (US Pat. Pub. No. 2018/0336463).

Regarding claim 13, the rejection of claim 11 is incorporated and Szeto in combination with Wu, in view of Choi further teaches the method of claim 11,
wherein the interface, the computing resources, the dataset generator, and the model optimizer are hosted by separate virtual computing instances of a cloud computing system. (Szeto teaches the server based environment can comprise more than one global modeling that offers distributed machine learning services, as separated computing resources of the cloud computing system, in [0042], implemented using Docker as virtual computing resources, in [0060].)
Szeto teaches the cloud-based information processing system for creating machine learning models as depicted in Fig. 2, in [0042] & [0028]; and generating a synthetic dataset as proxy data from actual data that is used for training as part of the machine learning process, in [0051]-[0052], for training the data model, in [0048] & [0050], that makes valid machine learning-based inferences regarding the actual data as predictions, in [0108]-[0111].

Additionally Bloom teaches virtual computing systems for inference information using machine learning algorithms:
computing resources, … are hosted by separate virtual computing instances of the cloud computing system (Bloom teaches a cloud platform module system that deploys hosted separate virtual instances of software services included as part of the cloud platform as cloud-based services, in [0039]-[0040], for specific application fields (e.g., industry-specific data infrastructure and analysis systems) used for making machine learning predictions, in [0038].)

The Szeto, Wu, Choi, and Bloom references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing method for cloud-based computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of claimed invention to integrate the method for data processing for training inference machine learning algorithms in distributed cloud-based environments as disclosed by Bloom with the method of information processing in cloud-based computing environments as collectively disclosed by Szeto and Wu.
One of ordinary skill in the arts would have been motivated to combine the disclosed methods of Szeto, Wu and Bloom in order to provide domain-specific data processing through machine learning in remote computing environments (Bloom, Abstract). Doing so will help provide for remote inference using domain-specific techniques in a cloud platform (Bloom, Abstract; 0037-0038).


wherein a distributor routes user requests to the computing resources, the dataset generator, and the model optimizer. (Szeto teaches using a network, considered a distributor, for routing a researcher’s or data analyst’s requests for creating machine learning algorithms through the distributed machine learning system, in [0042].)

Regarding claim 15, the rejection of claim 13 is incorporated and Szeto in combination with Wu in view of Choi, and Bloom further teaches the method of claim 13,
wherein the production data is received from a data source by a production instance using a common file system, and wherein the production data is processed using the data model by the production instance. (Szeto teaches receiving new data, considered production data using a common file system, to incorporate into the machine learning models, in [0048]; using a transmission protocol, in [0114].)

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US Pub No. 2019/0122120, hereinafter ‘Wu’), in view of Szeto et al. (US Pat. Pub. No. 2018/0018590, hereinafter Szeto) and in further view of Dernoncourt et al. (NPL: “De-identification of patient notes with recurrent neural networks”, hereinafter ‘Dern’).

Regarding independent claim 20 limitations, Wu teaches a 
obtaining a synthetic dataset model; (Wu teaches obtaining a synthetic dataset model as a neural network generator of a generative adversarial network (GAN) model, in [0025]: Typically, generative adversarial networks (GANs) include two separate deep neural networks: a first neural network (generally referred to in the art as a generator… The generator takes in a random variable, z, with a distribution Pz(z) and attempts to map the random variable z to provide a realistic generated sample within a data distribution Pdata(x). Conversely, the discriminator is expected to discern real samples from generated samples by giving the output of 1 or 0 respectively. In the GAN training process, the generator and discriminator are used to generate samples and classify them respectively to improve the performance of each other in an adversarial manner.; where the generator model is obtained to generate synthetic training data, in [0025]-[0027]: … This disclosure describes a self-training method and system for semi-supervised GANs. In example embodiments, a first neural network (generally referred to hereinafter as a generator) is used to generate synthetic data (referred to herein as generated samples)…)
retrieving a training dataset from a database; (Wu teaches retrieving training data set  from a training database including labelled training data, in [0027]: … The training dataset includes a set of labelled training data (referred to hereinafter as a labelled training dataset) comprising labelled training data (referred to hereinafter as labelled samples), a larger set of unlabeled training data (referred to herein as an unlabeled training dataset) comprising unlabeled training data (referred to hereinafter as unlabeled samples), and the generated samples. In at least some examples, the unlabeled training dataset includes at least 10 times as many samples as the labelled training dataset.)
receiving, from the database, a similarity criterion, the similarity criterion including a predetermined difference in value between the training dataset and an output of the synthetic dataset model; (in [0050]: As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the performance [claimed evaluation of similarity metric] of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model [claimed evaluation of the similarity metric meets claimed similarity metric]; and in [0104]: …If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1%, or closer) [claimed receiving, from the database, a similarity criterion, the similarity criterion including a predetermined difference in value between the training dataset and an output of the synthetic dataset model], then the trained proxy model could be considered similar to the trained actual model… Further, if the similarity score fails to satisfy similarity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560.)
generating a synthetic dataset using the synthetic dataset model and the training dataset; (Wu teaches generating a synthetic dataset using the generator synthetic dataset model, in [0025]-[0027]: … This disclosure describes a self-training method and system for semi-supervised GANs. In example embodiments, a first neural network (generally referred to hereinafter as a generator) is used to generate synthetic data (referred to herein as generated samples)…)
selecting a data subclass according to a univariate distribution; (Wu teaches selecting a data subclass by the discriminator from the generated x training data associated with a selected yi subclass from the generated training samples, according to the uniform or normal noise distribution as the univariate distribution for generating data for the selected subclass yi, using the P(yi|x) univariate distribution, in [0030]: The generator G(z) 102 is configured to map a random noise vector z that has been drawn from a uniform or normal noise distribution pz(z) to produce generated samples xgen that simulate real samples. The generated data samples xgen are added to a dataset Xgen of generated samples that are stored in the data bank 108. Data bank 108 also includes a training dataset Xtrain. Training dataset Xtrain includes a labelled dataset Xlab that includes labelled training samples xlab and an unlabeled dataset Xunl that includes unlabeled training samples xunl. … The discriminator D 104 is also configured to perform a classification function to determine probabilities for the different class labels y1 to yK that can be applied to an unlabeled sample (which can be a generated sample xgen or an unlabeled training sample xunl). In the example of FIG. 1, discriminator D 104 is also configured to distinguish between K possible label classes. Each of the ith component of the K-dimensional output of the discriminator D(x) 104 in FIG. 1 represents a confidence score that a sample x (which can be a generated sample xgen, an unlabeled sample xunl, or a labelled sample xlab) belongs to class yi. Discriminator D 104 is also configured to generate a posterior probability value P(yi|x) for predicting that a possible label yi is the correct label for the sample x (which can be a generated sample xgen, an unlabeled sample xunl, or a labelled sample xlab).)
generating a synthetic portion using a class and subclass-specific model; and … of the training dataset with the synthetic portion; (Wu teaches generating a synthetic data portion to include in the training data using the generator model based on a class model, and modifying the training data by augmenting the dataset with that portion, where the class model is the decision function T(f, x) and the subclass-specific model is the P(yi|x) model for the data used to augment the training data with the synthetic generated training data, in [0052]-[0053]: … The information about the sample is denoted by f in the decision function T(f, x). The information about the sample x may be an output of the neural network used to implement the discriminator 104, the posterior probability P(yi|x) for the sample, or any other feature that is derivable from the sample. In some embodiments of the data augmentation phase 223, a few subsets of unlabeled or generated samples may be labelled and new GANs may be trained using each of these subsets to see which subset of unlabeled or generated samples gives the best GAN….)
validating the training dataset, wherein the validating comprises: generating, based on a comparison of the synthetic dataset and the training dataset, a similarity metric of the synthetic dataset model, (in [0025]-[0029]: Typically, generative adversarial networks (GANs) include two separate deep neural networks: a first neural network (generally referred to in the art as a generator) and a second neural network (generally referred to in the art as a discriminator)… the discriminator is expected to discern real samples from generated samples by giving the output of 1 or 0 respectively [claimed comparison]. In the GAN training process, the generator and discriminator are used to generate samples and classify them respectively to improve the performance of each other in an adversarial manner…; and in [0046]: In example embodiments, the training phase 208 is an iterative phase during which the generator G 102 and discriminator D 104 are trained using the current training dataset Xtrain(J) until a validation error for a validation dataset stops decreasing, as shown by blocks 210 to 221. In particular an adversarial game is played for improving the discrimination and classification performance by the discriminator D [claimed evaluation of similarity metric of claimed model] 104 and the data generation performance by generator G 102 simultaneously…)
determining validating whether the similarity metric satisfies the similarity criterion, and repeating the step of generating the synthetic dataset until the satisfies the similarity criterion is met. (Wu teaches, in [0048]-[0049]: As indicated in block 221, a determination is then made whether the validation error on a predetermined validation dataset has stopped decreasing [claimed determining validating whether the similarity metric satisfies the similarity criterion, and repeating the step of generating the synthetic dataset until the satisfies the similarity criterion is met]. As known in the art, the validation dataset is a predetermined dataset that is used to determine when training of the discriminator D 104 has reached a level where the validation error reaches its minimal level.)
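For illustration only, the recited generate, compare, and repeat loop can be sketched as below. This is a minimal sketch under stated assumptions and is not the GAN training of Wu or the proxy-model comparison of Szeto: the one-parameter "model" `mu`, the mean-difference similarity metric, the update rule, and the criterion value of 0.1 are hypothetical stand-ins chosen so the loop is easy to follow.

```python
def generate(mu, offsets=(-1.0, 0.0, 1.0)):
    """Toy synthetic dataset model: emits values centred on the single parameter mu."""
    return [mu + o for o in offsets]

def similarity_metric(synthetic, training):
    """Absolute difference between the dataset means (smaller means more similar)."""
    return abs(sum(synthetic) / len(synthetic) - sum(training) / len(training))

def generate_until_similar(training, criterion, max_rounds=50):
    """Regenerate the synthetic dataset until the metric satisfies the criterion."""
    mu = 0.0
    target = sum(training) / len(training)
    for rounds in range(1, max_rounds + 1):
        synthetic = generate(mu)
        metric = similarity_metric(synthetic, training)
        if metric <= criterion:                 # similarity criterion satisfied
            return synthetic, metric, rounds
        mu += 0.5 * (target - mu)               # crude model update before regenerating
    raise RuntimeError("similarity criterion not met within max_rounds")

training = [2.0, 4.0, 6.0]
synthetic, metric, rounds = generate_until_similar(training, criterion=0.1)
```

The criterion here plays the role of the predetermined difference in value: generation repeats only while the measured difference exceeds it.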
Wu does not expressly teach claim 20 limitations:
determining a sensitive portion of the training dataset belongs to a data class using a recurrent neural network, wherein the sensitive portion comprises personal information;
and replacing the sensitive portion of the training dataset with the synthetic portion.
Szeto teaches claim 20 limitations:
determining a sensitive portion of the training dataset belongs to a data class using a …, wherein the sensitive portion comprises personal information; (Szeto teaches identifying a sensitive portion of the training data of the actual data during the de-identification process (considered determining a sensitive portion of actual data as a sensitive class per HIPAA standards) of personal information for patient information in compliance with HIPAA standards, in [0077]-[0078].)
and generating a synthetic portion using a class and subclass-specific model; (Szeto teaches generating the synthetic dataset using the synthetic data proxy data model, in [0078], which is the class-specific model, as depicted in Fig. 6.)
and replacing the sensitive portion of the training dataset with the synthetic portion. (Szeto teaches generating a training sequence of proxy data by combining features with samples in the private database, in [0074]-[0075], as part of the de-identification processes in compliance with HIPAA standards to produce proxy data including information from the actual data, considered a process for replacing the sensitive portion with the synthetic portion to de-identify the actual data, in [0077]-[0078].)
The Wu and Szeto references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods in automated computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of claimed invention to integrate the method for modifying synthetic data as disclosed by Szeto with the method of information processing in machine learning computing environments as disclosed by Wu.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to improve or optimize distributed machine learning in environments where the computing devices lack access to private data (Szeto, [0030]).
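For illustration only, the limitation of replacing a sensitive portion with a synthetic portion can be sketched as below. This is a minimal sketch under stated assumptions: the pattern-based `classify` function is a hypothetical rule-based stand-in for the claimed recurrent neural network, and the data classes, patterns, and fixed synthetic values are invented for the example and appear in none of the cited references.

```python
import re

# Hypothetical stand-in for the classifier: one simple pattern per data class.
PATTERNS = {
    "PHONE": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "DATE": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
}

# Hypothetical class-specific generators of synthetic values.
SYNTHETIC = {
    "PHONE": lambda: "555-010-0000",
    "DATE": lambda: "01/01/2000",
}

def classify(token):
    """Return the data class of a sensitive token, or None if it is not sensitive."""
    for data_class, pattern in PATTERNS.items():
        if pattern.match(token):
            return data_class
    return None

def replace_sensitive(tokens):
    """Replace each sensitive token with a synthetic value for its data class."""
    replaced = []
    for token in tokens:
        data_class = classify(token)
        replaced.append(SYNTHETIC[data_class]() if data_class else token)
    return replaced

record = ["Seen", "on", "03/14/2019", "call", "202-555-0147"]
deidentified = replace_sensitive(record)
```

Non-sensitive tokens pass through unchanged, so only the portions assigned to a data class are swapped for synthetic values.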

While Szeto teaches a de-identification process using a learning model, in [0077]-[0078], Szeto does not expressly teach claim 20 limitation:
the generating comprising: determining a sensitive portion of the training dataset belongs to a data class using a recurrent neural network…
Dern teaches claim 20 limitation:
the generating comprising: determining a sensitive portion of the training dataset belongs to a data class using a recurrent neural network… (Dern teaches a de-identification system using artificial neural networks (ANN), in pg. 598: Col. 1: Last Para., where the ANN are recurrent neural networks (RNN), in pg. 598: Col. 2: Last Para.)
The Wu, Szeto, and Dern references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods in automated computing environment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method for de-identification of personal data using recurrent neural networks as disclosed by Dern with the method of information processing in machine learning computing environments as disclosed by Szeto.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Dern, Szeto, and Wu in order to provide a reliable automated de-identification system using artificial neural networks that require no handcrafted features or rules (Dern, Abstract). Doing so will improve the performance of de-identification systems based on artificial neural networks (Dern, Abstract).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

The prior art made of record and not relied upon that is considered pertinent to applicant's disclosure is listed below:
Barra-Chicote et al. (US Patent No. 10,510,358): teaches the converting of label state to numerical data for processing using adversarial networks, to learn latent representations as generated synthetic data. 
Camino et al. (NPL: Generating Multi-Categorical Samples with Generative Adversarial Networks): teaches converting multi-categorical samples to binary representations to train using Generative Adversarial Networks. 
Goodfellow et al. (NPL:  Generative adversarial networks): teaches Generative adversarial networks as a machine learning algorithm for processing data using synthetic data, neural networks, and model criteria to evaluate the data associated with the training process.
Bhowmick et al. (US Pub. No. 2019/0244138): teaches the use of Generative adversarial networks (GANS) for privatizing data in machine learning data systems and encoding input data sequences.  
Faulhaber, Jr. et al. (US Pub. No. 2019/0156247): teaches machine learning iteration as a refining process for improving prediction accuracy.
Laszlo et al. (NPL: “Optimal univariate microaggregation with data suppression”): teaches the de-identification of data as a microaggregation process, in pg. 677: 1st Col.; 1st para., using a univariate distribution, as depicted in Fig., in abstract & p. 678: Sec. 4, by selecting a data subclass as subproblem-based algorithms, in pg. 679: 1st Col. 1st partial & full paras.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516.  The examiner can normally be reached on Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/O.O.A./ Examiner, Art Unit 2126
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129