DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This communication is in response to Applicant’s submission filed 30 September 2021 [hereinafter Response], in which:
Claims 1, 32, 34, 36, and 38 have been amended.
Claims 29, 30, and 35 have been cancelled.
Claims 1-28, 31-34, and 36-38 are pending.
	Claims 1-28, 31-34, and 36-38 are rejected.
Claim Rejections - 35 U.S.C. § 103
3.	The following is a quotation of 35 U.S.C. § 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
5.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
6.	Claims 1-13, 18-23, 31-34 and 36-38 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130085773 to Yao et al. [hereinafter Yao] in view of Stojanovic et al., “Modeling Healthcare Quality via Compact Representations of Electronic Health Records,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (May 2016) [hereinafter Stojanovic].
Regarding claim 1, Yao teaches [a] distributed machine learning system (Yao, Abstract) comprising:
a plurality of private data servers, wherein each private data server of the plurality of private data servers has access to local private data and has at least one modeling engine, wherein the plurality of private data servers is communicatively coupled, via a network, to at least one non-private computing device (Yao, Fig. 2 and ¶ 0031 teaches (Examiner annotation in text block):

    [Image: media_image1.png, greyscale; Yao, Fig. 2, with Examiner annotation in text block]

wherein five PHCs (that is, a plurality of private data servers) have individual on-site access to [Model Deconstruction and Transfer (MDT) / Variable Library (VL)] platforms which generate five first prediction models (PM0) and a [Model Component Library (MCL)] (that is, each private data server of the plurality of private data servers . . . has at least one modeling engine); Yao ¶ 0047 teaches “electronic health records” or “EHR” is meant to include healthcare facility-hosted electronic medical records (EMR), internet-based EHR, data input by healthcare providers (including designated personnel), and patients and/or consumers; Yao ¶ 0006 teaches the implementation of prediction models also faces challenges related to the protection of personal identifiers (that is, local private data) that are present in the data sets (that is, as each PHC has electronic medical records, each private data server of the plurality of private data servers has access to local private data)), 
wherein, for each private data server of the plurality of private data servers, the local private data includes restricted features to which the at least one non-private computing device does not have authorization to access (Yao ¶ 0006 teaches the implementation of prediction models also faces challenges related to the protection of personal identifiers that are present in the data sets (that is, the personal identifiers in the data sets are the restricted features of the local private data to which the at least one non-private computing device does not have authorization to access); Examiner notes the specification identifies the “personal identifiers within data” as such that “a researcher would likely not have authorization to access each hospital's patient data due to privacy restrictions or HIPAA[FN1] compliance” (PGPUB[FN2] ¶ 0005)), and
wherein each private data server of the plurality of private data servers, upon execution by at least one processor software instructions stored in a non-transitory computer readable memory causes (Yao ¶ 0019 teaches a [Model Deconstruction and Transfer (MDT)] platform is provided in an electronic storage medium (that is, non-transitory computer readable memory), . . . and a software product (that is, software instructions)) its at least one modeling engine to:
receive model instructions to create a trained actual model from at least some of the local private data and according to an implementation of a machine learning algorithm (Yao ¶ 0020 teaches the computing device may be programmed to run statistical techniques selected (that is, “selected” is to receive model instructions to create a trained actual model from at least some of the local private data) from the group consisting of machine learning, logistic regression, linear regression, non-linear regression, and combinations of any of the foregoing (that is, according to an implementation of a machine learning algorithm); see also Yao ¶ 0024 that teaches [t]he machine learning techniques may be selected from the group consisting of classification tree methods, LASSO (least absolute shrinkage and selection operator), Bayesian network modeling, and combinations of any of the foregoing);
create the trained actual model according to the model instructions and as a function of the at least some of the local private data by training the implementation of the machine learning algorithm on the local private data (Yao ¶ 0046 teaches that a “training set” is meant to refer to a data set that is prepared from data provided by a PHC (that is, local private data) or a non-PHC (see, Examples 2-6) that is used to generate or train one or more prediction models (that is, to generate or train one or more prediction models is create the trained actual model according . . . and as a function of the at least some of the local private data by training the implementation of the machine learning algorithm on the local private data) for a particular condition or disease state (that is, according to the model instructions)), the trained actual model comprising trained actual model parameters (Yao ¶ 0022 teaches at least two PM1 are generated, ranked, and selected according to model performance parameters (that is, the trained actual model comprising trained actual model parameters));
generate a plurality of private data statistical distributions from the local private data where the plurality of private data statistical distributions represents the local private data in aggregate used to create the trained actual model and does not include individual elements of the local private data (Yao ¶ 0081 teaches [t]he MDT and external hard drive now contain a prediction model (PM0) . . . [that] do not contain any raw data or [patient] identifiers (that is, does not include individual elements of the local private data); rather, they are encoded prediction models representing the statistical relationships (that is, the statistical relationships are a plurality of private data statistical distributions from the local private data in aggregate) among variables that have predictive value for the particular PHC from which they are derived; Yao ¶ 0024 teaches examples applying statistical criteria selected from the group consisting of predictive power based on posterior log likelihoods, predictive power based upon posterior probability of an event, predictive power based upon estimated errors of prediction, dynamic range of predicted probabilities, reclassification, frequency of predicted probabilities, cross-validation error, number of model components required, and discrimination);
* * *
calculate a model similarity score based on the proxy model parameters and the trained actual model parameters (Yao, claim 13, teaches performance of the validated at least one PM0 is measured by comparison against performance of a control model (that is, calculate a model similarity score); Yao ¶ 0050 teaches “control model” refers to a prediction model, or practice guidelines, or eligibility criteria . . . , that is in standard use in the healthcare industry or that is being more specifically used for the generation of a prediction model by a specific PHC (that is, by comparison of the control model, performance is a function of the proxy model parameters and the trained actual model parameters); see also Yao ¶ 0062, which teaches [t]he ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination (that is, AUC is a model similarity score) and AUC=0.5 indicates the model has no ability to discriminate; 
[Examiner note: neither the claims nor the specification defines the term “model similarity score.” Notably, the specification broadly and generally recites that “[t]he similarity between trained proxy model 270 and trained actual model 240 can be measured through various techniques by modeling engine 226 calculating model similarity score 280 as a function of proxy model parameters 275 and actual model parameters 245. The resulting model similarity score 280 is a representation of how similar the two models are, at least to within similarity criteria (that is, multiple criteria).” (PGPUB ¶ 0079 (emphasis added)). Accordingly, the term “model similarity score” has a broad BRI that reads upon the teachings of Yao relating to “discrimination” of a model against a control model through AUC metrics, or rather, “through various techniques”; also, the area-under-the-curve of Yao pertains to model accuracy, and the specification expressly recites that the “similarity score 280 can be a single value (e.g., a difference in accuracy, . . . )” (PGPUB ¶ 0079)]); 
determine whether the model similarity score satisfies at least one transmission criterion (Yao ¶ 0024 teaches the performance of . . . at least one PM1 may be measured by comparison (that is, the comparison result of Yao is a model similarity) against performance of a control model; Yao ¶ 0061 teaches use of subsets of the predictive variables and thresholds to optimally define patient populations that are enriched for a certain trait, prognosis or outcome; predictive power; discrimination (that is, similarity score satisfies at least one transmission criterion)), wherein the at least one transmission criterion includes at least one of the following conditions relating to the model similarity score: 
a threshold condition, a multi-valued condition, a change in value condition, a trend condition, a human command condition, an external request condition, and a time condition (Yao ¶ 0062 teaches [d]iscrimination refers to how well a model can differentiate patients with higher versus lower probabilities of outcomes, or those with significantly different prognoses. The ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination and AUC=0.5 indicates the model has no ability to discriminate (that is, as compared to a control model, the AUC performance of each model pertains to similarity, such that the degree of discrimination is at least one of the following conditions relating to the model similarity score: a threshold condition . . . .)); and
in response to a determination that the model similarity score satisfies the at least one transmission criterion (Yao ¶ 0061 teaches use of subsets of the predictive variables and thresholds to optimally define patient populations that are enriched for a certain trait, prognosis or outcome; predictive power; discrimination (similarity score satisfies at least one transmission criterion); and reclassification), transmit the set of proxy data, over the network, to the at least one non-private computing device (Yao ¶ 0054 teaches “MCL,” “Model Component Library,” and “MCL components” are meant to refer to the deconstructed components (that is, the set of proxy data) derived from statistical features of prediction models (PM0); also, Yao teaches that [o]nce the prediction model is established (that is, as with the AUC of Yao, satisfies at least one transmission criterion), the PHC may further utilize the MDT platform to extract statistical features and/or components from the prediction model and send (that is, transmit) those features and/or components to a third party who can in turn reassemble the statistical features and/or components (Yao ¶ 0057)
[Examiner note: the term “transmit” is not defined by the claims or the specification. For purposes of examination, Examiner construes the plain and ordinary meaning of “transmit” is “to send or convey,” and is synonymous with “share” (Yao ¶ 0056), “send” (Yao ¶ 0057), “transfer” (Yao ¶ 0052), “access” (Yao ¶ 0052), etc.]).
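For illustration only, the AUC-based comparison of a prediction model against a control model discussed above can be sketched as follows. This is a minimal sketch and is not code from Yao or from the specification; the function names `auc` and `exceeds_control`, the toy scores, and the 0.1 margin are assumptions introduced here. The AUC is computed with the standard pairwise-ranking formulation, in which tied scores count one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def exceeds_control(model_auc, control_auc, margin=0.1):
    """Hypothetical threshold condition: the model beats the control by a margin."""
    return (model_auc - control_auc) >= margin


# Toy outcome labels and predicted probabilities (made up for illustration).
labels = [1, 0, 1, 0, 1, 0]
model_scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.1]
control_scores = [0.5] * 6  # a control that cannot discriminate at all

model_auc = auc(model_scores, labels)      # 8 of 9 positive/negative pairs ranked correctly
control_auc = auc(control_scores, labels)  # every pair is a tie, so 0.5
print(model_auc, control_auc, exceeds_control(model_auc, control_auc))
```

In this sketch the control scores are all tied, so the control AUC is 0.5 (no ability to discriminate), and the comparison reduces to a single difference in AUC tested against a threshold.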
Though Yao teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set, Yao does not explicitly teach -
* * *
generate a set of proxy data according to the plurality of private data statistical distributions;
create a trained proxy model from the set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data, wherein the trained proxy model comprises proxy model parameters;
* * *
But Stojanovic teaches -
* * *
generate a set of proxy data according to the plurality of private data statistical distributions (Stojanovic, Fig. 2 & caption, teaches (Examiner annotations in bordered rectangles):

    [Image: media_image2.png, greyscale; Stojanovic, Fig. 2 & caption, with Examiner annotations in bordered rectangles]

Stojanovic, right column of p. 3, “2.2 disease+procedure2vec method,” first paragraph, teaches disease+procedure2vec (dp2v) approach for learning diseases and procedures representations (step 1 in Figure 2) that extend models of the recently proposed word2vec algorithm[FN3]. The key insight is that we can represent the patients’ lists of diseases and procedures from [Electronic Health Records] sequences of tokens, and view each sequence as a sample from some unknown language. . . . Probability $P(h_{i+m} \mid h_i)$ . . . is defined using the soft-max function (that is, a plurality of statistical distributions); continuing, Stojanovic, right column of p. 3, “2.2.1 Patient visit representation,” first paragraph, & Fig. 2, step 2, teaches generating a data set . . . where for each record $r_i$ the value of $y_i \in Y$ represents one of the target variables: [Length of Stay (LoS)], total charges (TOTCHG), or binary mortality indicator, and $x_i \in \mathbb{R}^M$ is a patient’s feature vector calculated by summing vectors of diseases and procedures that appear in that record (that is, generate a set of proxy data according to the plurality of private data statistical distributions));
create a trained proxy model from the set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data, wherein the trained proxy model comprises proxy model parameters (Stojanovic, Fig. 2 & caption, teaches at step 3), train regression and classification models (that is, create a trained proxy model) to predict important indicators of healthcare quality y (LoS, TOTCHG and mortality for certain procedures and medical conditions of an inpatient) (that is, create a trained proxy model from the set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data, wherein the trained proxy model comprises proxy model parameters));
* * *
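For illustration only, the two operations cited from Stojanovic above (a soft-max probability of one token given another, and a patient feature vector formed by summing token vectors) can be sketched as follows. This is a minimal sketch in the general word2vec style and is not code from Stojanovic; the token names, the two-dimensional vectors, and the function names `softmax_probability` and `visit_vector` are assumptions introduced here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax_probability(center, context, input_vecs, output_vecs):
    """P(context | center) as a soft-max over dot products with every output vector."""
    scores = {tok: dot(input_vecs[center], vec) for tok, vec in output_vecs.items()}
    largest = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - largest) for tok, s in scores.items()}
    return exps[context] / sum(exps.values())


def visit_vector(tokens, input_vecs):
    """Feature vector for one record: the element-wise sum of its token vectors."""
    dim = len(next(iter(input_vecs.values())))
    total = [0.0] * dim
    for tok in tokens:
        total = [t + x for t, x in zip(total, input_vecs[tok])]
    return total


# Toy vectors for two diagnosis tokens and one procedure token (made up).
input_vecs = {"dx_a": [1.0, 0.0], "dx_b": [0.0, 1.0], "proc_c": [1.0, 1.0]}
output_vecs = {"dx_a": [0.5, 0.1], "dx_b": [0.2, 0.9], "proc_c": [0.7, 0.3]}

probabilities = {tok: softmax_probability("dx_a", tok, input_vecs, output_vecs)
                 for tok in output_vecs}
features = visit_vector(["dx_a", "proc_c"], input_vecs)
print(probabilities, features)
```

The probabilities over the whole vocabulary sum to one, and the record containing `dx_a` and `proc_c` maps to the summed vector, which would then serve as the input `x_i` to a downstream regression or classification model.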
Yao and Stojanovic are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, which in turn is used to train a model. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yao pertaining to the generation, ranking, and selection of model performance parameters with the teachings of Stojanovic pertaining to producing a data set from statistical representations and training a model on that data set.
The motivation for doing so is to use Electronic Health Record (EHR) data, which provides unique opportunities for improving the quality of health services, by learning vector representations with state-of-the-art language models specifically designed for modeling co-occurrence of diseases and applied clinical procedures. Doing so provides a model that outperforms the baseline models on all tasks, indicating a strong potential of the proposed approach for advancing quality of the healthcare system (Stojanovic, Abstract).
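For illustration only, the sequence of operations recited in claim 1 (summarize private data as statistical distributions, generate proxy data from those distributions, train an actual model and a proxy model, score their similarity, and test a transmission criterion) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of Yao, Stojanovic, or the specification: the per-label Gaussian summaries, the one-parameter threshold classifier, the similarity formula, the 0.9 threshold, and all function names are hypothetical choices made here.

```python
import random
import statistics


def fit_distributions(records):
    """Summarize (feature, label) records as a Gaussian (mean, stdev, count) per label."""
    by_label = {}
    for x, y in records:
        by_label.setdefault(y, []).append(x)
    return {y: (statistics.mean(xs), statistics.stdev(xs), len(xs))
            for y, xs in by_label.items()}


def sample_proxy_data(distributions, rng):
    """Draw synthetic records from the summaries; no private record is copied."""
    proxy = []
    for y, (mu, sigma, count) in distributions.items():
        proxy.extend((rng.gauss(mu, sigma), y) for _ in range(count))
    return proxy


def train_threshold_model(records):
    """Toy 'training': the decision threshold is the midpoint of the two class means."""
    mean0 = statistics.mean([x for x, y in records if y == 0])
    mean1 = statistics.mean([x for x, y in records if y == 1])
    return {"threshold": (mean0 + mean1) / 2.0}


def similarity_score(actual, proxy, scale):
    """Assumed score in [0, 1]: 1 minus the scaled gap between the model parameters."""
    gap = abs(actual["threshold"] - proxy["threshold"]) / scale
    return max(0.0, 1.0 - gap)


def run_demo(seed=0, minimum_score=0.9):
    rng = random.Random(seed)
    # Stand-in for local private data: one numeric feature and a binary label.
    private = ([(rng.gauss(2.0, 1.0), 0) for _ in range(200)]
               + [(rng.gauss(6.0, 1.0), 1) for _ in range(200)])
    distributions = fit_distributions(private)
    proxy = sample_proxy_data(distributions, rng)
    actual_model = train_threshold_model(private)
    proxy_model = train_threshold_model(proxy)
    scale = abs(distributions[1][0] - distributions[0][0])
    score = similarity_score(actual_model, proxy_model, scale)
    transmit = score >= minimum_score  # hypothetical threshold condition
    return private, proxy, score, transmit


private, proxy, score, transmit = run_demo()
print(len(proxy), round(score, 3), transmit)
```

Only the proxy records would leave the private server in this sketch, and only when the proxy model's parameter is close enough to the actual model's parameter to satisfy the threshold.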
Regarding claim 2, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the local private data comprises local private healthcare data (Yao ¶ 0043 teaches “healthcare data” is meant to refer to medical data that is created in the process of inpatient or outpatient clinical services that are provided to the patient, such as for example, patient history, physical exams, lab tests and results, medical procedures and results, medications and patient response to the medications (healthcare data); Yao ¶ 0056 further teaches the costs of data-sharing (local) associated with legal- and institution-imposed protection of healthcare data (private) that is linked to personal identifiers).
Regarding claim 3, the combination of Yao and Stojanovic teaches all of the limitations of claim 2, as described in detail above.
Yao further teaches wherein the local private healthcare data includes patient-specific data (Yao ¶ 0056 further teaches the costs of data-sharing (local) associated with legal- and institution-imposed protection of healthcare data (private) that is linked to personal identifiers (patient-specific data)).
Regarding claim 4, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the local private data includes at least one of the following types of data: genomic data, whole genome sequence data, whole exosome sequence data, proteomic data, proteomic pathway data, k-mer data, neoepitope data, RNA data, allergy information, encounter data, treatment data, outcome data, appointment data, order data, billing code data, diagnosis code data, results data, treatment response data, tumor response data, demographic data, medication data, vital sign data, payor data, drug study data, drug response data, longitudinal study data, biometric data, financial data, proprietary data, electronic medical record data, research data, human capital data, performance data, analysis results data, and event data (Yao ¶ 0043 teaches “healthcare data” is meant to include data that is generated for billing and other administrative tasks that are required for the patient to receive clinical services with his/her healthcare providers and navigate his/her healthcare needs, such as for example, pharmaceutical prescriptions, lab orders, billing records (billing code data), and referrals).
Regarding claim 5, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches -
wherein the network comprises at least one of the following types of networks: a wireless network, a packet switched network, the Internet, an intranet, a virtual private network, a cellular network, an ad hoc network, and a peer-to-peer network (Yao ¶ 0019 teaches the MDT platform is provided in an electronic storage medium, which may be selected from the group consisting of . . . a VPN, a secure internet website, and a software product that is downloadable from the internet with security features (that is, access being via the Internet)).
Regarding claim 6, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the at least one non-private computing device is a different one of the plurality of private data servers lacking authorization to the local private data on which the trained actual model was created (Yao ¶ 0059 teaches a non-participating healthcare center (PHC) regional healthcare network wants a prediction model for IUI treatment successes (at least one non-private computing device is a different one of the plurality of private data servers lacking authorization to the local private data on which the trained model was created), but has very little date [sic] of its own).
Regarding claim 7, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the at least one non-private computing device includes a global model server (Yao ¶ 0049 teaches [t]he term “server” is meant to refer to a stand-alone computer and/or a networked server (communicatively coupled, via a network) that is dedicated to carry out a subject operation. Examples of stand-alone and networked servers that may be used with the present invention include without limitation, . . . a local on-site server (plurality of private data servers), and an off-site cloud server (at least one non-private computing device); Examiner points out that a cloud server is a global model server).
Regarding claim 8, the combination of Yao and Stojanovic teaches all of the limitations of claim 7, as described in detail above.
Yao teaches -
wherein the global model server is configured to aggregate sets of proxy data from at least two of the plurality of private data servers (Yao ¶ 0025 teaches compensating the healthcare center according to the MCL components that contribute to the generation of the at least one PM1 (that is, the global model server is configured to aggregate sets of proxy data from at least two of the plurality of private data servers and is configured to train a global model on the sets of proxy data)).
Regarding claim 9, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao teaches -
wherein each private data server is communicatively coupled with a local storage system that stores the local private data (Yao ¶ 0052 teaches terms “Model Deconstruction and Transfer” and “MDT” are meant to refer to a platform that allows the PHC to provide access to a third party, to use a PHC's healthcare data to generate a prediction model that is devoid of any personal identifiers on-site, and transfer that prediction model to a third-party facility, without the transfer of personal identifiers or any raw data, for deconstruction; Yao ¶ 0019 teaches the MDT platform is provided with an electronic storage medium, which may be selected from the group consisting of an external hard drive, a CD ROM, a USB memory device, an electronic mobile device, desktop computer, a server, a VPN, a secure internet website (that is, each private data server is communicably coupled with a local storage system that stores the local private data)).
Regarding claim 10, the combination of Yao and Stojanovic teaches all of the limitations of claim 9, as described in detail above.
Yao teaches -
wherein the local storage system includes at least one of the following: a RAID system, a file server, a network accessible storage device, a storage area network device, a local computer readable memory, a hard disk drive, an optical storage device, a tape drive, a tape library, and a solid state disk (Yao ¶ 0019 teaches the MDT platform is provided with an electronic storage medium, which may be selected from the group consisting of an external hard drive (that is, a hard disk drive), a CD ROM (that is, an optical storage device), a USB memory device, an electronic mobile device, desktop computer, a server, a VPN, a secure internet website).
Regarding claim 11, the combination of Yao and Stojanovic teaches all of the limitations of claim 9, as described in detail above. 
Yao teaches -
the local storage system includes at least one of the following: a local database, a BAM server, a SAM server, a GAR server, BAMBAM server, and a clinical operating system server (Yao ¶ 0049 teaches that term “server” is meant to refer to a stand-alone computer and/or a networked server that is dedicated to carry out a subject operation. Examples of stand-alone and networked servers that may be used with the present invention include without limitation, a desktop, a laptop, a mobile device, a co-location server, a local on-site server, and an off-site cloud server. Cloud servers include public cloud servers and private/dedicated cloud or hybrid cloud servers; Yao, claim 8, teaches the server is selected from an on-site local server at the healthcare center and an off-site collocation or cloud server controlled by the healthcare center (that is, the local storage system includes at least one of the following: . . . a clinical operating system server)).
Regarding claim 12, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao teaches -
wherein the model instructions comprise at least one of the following: a local command, a remote command, an executable file, a protocol command, and a selected command (Yao ¶ 0019 teaches a software product that is downloadable from the internet with security features (that is, the model instructions comprise at least one of the following: . . . . an executable file)).
Regarding claim 13, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Stojanovic teaches -
wherein, for each statistical distribution of the plurality of private data statistical distributions, the statistical distribution is generated such that the statistical distribution comprises at least one of the following types of statistical distributions: a Gaussian distribution, a Poisson distribution, a Bernoulli distribution, a Rademacher distribution, a discrete distribution, a binomial distribution, a zeta distribution, a Gamma distribution, a beta distribution, and a histogram distribution (Stojanovic, Fig. 5 & caption, teaches:

    [Image: media_image3.png, greyscale; Stojanovic, Fig. 5 & caption]

(that is, for each statistical distribution of the plurality of private data statistical distributions, the statistical distribution is generated such that the statistical distribution comprises at least one of . . . a histogram distribution)).
Regarding claim 18, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the trained actual model is based on an implementation of at least one of the following types of machine learning algorithms: a classification algorithm, a neural network algorithm, a regression algorithm, a decision tree algorithm, a clustering algorithm, a genetic algorithm, a supervised learning algorithm, a semi-supervised learning algorithm, an unsupervised learning algorithm, and a deep learning algorithm (Yao ¶ 0021 teaches [t]he machine learning technique may be selected from the group consisting of classification tree methods, LASSO (least absolute shrinkage and selection operator), Bayesian network modeling, and combinations of any of the foregoing (that is, the trained actual model is based on an implementation of at least one of the following types of machine learning algorithms: a classification algorithm . . . a decision tree algorithm . . . .)).
Regarding claim 19, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao further teaches wherein the trained actual model is based on an implementation of at least one of the following machine learning algorithms: a support vector machine, a nearest neighbor algorithm, a random forest, a ridge regression, a Lasso algorithm, a k-means clustering algorithm, a spectral clustering algorithm, a mean shift clustering algorithm, a non-negative matrix factorization algorithm, an elastic net algorithm, a Bayesian classifier algorithm, a RANSAC algorithm, and an orthogonal matching pursuit algorithm (Yao ¶ 0021 teaches [t]he machine learning technique may be selected from the group consisting of classification tree methods, LASSO (least absolute shrinkage and selection operator) (Lasso algorithm), Bayesian network modeling (Bayesian classifier algorithm), and combinations of any of the foregoing).
Regarding claim 20, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Yao teaches -
wherein, for a first private data server of the plurality of private data servers, the model instructions include instructions to create the trained actual model from a base-line model created external to the private data server (Yao ¶ 0051 teaches that [w]ith reference to a control model, the baseline is the average probability if no predictive modeling is done. If no prior model exists, a control model may be generated by using one or more variables that are commonly used by healthcare providers in a particular situation (that is, create the trained actual model from a base-line model created external to the private data server)).
Regarding claim 21, the combination of Yao and Stojanovic teaches all of the limitations of claim 20, as described in detail above.
Yao teaches -
wherein the baseline model comprises a global trained actual model (Yao ¶ 0050 teaches that the term “control model” refers to a prediction model, or practice guidelines, or eligibility criteria (e.g. criteria required to perform a medical service, provide coverage to a medical service, or to reimburse healthcare professionals or healthcare facilities), that is in standard use in the healthcare industry or that is being more specifically used for the generation of a prediction model by a specific PHC (that is, as standard related, the baseline model comprises a global trained actual model)).
Regarding claim 22, the combination of Yao and Stojanovic teaches all of the limitations of claim 21, as described in detail above.
Yao teaches -
wherein the global trained actual model is trained, at least in part, on sets of proxy data from at least two of the plurality of private data servers other than the first private data server (Yao ¶ 0051 teaches the term “baseline” refers to an overall mean probability, disregarding any profiling or modeling. With reference to a control model, the baseline is the average probability if no predictive modeling is done. If no prior model exists, a control model may be generated by using one or more variables that are commonly used by healthcare providers in a particular situation; Yao ¶ 0059 teaches how each of the five PHCs is compensated for their contribution to the generation of the PM1 (that is, through the contributions of the plurality of PHCs, the global trained actual model is trained, at least in part, on sets of proxy data from at least two of the plurality of private data servers other than the first private data server)).
Regarding claim 31, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above. 
Yao further teaches wherein the modeling engine is further configured to update the trained actual model on new local private data (Yao ¶ 0069 teaches a method for incentivizing PHCs to provide updated data, at regular time intervals (to update the trained actual model on new local private data)).
Regarding claim 32, Yao teaches [a] computing-device implemented method of distributed machine learning, the method comprising:
receiving, by a private data server, model instructions to create a trained actual model from at least some of local private data local to the private data server and according to an implementation of a machine learning algorithm (Yao, Fig. 2 and ¶ 0031 teaches (Examiner annotation in text block):

[Image: media_image4.png (Yao, Fig. 2, with Examiner annotation in text block)]

wherein five PHCs (that is, a plurality of private data servers) have individual on-site access to [Model Deconstruction and Transfer (MDT) / Variable Library (VL)] platforms which generate five first prediction models (PM0) and a [Model Component Library (MCL)] (that is, a private data server, and a machine learning engine); Yao ¶ 0020 teaches the computing device may be programmed to run statistical techniques selected (that is, “selected” is to receive model instructions to create a trained actual model from at least some of the local private data) from the group consisting of machine learning, logistic regression, linear regression, non-linear regression, and combinations of any of the foregoing (that is, according to an implementation of a machine learning algorithm); see also Yao ¶ 0024 that teaches [t]he machine learning techniques may be selected from the group consisting of classification tree methods, LASSO (least absolute shrinkage and selection operator), Bayesian network modeling, and combinations of any of the foregoing), wherein the local private data includes restricted features (Yao ¶ 0006 teaches the implementation of prediction models also faces challenges related to the protection of personal identifiers that are present in the data sets (that is, the personal identifiers in the data sets is the local private data includes restricted features); Examiner notes the specification identifies the “personal identifiers within data” is such that “a researcher would likely not have authorization to access each hospital's patient data due to privacy restrictions or HIPAA compliance” (PGPUB ¶ 0005));
creating, by a machine learning engine, the trained actual model according to the model instructions and as a function of the at least some of the local private data by training the implementation of the machine learning algorithm on the local private data (Yao ¶ 0046 teaches that a “training set” is meant to refer to a data set that is prepared from data provided by a PHC (that is, local private data) or a non-PHC (see, Examples 2-6) that is used to generate or train one or more prediction models (that is, to generate or train one or more prediction models is to create the trained actual model according . . . and as a function of the at least some of the local private data by training the implementation of the machine learning algorithm on the local private data) for a particular condition or disease state (that is, according to the model instructions)), wherein the trained actual model comprises trained actual model parameters (Yao ¶ 0022 teaches at least two PM1 are generated, ranked, and selected according to model performance parameters (that is, the trained actual model comprising trained actual model parameters));
generating, by the machine learning engine, a plurality of private data statistical distributions from the local private data, wherein the plurality of private data statistical distributions represents the local private data in aggregate used to create the trained actual model and does not include individual elements of the local private data (Yao ¶ 0081 teaches [t]he MDT and external hard drive now contain a prediction model (PM0) . . . [that] do not contain any raw data or [patient] identifiers (that is, does not include individual elements of the local private data); rather, they are encoded prediction models representing the statistical relationships (that is, the statistical relationships are a plurality of private data statistical distributions from the local private data in aggregate) among variables that have predictive value for the particular PHC from which they are derived; Yao ¶ 0024 teaches examples applying statistical criteria selected from the group consisting of predictive power based on posterior log likelihoods, predictive power based upon posterior probability of an event, predictive power based upon estimated errors of prediction, dynamic range of predicted probabilities, reclassification, frequency of predicted probabilities, cross-validation error, number of model components required, and discrimination);
identifying, by the machine learning engine, salient private data features from the plurality of private data statistical distributions, wherein the salient private data features allow for replication of the plurality of private data distributions (Yao ¶ 0081 teaches the MDT and external hard drive now contain a prediction model (PM0) (which may or may not be validated depending on whether or not the PHC ran a validation test or training set) relating to the success of IUI treatments (that is, salient private data features) specific to each PHC (that is, identifying, by the machine learning engine, salient private data features from the plurality of private data statistical distributions, wherein the salient private data features allow for replication of the plurality of private data distributions). These prediction models (PM0) do not contain any raw data or identifiers; rather, they are encoded prediction models representing the statistical relationships among variables that have predictive value for the particular PHC from which they are derived; with respect to the term “salient,” neither the specification nor the claims define the term, though examples are provided in the context of “salient private data features” (see PGPUB ¶ 0019). Accordingly, Examiner gives the term its plain meaning, which is “most noticeable or important.” The plain meaning of the term is not inconsistent with the specification. (MPEP § 2111.01));
* * *
and in response to a determination that a model similarity score of the trained proxy model satisfies at least one transmission criterion (Yao ¶ 0061 teaches use of subsets of the predictive variables and thresholds to optimally define patient populations that are enriched for a certain trait, prognosis or outcome; predictive power; discrimination (similarity score satisfies at least one transmission criterion)), transmitting, by the machine learning engine, the salient private data features over a network to a non-private computing device (Yao ¶ 0054 teaches “MCL,” “Model Component Library,” and “MCL components” are meant to refer to the deconstructed components (that is, the set of proxy data) derived from statistical features of prediction models (PM0); Yao ¶ 0057 teaches the PHC may further utilize the MDT platform to extract statistical features and/or components from the prediction model and send those features and/or components (that is, the salient private data features over a network) to a third party (that is, a non-private computing device) who can in turn reassemble the statistical features and/or components, together with statistical features and/or components from other facilities (i.e., the multiple sources), into a Model Component Library (MCL)),
wherein the model similarity score is based on the proxy model parameters and the trained actual model parameters (Yao, claim 13, teaches performance of the validated at least one PM0 is measured by comparison against performance of a control model (that is, model similarity score); Yao ¶ 0050 teaches “control model” refers to a prediction model, or practice guidelines, or eligibility criteria . . . , that is in standard use in the healthcare industry or that is being more specifically used for the generation of a prediction model by a specific PHC (that is, by comparison of the control model, performance is based on the proxy model parameters and the trained actual model parameters); see also Yao ¶ 0062, which teaches [t]he ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination (that is, AUC is a model similarity score) and AUC=0.5 indicates the model has no ability to discriminate),
wherein the non-private computing device is not authorized to access the restricted features of the local private data (Yao ¶ 0006 teaches the implementation of prediction models also faces challenges related to the protection of personal identifiers that are present in the data sets (that is, the personal identifiers in the data sets is the local private data includes restricted features to which the at least one non-private computing device does not have authorization to access); Examiner notes the specification identifies the “personal identifiers within data” is such that “a researcher would likely not have authorization to access each hospital's patient data due to privacy restrictions or HIPAA compliance” (PGPUB ¶ 0005)), and
wherein the salient private data features exclude the restricted features (Yao ¶ 0052 teaches terms “Model Deconstruction and Transfer” and “MDT” are meant to refer to a platform that allows the PHC to provide access to a third party, to use a PHC's healthcare data to generate a prediction model that is devoid of any personal identifiers on-site, and transfer that prediction model to a third-party facility, without the transfer of personal identifiers or any raw data, for deconstruction (that is, the transfer is private data where the salient private data features exclude the restricted features)).
Though Yao teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set, Yao does not explicitly teach -
* * *
creating a trained proxy model from a set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data, wherein:
the set of proxy data is associated with the plurality of private data statistical distributions and
the trained proxy model comprises proxy model parameters; and
* * *
But Stojanovic teaches -
* * *
creating a trained proxy model from a set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data (Stojanovic, Fig. 2 & caption, teaches at step 3), train regression and classification models (that is, create a trained proxy model) to predict important indicators of healthcare quality y (LoS, TOTCHG and mortality for certain procedures and medical conditions of an inpatient) (that is, create a trained proxy model from the set of proxy data by training another implementation of the machine learning algorithm on the set of proxy data)), wherein:
the set of proxy data is associated with the plurality of private data statistical distributions (Stojanovic, Fig. 2 & caption, teaches (Examiner annotations in bordered rectangles):

[Image: media_image2.png (Stojanovic, Fig. 2, with Examiner annotations in bordered rectangles)]

Stojanovic, right column of p. 3, “2.2 disease+procedure2vec method,” first paragraph, teaches disease+procedure2vec (dp2v) approach for learning diseases and procedures representations (step 1 in Figure 2) that extend models of the recently proposed word2vec algorithm. The key insight is that we can represent the patients’ lists of diseases and procedures from [Electronic Health Records] sequences of tokens, and view each sequence as a sample from some unknown language. . . . Probability $\mathrm{P}(h_{i+m} \mid h_i)$ . . . is defined using the soft-max function (that is, a plurality of statistical distributions); continuing, Stojanovic, right column of p. 3, “2.2.1 Patient visit representation,” first paragraph, & Fig. 2, step 2, teaches generating a data set . . . where for each record $r_i$ the value of $y_i \in Y$ represents one of the target variables: [Length of Stay (LoS)], total charges (TOTCHG), or binary mortality indicator, and $x_i \in \mathbb{R}^M$ is a patient’s feature vector calculated by summing vectors of diseases and procedures that appear in that record (that is, generating a set of proxy data according to at least one of the following: the plurality of private data statistical distributions . . .)) and
the trained proxy model comprises proxy model parameters (Stojanovic, Fig. 2 & caption, teaches at step 3), train regression (that is, regression is the process of adjusting model parameters, and for a trained proxy model, these are proxy model parameters) and classification models (that is, create a trained proxy model) to predict important indicators of healthcare quality y (LoS, TOTCHG and mortality for certain procedures and medical conditions of an inpatient)); and
* * *
Yao and Stojanovic are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of Yao pertaining to the generation, ranking, and selection of model performance parameters with the teachings of Stojanovic pertaining to producing a data set from statistical representations to train a model.
The motivation for doing so is to use Electronic Health Record (EHR) data that provides unique opportunities for improving quality of health services through learning vector representations by employing state-of-the-art language models specifically designed for modeling co-occurrence of diseases and applied clinical procedures. In doing so, provide a model that outperforms the baseline models on all tasks, indicating a strong potential of the proposed approach for advancing quality of the healthcare system. (Stojanovic, Abstract).
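[Illustrative note, not drawn from the cited references: the limitations mapped above for claim 32 (private data statistical distributions, a set of proxy data, a trained proxy model, a model similarity score, and a transmission criterion) can be pictured with the minimal Python sketch below. Every name in it (`fit_line`, `summarize`, `sample_proxy`, `similarity_score`, `should_transmit`), the one-variable linear model, the Gaussian sampling, the distance-based score, and the 0.9 threshold are hypothetical assumptions chosen only for illustration. Neither Yao, Stojanovic, nor the specification is represented as implementing the limitations this way.]

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def summarize(xs, ys):
    """Aggregate statistics only; no individual record is retained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "r": r}

def sample_proxy(stats, n, rng):
    """Draw synthetic (x, y) pairs from an assumed bivariate Gaussian."""
    resid_sd = stats["sy"] * math.sqrt(max(0.0, 1.0 - stats["r"] ** 2))
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(stats["mx"], stats["sx"])
        mean_y = stats["my"] + stats["r"] * stats["sy"] / stats["sx"] * (x - stats["mx"])
        xs.append(x)
        ys.append(rng.gauss(mean_y, resid_sd))
    return xs, ys

def similarity_score(actual_params, proxy_params):
    """1.0 for identical parameters, approaching 0 as they diverge (assumed metric)."""
    return 1.0 / (1.0 + math.dist(actual_params, proxy_params))

def should_transmit(score, threshold=0.9):
    """Hypothetical threshold-type transmission criterion."""
    return score >= threshold

rng = random.Random(0)
private_x = [rng.uniform(0, 10) for _ in range(200)]            # stand-in for local private data
private_y = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in private_x]

actual = fit_line(private_x, private_y)                          # trained actual model parameters
stats = summarize(private_x, private_y)                          # aggregate distributions
proxy = fit_line(*sample_proxy(stats, 200, rng))                 # trained proxy model parameters
score = similarity_score(actual, proxy)
print(score, should_transmit(score))
```

In this sketch only `stats` (or data sampled from it) would leave the private side, and only when `should_transmit` returns true; the raw `private_x` and `private_y` values are never transmitted.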
Regarding claim 33, the combination of Yao and Stojanovic teaches all of the limitations of claim 32, as described in detail above.
Though Yao teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set, Yao does not explicitly teach -
wherein the salient private data features includes a set of proxy data
But Stojanovic teaches -
wherein the salient private data features includes a set of proxy data (Stojanovic, Fig. 2, and right column of p. 3, “2.2.1 Patient visit representation,” first paragraph, teaches having learned the disease and procedure vectors (that is, salient private features) we aim to exploit them . . . . For this purpose, we generate a data set (that is, a set of proxy data)).
Yao and Stojanovic are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of Yao pertaining to the generation, ranking, and selection of model performance parameters with the teachings of Stojanovic pertaining to producing a data set from statistical representations to train a model.
The motivation for doing so is to use Electronic Health Record (EHR) data that provides unique opportunities for improving quality of health services through learning vector representations by employing state-of-the-art language models specifically designed for modeling co-occurrence of diseases and applied clinical procedures. In doing so, provide a model that outperforms the baseline models on all tasks, indicating a strong potential of the proposed approach for advancing quality of the healthcare system. (Stojanovic, Abstract).
Regarding claim 34, the combination of Yao and Stojanovic teaches all of the limitations of claim 32, as described in detail above.
Though Yao teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set, Yao does not explicitly teach -
	generating the set of proxy data according to at least one of the following: the plurality of private data statistical distributions and the salient private data features.
But Stojanovic teaches -
generating a set of proxy data according to at least one of the following: the plurality of private data statistical distributions and the salient private data features (Stojanovic, Fig. 2 & caption, teaches (Examiner annotations in bordered rectangles):

[Image: media_image2.png (Stojanovic, Fig. 2, with Examiner annotations in bordered rectangles)]

Stojanovic, right column of p. 3, “2.2 disease+procedure2vec method,” first paragraph, teaches disease+procedure2vec (dp2v) approach for learning diseases and procedures representations (step 1 in Figure 2) that extend models of the recently proposed word2vec algorithm. The key insight is that we can represent the patients’ lists of diseases and procedures from [Electronic Health Records] sequences of tokens, and view each sequence as a sample from some unknown language. . . . Probability $\mathrm{P}(h_{i+m} \mid h_i)$ . . . is defined using the soft-max function (that is, a plurality of statistical distributions); continuing, Stojanovic, right column of p. 3, “2.2.1 Patient visit representation,” first paragraph, & Fig. 2, step 2, teaches generating a data set . . . where for each record $r_i$ the value of $y_i \in Y$ represents one of the target variables: [Length of Stay (LoS)], total charges (TOTCHG), or binary mortality indicator, and $x_i \in \mathbb{R}^M$ is a patient’s feature vector calculated by summing vectors of diseases and procedures that appear in that record (that is, generating a set of proxy data according to at least one of the following: the plurality of private data statistical distributions . . .)).
Yao and Stojanovic are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of Yao pertaining to the generation, ranking, and selection of model performance parameters with the teachings of Stojanovic pertaining to producing a data set from statistical representations to train a model.
The motivation for doing so is to use Electronic Health Record (EHR) data that provides unique opportunities for improving quality of health services through learning vector representations by employing state-of-the-art language models specifically designed for modeling co-occurrence of diseases and applied clinical procedures. In doing so, provide a model that outperforms the baseline models on all tasks, indicating a strong potential of the proposed approach for advancing quality of the healthcare system. (Stojanovic, Abstract).
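[Illustrative note, not drawn from the cited references: the two Stojanovic passages relied upon for claim 34, namely a conditional probability defined with the soft-max function and a patient feature vector formed by summing disease and procedure vectors, can be pictured with the short Python sketch below. The function names, the separate input and output vector tables, the toy codes (`dx_a`, `dx_b`, `proc_c`), and the two-dimensional vectors are hypothetical assumptions; the sketch follows the generic skip-gram form of soft-max and is not represented as Stojanovic's exact formulation.]

```python
import math

def softmax_probability(center, context, in_vecs, out_vecs):
    """P(context | center) as a soft-max over every token in the vocabulary."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = {tok: dot(in_vecs[center], vec) for tok, vec in out_vecs.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    denom = sum(math.exp(z - top) for z in logits.values())
    return math.exp(logits[context] - top) / denom

def visit_vector(tokens, in_vecs):
    """Feature vector for one record: element-wise sum of its token vectors."""
    dim = len(next(iter(in_vecs.values())))
    total = [0.0] * dim
    for tok in tokens:
        for k, value in enumerate(in_vecs[tok]):
            total[k] += value
    return total

# Hypothetical toy vocabulary of disease/procedure codes with 2-d vectors.
in_vecs = {"dx_a": [1.0, 0.0], "dx_b": [0.0, 1.0], "proc_c": [1.0, 1.0]}
out_vecs = {"dx_a": [0.5, 0.0], "dx_b": [0.0, 0.5], "proc_c": [0.5, 0.5]}

probs = [softmax_probability("dx_a", tok, in_vecs, out_vecs) for tok in out_vecs]
print(sum(probs))                                   # probabilities over the vocabulary sum to 1
print(visit_vector(["dx_a", "proc_c"], in_vecs))    # [2.0, 1.0]
```

The summed vector plays the role of the feature vector $x_i$ in the quoted passage; pairing it with a target value $y_i$ yields one row of the generated data set.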
Regarding claim 36, the combination of Yao and Stojanovic teaches all of the limitations of claim 32, as described in detail above.
	Yao teaches -
calculating the model similarity score of the trained proxy model as a function of the proxy model parameters and the trained actual model parameters (Yao, claim 13, teaches performance of the validated at least one PM0 is measured by comparison against performance of a control model (that is, calculate a model similarity score); Yao ¶ 0050 teaches “control model” refers to a prediction model, or practice guidelines, or eligibility criteria . . . , that is in standard use in the healthcare industry or that is being more specifically used for the generation of a prediction model by a specific PHC (that is, by comparison of the control model, performance is a function of the proxy model parameters and the trained actual model parameters); see also Yao ¶ 0062, which teaches [t]he ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination (that is, AUC is a model similarity score) and AUC=0.5 indicates the model has no ability to discriminate).
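[Illustrative note, not drawn from the cited references: the AUC measure quoted from Yao ¶ 0062, under which AUC=0.5 indicates no ability to discriminate, can be computed as the fraction of positive/negative pairs that a model ranks correctly. The Python sketch below shows that computation. The function names and sample scores are hypothetical, and expressing a comparison between two models as an absolute AUC difference is an assumption made for illustration rather than a method stated in Yao or the specification.]

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_difference(scores_a, scores_b, labels):
    """Hypothetical comparison of two models scored on the same labeled records."""
    return abs(auc(scores_a, labels) - auc(scores_b, labels))

labels = [0, 0, 1, 1]
print(auc([0.1, 0.4, 0.35, 0.8], labels))   # 0.75
print(auc([0.5, 0.5, 0.5, 0.5], labels))    # 0.5, no ability to discriminate
```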
Regarding claim 37, the combination of Yao and Stojanovic teaches all of the limitations of claim 36, as described in detail above.
	Yao teaches -
aggregating the set of proxy data into an aggregated global model based on the model similarity score (Yao ¶ 0008 teaches providing more than one healthcare centers with a Model Deconstruction Transfer (MDT) platform comprised of a variable library (VL), wherein each healthcare center enters at least one data set (generate a plurality of private data distributions from the local private data) relevant to the health outcome of interest into the MDT platform and selects variables from the VL that are relevant to the health outcome of interest; and (b) generating at least one prediction model (PM0) for each healthcare center from the MDT platform (where the private data distributions represent the local private data in aggregate used to create the trained actual model)).
Regarding claim 38, Yao teaches a computer-implemented method of generating proxy data (Yao, Abstract), the method comprising: 
creating, at a private data server, from local private data accessible to the private data server, a trained actual model using a machine learning algorithm (Yao, Fig. 2 and ¶ 0031 teaches (Examiner annotation in text block):

[Image: media_image5.png (Yao, Fig. 2, with Examiner annotation in text block)]

wherein five PHCs (that is, a plurality of private data servers) have individual on-site access to [Model Deconstruction and Transfer (MDT) / Variable Library (VL)] platforms which generate five first prediction models (PM0) and a [Model Component Library (MCL)] (that is, a private data server, and a machine learning algorithm); Yao ¶ 0046 teaches that a “training set” is meant to refer to a data set that is prepared from data provided by a PHC (that is, local private data) or a non-PHC (see, Examples 2-6) that is used to generate or train one or more prediction models (that is, to generate or train one or more prediction models is creating . . . a trained actual model using a machine learning algorithm) for a particular condition or disease state), wherein:
the trained actual model comprises trained actual model parameters (Yao ¶ 0022 teaches at least two PM1 are generated, ranked, and selected according to model performance parameters (that is, the trained actual model comprising trained actual model parameters));
the local private data includes restricted features (Yao ¶ 0006 teaches the implementation of prediction models also faces challenges related to the protection of personal identifiers that are present in the data sets (that is, the personal identifiers in the data sets is the local private data includes restricted features); Examiner notes the specification identifies the “personal identifiers within data” is such that “a researcher would likely not have authorization to access each hospital's patient data due to privacy restrictions or HIPAA compliance” (PGPUB ¶ 0005)) that are accessible to the private data server and are inaccessible to at least one system (Yao ¶ 0048 teaches a server (including a closed network of servers inaccessible to the public), a VPN, a secure internet website, and software product that is downloadable from the internet with security features. Security features include without limitation, digital certificates, user access codes, auto-expiration dates, and anti-de-encryption mechanisms (that is, that are accessible to the private data server and are inaccessible to at least one system)); and
the restricted features include protected health information (Yao ¶ 0079 teaches dataset may include PHI and personal identifiers (that is, protected health information), if permitted by the policies of the PHC and any applicable HIPPA [sic] regulations. Alternatively, this dataset may be a de-identified or limited data set, which contains dates, but not other personal identifiers. It is to be understood that the dataset from any particular PHC is not shared (that is, restricted features) with Company A, any of the other PHCs, or any ultimate users (that is, the restricted features include protected health information)); 
generating, at the private data server, a plurality of private data statistical distributions from the local private data, wherein the plurality of private data statistical distributions represents the local private data in aggregate used to create the trained actual model and does not include individual elements of the local private data (Yao ¶ 0081 teaches [t]he MDT and external hard drive now contain a prediction model (PM0) . . . [that] do not contain any raw data or [patient] identifiers (that is, does not include individual elements of the local private data); rather, they are encoded prediction models representing the statistical relationships (that is, the statistical relationships are a plurality of private data statistical distributions from the local private data in aggregate) among variables that have predictive value for the particular PHC from which they are derived; Yao ¶ 0024 teaches examples applying statistical criteria selected from the group consisting of predictive power based on posterior log likelihoods, predictive power based upon posterior probability of an event, predictive power based upon estimated errors of prediction, dynamic range of predicted probabilities, reclassification, frequency of predicted probabilities, cross-validation error, number of model components required, and discrimination);
* * *
calculating, at the private data server, a model similarity score based on the proxy model parameters and the trained actual model parameters (Yao, claim 13, teaches performance of the validated at least one PM0 is measured by comparison against performance of a control model (that is, calculating, at the private data server, a model similarity score); Yao ¶ 0050 teaches “control model” refers to a prediction model, or practice guidelines, or eligibility criteria . . . , that is in standard use in the healthcare industry or that is being more specifically used for the generation of a prediction model by a specific PHC (that is, by comparison of the control model, performance is a function of the proxy model parameters and the trained actual model parameters); see also Yao ¶ 0062, which teaches [t]he ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination (that is, AUC is a model similarity score) and AUC=0.5 indicates the model has no ability to discriminate
[Examiner note: neither the claims nor the specification define the term “model similarity score.” Notably, the specification broadly and generally recites that “[t]he similarity between trained proxy model 270 and trained actual model 240 can be measured through various techniques by modeling engine 226 calculating model similarity score 280 as a function of proxy model parameters 275 and actual model parameters 245. The resulting model similarity score 280 is a representation of how similar the two models are, at least to within similarity criteria (that is, multiple criteria).” (PGPUB ¶ 0079 (emphasis added)). Accordingly, the term “model similarity score” has a broad BRI that reads upon the teachings of Yao relating to “discrimination” of a model against a control model through AUC metrics, or rather, “through various techniques”; also, the area-under-the-curve of Yao pertains to model accuracy, and the specification expressly recites that the “similarity score 280 can be a single value (e.g., a difference in accuracy, . . . )” (PGPUB ¶ 0079)]);
determining, at the private data server, that the model similarity score satisfies at least one transmission criterion (Yao ¶ 0024 teaches the performance of . . . at least one PM1 may be measured by comparison (that is, the comparison result of Yao is a model similarity) against performance of a control model; Yao ¶ 0061 teaches use of subsets of the predictive variables and thresholds to optimally define patient populations that are enriched for a certain trait, prognosis or outcome; predictive power; discrimination (that is, similarity score satisfies at least one transmission criterion)), wherein the at least one transmission criterion includes at least one of the following conditions relating to the model similarity score: a threshold condition, a multi-valued condition, a change in value condition, a trend condition, a human command condition, an external request condition, and a time condition (Yao ¶ 0062 teaches [d]iscrimination refers to how well a model can differentiate patients with higher versus lower probabilities of outcomes, or those with significantly different prognoses. The ability to discriminate can be measured by receiver operator characteristics analysis, where the area-under-the-curve (AUC) indicates the degree of discrimination and AUC=0.5 indicates the model has no ability to discriminate (that is, as compared to a control model, the AUC performance of each model pertains to similarity, such that the degree of discrimination is at least one of the following conditions relating to the model similarity score: a threshold condition . . . .)); and
in response to the determination that the model similarity score satisfies the at least one transmission criterion (Yao ¶ 0061 teaches use of subsets of the predictive variables and thresholds to optimally define patient populations that are enriched for a certain trait, prognosis or outcome; predictive power; discrimination (similarity score satisfies at least one transmission criterion); and reclassification), distributing, by the private data server, the trained proxy model to the at least one system that lacks access to the restricted features (Yao ¶ 0057 teaches the [Participating Healthcare Center] may further utilize the [Model Deconstruction and Transfer] platform to extract statistical features and/or components from the prediction model and send (that is, distributing the trained . . . model) those features and/or components to a third party who can in turn reassemble . . . into a Model Component Library (MCL) (that is, distributing the trained proxy model to the at least one system that lacks access to the restricted features); Yao ¶ 0054 teaches “MCL,” “Model Component Library,” and “MCL components” are meant to refer to the deconstructed components (that is, the set of proxy data) derived from statistical features of prediction models (PM0)).
Though Yao teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set, Yao does not explicitly teach -
* * *
generating, at the private data server, a set of proxy data based on the plurality of private data statistical distributions, wherein the restricted features are absent from the set of proxy data;
creating, at the private data server, from the set of proxy data, a trained proxy model using the machine learning algorithm, wherein the trained proxy model comprises proxy model parameters; and
* * *
But Stojanovic teaches -
* * *
generating, at the private data server, a set of proxy data based on the plurality of private data statistical distributions, wherein the restricted features are absent from the set of proxy data (Stojanovic, Fig. 2 & caption, teaches (Examiner annotations in bordered rectangles):

[Figure: Stojanovic, Fig. 2, with Examiner annotations in bordered rectangles (image not reproduced)]

Stojanovic, right column of p. 3, “2.2 disease+procedure2vec method,” first paragraph, teaches disease+procedure2vec (dp2v) approach for learning diseases and procedures representations (step 1 in Figure 2) that extend models of the recently proposed word2vec algorithm. The key insight is that we can represent the patients’ lists of diseases and procedures from [Electronic Health Records] sequences of tokens, and view each sequence as a sample from some unknown language. . . . Probability $P(h_{i+m} \mid h_i)$ . . . is defined using the soft-max function (that is, a plurality of statistical distributions); continuing, Stojanovic, right column of p. 3, “2.2.1 Patient visit representation,” first paragraph, & Fig. 2, step 2, teaches generating a data set . . . where for each record $r_i$ the value of $y_i \in Y$ represents one of the target variables: [Length of Stay (LoS)], total charges (TOTCHG), or binary mortality indicator, and $x_i \in \mathbb{R}^M$ is a patient’s feature vector calculated by summing vectors of diseases and procedures that appear in that record (that is, generating a set of proxy data based on the plurality of private data statistical distributions); Stojanovic, left column of p. 4, “3. EHR Discharge Database,” first paragraph, teaches we explored the State Inpatient Database (SID), an archive that stores the inpatient discharge abstracts from a number of data organizations (that is, inpatient discharge abstracts lack restricted features, and accordingly, the statistical representations lack restricted features, and accordingly, the restricted features are absent from the set of proxy data));
creating, at the private data server, from the set of proxy data, a trained proxy model using the machine learning algorithm (Stojanovic, Fig. 2 & caption, teaches, at step 3, train regression and classification models (that is, creating, from the set of proxy data, a trained proxy model using the machine learning algorithm) to predict important indicators of healthcare quality);
* * *
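For illustration only, the soft-max probability and the summed patient feature vector quoted from Stojanovic above can be sketched as follows. This is a minimal sketch and not Stojanovic's implementation: the token names and the two-dimensional embedding values are hypothetical, and a single embedding table is assumed for both context and output tokens (word2vec-style models typically keep two).

```python
import math


def softmax_probability(context_vec, output_vecs, target):
    """Return P(target | context) as a soft-max over dot-product scores."""
    scores = {
        token: sum(a * b for a, b in zip(context_vec, vec))
        for token, vec in output_vecs.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {token: math.exp(s - max_score) for token, s in scores.items()}
    return exp_scores[target] / sum(exp_scores.values())


def visit_feature_vector(codes, embeddings):
    """Sum the vectors of the disease and procedure codes in one record."""
    dim = len(next(iter(embeddings.values())))
    x = [0.0] * dim
    for code in codes:
        for j, value in enumerate(embeddings[code]):
            x[j] += value
    return x


# Hypothetical toy embeddings (assumption: not taken from any cited reference).
embeddings = {
    "dx_diabetes": [0.5, 0.1],
    "dx_hypertension": [0.4, 0.2],
    "px_dialysis": [-0.3, 0.9],
}

# Distribution over all tokens given the context token "dx_diabetes".
probs = {
    token: softmax_probability(embeddings["dx_diabetes"], embeddings, token)
    for token in embeddings
}

# Feature vector for a record listing one disease and one procedure.
x = visit_feature_vector(["dx_diabetes", "px_dialysis"], embeddings)
```

The probabilities over all tokens sum to one, which is the sense in which the soft-max defines a statistical distribution per context token.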
Yao and Stojanovic are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of Yao pertaining to the generation, ranking, and selection of model performance parameters with Stojanovic’s production of a data set from statistical representations to train a model.
The motivation for doing so is that Electronic Health Record (EHR) data provides unique opportunities for improving quality of health services through learning vector representations by employing state-of-the-art language models specifically designed for modeling co-occurrence of diseases and applied clinical procedures. Doing so provides a model that outperforms the baseline models on all tasks, indicating a strong potential of the proposed approach for advancing quality of the healthcare system. (Stojanovic, Abstract).
Examiner notes that the Applicant’s preamble does not afford patentable weight to the Applicant’s claims because the claim preamble is not “necessary to give life, meaning, and vitality” to the claim. Moreover, because the Applicant’s preamble merely states the purpose or intended use of the invention rather than any distinct definition of any of the claimed invention’s limitations, the preamble is not considered a limitation and is of no significance to claim construction.
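For illustration only, the AUC-based comparison against a control model and the threshold condition discussed in the rejection of claim 1 above can be sketched as follows. This is a minimal sketch under stated assumptions: the labels, the score values, and the 0.25 threshold are hypothetical, and treating the absolute AUC gap as the compared quantity is one possible reading rather than a computation recited by Yao or by the claims.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def satisfies_threshold(gap, max_gap):
    """Threshold condition: the AUC gap must not exceed max_gap."""
    return gap <= max_gap


# Hypothetical outcomes and model scores (assumption: illustrative values only).
labels = [1, 0, 1, 0, 1, 0]
candidate_scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.1]
control_scores = [0.7, 0.3, 0.6, 0.5, 0.2, 0.4]

candidate_auc = auc(labels, candidate_scores)
control_auc = auc(labels, control_scores)
auc_gap = abs(candidate_auc - control_auc)

# Hypothetical threshold: release only if the two AUC values are within 0.25.
ok_to_distribute = satisfies_threshold(auc_gap, 0.25)
```

A model that assigns every record the same score yields an AUC of 0.5 under this computation, consistent with the statement quoted from Yao ¶ 0062 that AUC=0.5 indicates no ability to discriminate.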
7.	Claims 14-16 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130085773 to Yao et al. [hereinafter Yao] and Stojanovic et al., “Modeling Healthcare Quality via Compact Representations of Electronic Health Records,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (May 2016) [hereinafter Stojanovic], and further in view of Poh et al., “Challenges in Designing an Online Healthcare Platform for Personalised Patient Analytics,” pp. 1-6 (IEEE 2014) [hereinafter Poh].
Regarding claim 14, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Though the combination of Yao and Stojanovic teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set that generates proxy data for training a proxy model, the combination of Yao and Stojanovic does not explicitly teach -
wherein the plurality of private data distributions are based on eigenvalues derived from the trained actual model parameters and the local private data.
But Poh teaches wherein the plurality of private data distributions are based on eigenvalues derived from the trained actual model parameters and the local private data (Poh left column at page 5, Section V.B, second full paragraph, teaches model adaptation [where a] background model is first trained on data samples aggregated from all patients. This model is also called a Universal background model or a world model. This model is then adapted with the training data of a particular patient in order to obtain a patient specific model. There are several ways to realize the adaptation, namely, maximum a posteriori adaptation, maximum likelihood linear regression, and adaptation via eigenvectors (the set of proxy data includes combinations of eigenvectors derived from trained actual model parameters and the local private data); Examiner notes that the scalar value associated with an eigenvector is an eigenvalue, and accordingly, eigenvalues are derived from the model adaptation of Poh).
Yao, Stojanovic, and Poh are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Poh teaches healthcare modeling to render a model as patient-specific. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of the combination of Yao and Stojanovic pertaining to model training based on proxy data with the model adaptation of Poh.
The motivation for doing so is to help clinicians make better use of their time to process data through more adequate data processing and analytical tools. (Poh, Abstract).
Regarding claim 15, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Though the combination of Yao and Stojanovic teaches creating MCL components that are deconstructed components derived from statistical features of prediction models, where each of the components represents a statistical feature that pertains to one or more variables provided in a data set that generates proxy data for training a proxy model, the combination of Yao and Stojanovic does not explicitly teach -
wherein the set of proxy data includes combinations of eigenvectors derived from the trained actual model parameters and the local private data.
But Poh teaches wherein the set of proxy data includes combinations of eigenvectors derived from the trained actual model parameters and the local private data (Poh left column at page 5, Section V.B, second full paragraph, teaches model adaptation [where a] background model is first trained on data samples aggregated from all patients. This model is also called a Universal background model or a world model. This model is then adapted with the training data of a particular patient in order to obtain a patient specific model. There are several ways to realize the adaptation, namely, maximum a posteriori adaptation, maximum likelihood linear regression, and adaptation via eigenvectors (the set of proxy data includes combinations of eigenvectors derived from trained actual model parameters and the local private data)).
Yao, Stojanovic, and Poh are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Poh teaches healthcare modeling to render a model as patient-specific. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of the combination of Yao and Stojanovic pertaining to model training based on proxy data with the model adaptation of Poh.
The motivation for doing so is to help clinicians make better use of their time to process data through more adequate data processing and analytical tools. (Poh, Abstract).
Regarding claim 16, the combination of Yao, Stojanovic, and Poh teaches all of the limitations of claim 15, as described in detail above.
Poh teaches wherein the set of proxy data comprises linear combinations of the eigenvectors (Poh left column at page 5, Section V.B, second full paragraph, teaches model adaptation [where a] background model is first trained on data samples aggregated from all patients. This model is also called a Universal background model or a world model. This model is then adapted with the training data of a particular patient in order to obtain a patient specific model. There are several ways to realize the adaptation, namely, maximum a posteriori adaptation, maximum likelihood linear regression (proxy data comprises linear combinations of the eigenvectors), and adaptation via eigenvectors).
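For illustration only, the phrase “linear combinations of the eigenvectors” in claims 15 and 16 can be pictured with the generic sketch below. It is an assumption-laden example and not a computation recited by Poh or by the claims: the two-feature data rows are hypothetical, the eigenvectors are taken from the sample covariance of the data alone (the claims also refer to trained actual model parameters, which this sketch does not model), and Gaussian coefficients scaled by the square root of each eigenvalue are an assumed choice.

```python
import math
import random


def covariance_2d(rows):
    """Return the mean and sample covariance entries of two-feature rows."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mean_x) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean_y) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows) / (n - 1)
    return (mean_x, mean_y), (sxx, sxy, syy)


def eigen_symmetric_2x2(sxx, sxy, syy):
    """Eigenpairs of [[sxx, sxy], [sxy, syy]] from the diagonalizing rotation."""
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = sxx * c * c + 2.0 * sxy * c * s + syy * s * s
    lam2 = sxx * s * s - 2.0 * sxy * c * s + syy * c * c
    return [(lam1, (c, s)), (lam2, (-s, c))]


def proxy_samples(rows, count, seed=0):
    """Draw synthetic rows as the mean plus random combinations of eigenvectors."""
    rng = random.Random(seed)
    (mean_x, mean_y), (sxx, sxy, syy) = covariance_2d(rows)
    eigenpairs = eigen_symmetric_2x2(sxx, sxy, syy)
    samples = []
    for _ in range(count):
        x, y = mean_x, mean_y
        for lam, (vx, vy) in eigenpairs:
            # Coefficient for this eigenvector, scaled by its standard deviation.
            coeff = rng.gauss(0.0, 1.0) * math.sqrt(max(lam, 0.0))
            x += coeff * vx
            y += coeff * vy
        samples.append((x, y))
    return samples


# Hypothetical private rows (assumption: illustrative values only).
private_rows = [(1.0, 2.0), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8), (5.0, 6.1)]
samples = proxy_samples(private_rows, 5)
```

Each synthetic row is built only from the mean, the eigenvalues, and the eigenvectors, so no individual input row is copied into the output.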
8.	Claim 17 is rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130085773 to Yao et al. [hereinafter Yao] and Stojanovic et al., “Modeling Healthcare Quality via Compact Representations of Electronic Health Records,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (May 2016) [hereinafter Stojanovic], and further in view of Poh et al., “Challenges in Designing an Online Healthcare Platform for Personalised Patient Analytics,” pp. 1-6 (IEEE 2014) [hereinafter Poh] and Mitra et al., “Eigen-Profiles of Spatio-Temporal Fragments for Adaptive Region-Based Tracking,” pp. 1497-1500 (IEEE 2012) [hereinafter Mitra].
Regarding claim 17, the combination of Yao, Stojanovic, and Poh teaches all of the limitations of claim 15, as described in detail above.
	However, the combination of Yao, Stojanovic, and Poh does not explicitly teach wherein the eigenvectors include at least one of the following: an eigenpatient, an eigenprofile, an eigendrug, an eigenhealth record, an eigengenome, an eigenproteome, an eigenRNA profile, and an eigenpathway.
	But Mitra teaches wherein the eigenvectors include at least one of the following: an eigenpatient, an eigenprofile, an eigendrug, an eigenhealth record, an eigengenome, an eigenproteome, an eigenRNA profile, and an eigenpathway (Mitra, right column at page 1497, Section I, first full paragraph, teaches a [general] novel space-time descriptor which we call an Eigenprofile (eigenprofile). Estimation of [eigen profile] is equivalent to joint diagonalization of these covariance matrices and they form a matrix of orthonormal vectors).
	Yao, Stojanovic, Poh, and Mitra are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Poh teaches healthcare modeling to render a model as patient-specific. Mitra teaches a novel space-time descriptor called an Eigenprofile. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to implement the teachings of Yao, Stojanovic, and Poh pertaining to model training based on private data and synthetic data and modeling with the space-time Eigenprofile of Mitra.
	The motivation for doing so is to incrementally build models for a target using an Eigenprofile for greater reliability and accuracy of the model. (Mitra right column at page 1497, Section I, first paragraph).
9.	Claims 23-28 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130085773 to Yao et al. [hereinafter Yao] in view of Stojanovic et al., “Modeling Healthcare Quality via Compact Representations of Electronic Health Records,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (May 2016) [hereinafter Stojanovic], and US Published Application 20090037351 to Kristal et al. [hereinafter Kristal].
Regarding claim 23, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Though Yao and Stojanovic teach the feature of a similarity score for validating the accuracy of a generated model, the combination of Yao and Stojanovic does not explicitly teach -
wherein the similarity score is determined based on a cross validation of the trained proxy model.
But Kristal teaches -
wherein the similarity score is determined based on a cross validation of the trained proxy model (Kristal FIG. 9, which shows simulation results (that is, the results being a similarity score) based on the method of FIG. 2, in which machine learning algorithm comparisons are made in view of synthetic data (proxy model parameters) in contrast to “without synthetic data” (trained actual model parameters) (that is, the similarity score is determined based on a cross validation of the proxy model)).
Yao, Stojanovic, and Kristal are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Kristal teaches neural network training with and without synthetic data from multiple sources. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to modify the combination of Yao and Stojanovic with the cross-validation of Kristal.
The motivation for doing so is to improve the efficiency of replacing or augmenting training data when the training data sets are too limited in size, scope or quality to otherwise generate accurate predictions. (Kristal, Abstract).
Regarding claim 24, the combination of Yao, Stojanovic and Kristal teaches all of the limitations of claim 23, as described in detail above. 
Kristal further teaches -
wherein the cross validation includes an internal cross validation on a portion of the set of proxy data (Kristal ¶ 0045 teaches comparison (internal cross validation) of guesstimates, votes, etc. between voters (on a portion of the proxy data) may be used to determine if there are outliers).
Regarding claim 25, the combination of Yao, Stojanovic and Kristal teaches all of the limitations of claim 23, as described in detail above.
Kristal further teaches -
wherein the cross validation includes an internal cross validation of the local private data (Kristal ¶ 0044 teaches it may be possible to use the present invention to obtain multiple related data sets (of the local private data) for differential comparison (cross validation)).
Regarding claim 26, the combination of Yao, Stojanovic and Kristal teaches all of the limitations of claim 23, as described in detail above.
Kristal further teaches -
wherein the cross validation includes an external cross validation by a different one of the plurality of private data servers on its local private data (Kristal FIG. 9 teaches that simulation results based on the method of FIG. 2 where machine learning algorithm comparisons are made in view of synthetic data (proxy model parameters) in contrast to “without synthetic data” (trained actual model parameters), accordingly, conducting an external cross validation by a different one of the plurality of private servers on its local private data because the synthetic data is external to the instances “without synthetic data”, or rather actual private data of a server).
Regarding claim 27, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Though Yao and Stojanovic teach the feature of a similarity score for validating the accuracy of a generated model, the combination of Yao and Stojanovic does not explicitly teach -
wherein the model similarity score comprises a difference between an accuracy measure of the trained proxy model and an accuracy measure of the trained actual model.
But Kristal teaches -
wherein the model similarity score comprises a difference between an accuracy measure of the trained proxy model and an accuracy measure of the trained actual model (Kristal ¶ 0058 & FIG. 9 teaches that synthetic data (proxy model parameters) are beneficial primarily when unassisted algorithms (without synthetic data, which is trained actual model parameters) make somewhat accurate predictions, here < 40% (difference between an accuracy measure of the trained actual model parameters and the proxy model parameters)).
Yao, Stojanovic, and Kristal are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Kristal teaches neural network training with and without synthetic data from multiple sources. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to modify the combination of Yao and Stojanovic with the cross-validation of Kristal.
The motivation for doing so is to improve the efficiency of replacing or augmenting training data when the training data sets are too limited in size, scope or quality to otherwise generate accurate predictions. (Kristal, Abstract).
Regarding claim 28, the combination of Yao and Stojanovic teaches all of the limitations of claim 1, as described in detail above.
Though Yao and Stojanovic teach scoring and validating the resulting model, the combination of Yao and Stojanovic does not explicitly teach -
wherein the model similarity score comprises a metric distance calculated from the trained actual model parameters and the proxy model parameters.
But Kristal teaches -
wherein the model similarity score comprises a metric distance calculated from the trained actual model parameters and the proxy model parameters (Kristal ¶ 0077 & FIG. 13 teaches results produced by the DECORATE algorithm . . . showing that input of human guesstimates (proxy model parameters) markedly (metric distance calculated) reduces errors [to unassisted ML] (trained actual model parameters)).
Yao, Stojanovic, and Kristal are from the same or similar field of endeavor. Yao teaches prediction models generated from multiple sources to produce deconstructed components derived from statistical features of prediction models. Stojanovic teaches generating a data set from statistical representations, that in turn are used to train a model. Kristal teaches neural network training with and without synthetic data from multiple sources. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to modify the combination of Yao and Stojanovic with the cross-validation of Kristal.
The motivation for doing so is to improve the efficiency of replacing or augmenting training data when the training data sets are too limited in size, scope or quality to otherwise generate accurate predictions. (Kristal, Abstract).
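For illustration only, the two score forms recited in claims 27 and 28 (a difference between accuracy measures, and a metric distance between parameter sets) can be sketched as follows. This is a minimal sketch under stated assumptions: the predictions, labels, and parameter vectors are hypothetical, and Euclidean distance is an assumed choice of metric that neither the claims nor the cited references require.

```python
import math


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def accuracy_difference(proxy_predictions, actual_predictions, labels):
    """Absolute difference between the accuracy of the two models."""
    return abs(accuracy(proxy_predictions, labels) - accuracy(actual_predictions, labels))


def euclidean_distance(proxy_params, actual_params):
    """Euclidean distance between two parameter vectors of equal length."""
    if len(proxy_params) != len(actual_params):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(proxy_params, actual_params)))


# Hypothetical evaluation data (assumption: illustrative values only).
labels = [1, 0, 1, 1]
actual_predictions = [1, 0, 1, 1]
proxy_predictions = [1, 0, 0, 1]
accuracy_gap = accuracy_difference(proxy_predictions, actual_predictions, labels)

# Hypothetical parameter vectors for the trained actual and proxy models.
parameter_distance = euclidean_distance([3.0, 4.0], [0.0, 0.0])
```

Under either form, a smaller value indicates that the proxy model tracks the actual model more closely.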
Response to Argument
10.	With regard to the rejection under Section 102, Applicant argues “[a]s amended herein, independent claim 32 includes the subject matter of former claim 35, which was not anticipated by Yao.” (Response at p. 12).
Examiner agrees. Claim 35 was rejected under Section 103 as being unpatentable over Yao in view of Stojanovic. Accordingly, the instant claim 32 overcomes the rejection under Section 102 as being anticipated by Yao.
11.	Applicant argues “Yao and Stojanovic, as cited by the Office, do not teach or suggest transmitting the set of proxy data to a non-private computing device based on a model similarity score satisfying at least one transmission criterion, as recited by amended independent claims 1 and 32.” (Response at p. 14). Applicant argues Yao does not provide a teaching as to the features of Applicant’s claims, presenting its interpretation by the following:

[Figure: Applicant’s annotated interpretation of Yao, Response at p. 14 (image not reproduced)]

Applicant argues “Yao teaches deconstructing a machine learning model to extract statistical features.” (Response at p. 15).
Examiner respectfully disagrees with Applicant’s characterization of Yao, which teaches a method for generating prediction models from multiple healthcare centers. (Yao, Abstract). The table below illustrates Applicant’s characterization of Yao with respect to Figure 3 of Yao: 
Applicant’s Interpretation of Yao: [image not reproduced]
Fig. 3 of Yao: [image not reproduced]

Applicant appears to apply a broadest reasonable interpretation standard for assessing subject matter disclosed by Yao; however, the broadest reasonable interpretation is a standard used for construing claim language. When considering prior art such as Yao, the “decision maker must consider what it teaches or fairly suggests to the skilled artisan.” Ex parte LeBeouf , App. No. 2020-003511 (PTAB 25 January 2021). Here, Yao is relied upon, inter alia, for generating a plurality of private data statistical distributions from the local private data.
Applicant’s claims merely recite “generate a plurality of private data statistical distributions from the local private data where the plurality of private data statistical distributions represents the local private data in aggregate used to create the trained actual model and does not include individual elements of the local private data; . . . ” (Claim 1, ll. 20-24; see also claim 32, ll. 12-16, claim 35, ll. 11-14). Applicant’s claims do not recite the manner or technique in which such private data statistical distributions are generated. 
Examiner notes claim 13 recites a list of resultant private data statistical distribution types, but not a process and/or methodology of arriving at these distribution types. (see also PGPUB 0083). Stojanovic is relied upon as teaching these types of statistical distributions, as set out in the rejections above in detail.
Moreover, the specification recites that various processes or methodologies may be implemented. For example, “data distributions may be manually constructed (e.g., a histogram, a probability density function, etc.). In some other embodiments, data distributions may be based on rates of change, and/or higher order derivatives (e.g., moments).” (PGPUB ¶ 0084). 
Also, neither Applicant’s claims nor the specification define the term “local private data.” Further, neither the claims nor the specification define the formats or constructs “local private data” may take. 
Under a broadest reasonable interpretation, words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. MPEP § 2173.01.I (“Broadest Reasonable Interpretation”). Local private data is simply data that is relevant to a health outcome of interest. (see Yao ¶ 0008).
Thus, Yao teaches “local private data” accessible via a Model Deconstruction and Transfer (MDT) platform. Figure 3 of Yao teaches “PHC-A Data” (Examiner annotation in dashed boxes):

[Figure: Yao, Fig. 3, with Examiner annotations in dashed boxes (image not reproduced)]

(Yao ¶ 0032). In particular, as set out in detail in the rejections above, Yao teaches that “[b]y entering the data sets in a Model Deconstruction and Transfer (MDT) platform, a healthcare center may provide data to a third party without the need to de-identify data or to physically transfer any identifying or de-identified data from the healthcare center.” (Yao, Abstract). Such a plain meaning of “local private data” is not inconsistent with the specification.
Moreover, the rejection clearly sets forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
12.	Regarding former claims 29 and 30, which have been incorporated into the base claims, Applicant argues Yao “does not suggest that using subsets of predictive values to define patient populations includes transmitting a set of proxy data in response to a model similarity score satisfying a transmission criterion. Yao merely defines patient populations using subsets of the predictive values. The patient populations may be enriched for a certain trait, prognosis or outcome, but defining the patient populations does not include using a model similarity score and transmission criteria to determine whether to transmit a set of proxy data.” (Response at p. 16 (emphasis added)).
Examiner respectfully disagrees with Applicant’s characterization of Yao because the instant claims do not recite limitations that Applicant submits as distinguishing over Yao. 
Generally, Yao teaches a method for generating prediction models from multiple healthcare centers that allows a third party to use data sets from multiple sources to build prediction models. In Yao, a healthcare center may provide data to a third party without the need to de-identify data or to physically transfer any identifying or de-identified data from the healthcare center. (Yao, Abstract). 
The term “transmitted” is not defined by Applicant’s claims or specification beyond indicating that data is moved through a network. Similarly, the term “model similarity score” is not defined by Applicant’s claims or specification beyond indicating that the model similarity satisfies a similarity requirement. (PGPUB ¶ 0018). Also, the specification does not explicitly recite “in response to a model similarity score.” The specification does, however, recite “[i]f similarity score 280 satisfies the similarity criteria . . . , modeling engine 226 can then transmit information about the knowledge gained from the effort. More specifically, for example, once the similarity criteria has been satisfied, modeling engine 226 can transmit . . . one or more of proxy data 260, proxy model parameters 275, similarity score 280, or other information to a non-private computing device located over network 215. This approach, as discussed previously, allows for a researcher to gain knowledge about private data 222 without compromising its privacy or security.” (PGPUB ¶ 0080). Accordingly, Examiner construes “in response to a model similarity score” as an indication that the model is at a state for transmission / sharing as is also taught by the AUC of Yao. 
The language of now cancelled claim 29 is merely that “wherein the set of proxy data is transmitted when the function of the model similarity score satisfies at least one transmission criterion.” The language of now cancelled claim 30 is merely that “at least one transmission criterion include at least one of the following conditions relating to the model similarity score: a threshold condition, a multi-valued condition, a change in value condition, a trend condition, a human command condition, an external request condition, and a time condition.”
Yao ¶¶ 0061 & 0062 are relied upon as teaching these broad and general limitations, as set out in detail hereinabove. Moreover, the rejections clearly set forth which claim limitations are taught by each of the prior art references, and the reasons why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings. Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
13.	Applicant argues “Yao fails to teach or suggest that the sharing is in response to a determination that the performance measurement of the validated prediction model (PMo) satisfies a transmission criteria.” (Response at p. 17).
Examiner respectfully disagrees because neither the claims nor the specification defines the term “transmit.” Further, the plain and ordinary meaning of “transmit” is “to send or convey.”13 Also, synonymous with “transmit” are “share” (Yao ¶ 0056), “send” (Yao ¶ 0057), “transfer” (Yao ¶ 0052), etc., which for Yao occur over a distributed network in the form of a PHC regional healthcare network (or non-PHC regional healthcare network). (See Yao ¶ 0059).
For a “transmission criterion,” Applicant appears to argue there is a condition precedent for transmission by the claims; however, the claims merely recite “in response to a determination that the model similarity score satisfies the at least one transmission criterion.” (Claim 1, ll. 37-40). Instead, the determining limitation merely recites that “at least one transmission criterion includes at least one of the following conditions relating to the model similarity score.” (Claim 1, ll. 31-36). 
Moreover, the rejection clearly sets forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
14.	Applicant argues Stojanovic “fails to suggest determining ‘whether the model similarity score satisfies at least one transmission criterion, wherein the at least one transmission criterion includes at least one of the following conditions relating to the model similarity score: . . . .’” (Response at p. 18).
Examiner notes that Yao is cited as teaching these features. Moreover, the rejection clearly sets forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
Conclusion
15.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
16.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(He et al., “CRFs based de-identification of medical records,” (2015)) teaches a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model.
17.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAKALI CHAKI, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/K.L.S./
Examiner, Art Unit 2122
/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122

        1 Examiner notes patient data by definition is a “restricted feature” because of federal/state statutory and regulatory requirements. Under Federal law, “‘HIPAA’ . . . refers to the HIPAA Privacy and Security rules promulgated under the Health Insurance Portability and Accountability Act of 1996. 45 C.F.R. § 164 (2013). The Privacy Rule was published in December 2000 but modified in August 2002.” (See Nicolas P. Terry, “Big Data Proxies and Health Privacy Exceptionalism,” 24 Health Matrix (2014) at p. 66, n. 3).
        2 US Published Application 20180018590 to Szeto et al., entitled “Distributed Machine Learning Systems, Apparatus, and Methods,” filed 17 July 2017 [hereinafter PGPUB].
        3 Stojanovic cites to Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality,” NIPS (2013).
        4 Examiner notes patient data by definition is a “restricted feature” because of federal/state statutory and regulatory requirements. Under Federal law, “‘HIPAA’ . . . refers to the HIPAA Privacy and Security rules promulgated under the Health Insurance Portability and Accountability Act of 1996. 45 C.F.R. § 164 (2013). The Privacy Rule was published in December 2000 but modified in August 2002.” (See Nicolas P. Terry, “Big Data Proxies and Health Privacy Exceptionalism,” 24 Health Matrix (2014) at p. 66, n. 3).
        5 US Published Application 20180018590 to Szeto et al., entitled “Distributed Machine Learning Systems, Apparatus, and Methods,” filed 17 July 2017 [hereinafter PGPUB].
        6 Examiner notes patient data by definition is a “restricted feature” because of federal/state statutory and regulatory requirements. Under Federal law, “‘HIPAA’ . . . refers to the HIPAA Privacy and Security rules promulgated under the Health Insurance Portability and Accountability Act of 1996. 45 C.F.R. § 164 (2013). The Privacy Rule was published in December 2000 but modified in August 2002.” (See Nicolas P. Terry, “Big Data Proxies and Health Privacy Exceptionalism,” 24 Health Matrix (2014) at p. 66, n. 3).
        7 US Published Application 20180018590 to Szeto et al., entitled “Distributed Machine Learning Systems, Apparatus, and Methods,” filed 17 July 2017 [hereinafter PGPUB].
        8 Stojanovic cites to Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality,” NIPS (2013).
        9 Stojanovic cites to Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality,” NIPS (2013).
        10 Examiner notes patient data by definition is a “restricted feature” because of federal/state statutory and regulatory requirements. Under Federal law, “‘HIPAA’ . . . refers to the HIPAA Privacy and Security rules promulgated under the Health Insurance Portability and Accountability Act of 1996. 45 C.F.R. § 164 (2013). The Privacy Rule was published in December 2000 but modified in August 2002.” (See Nicolas P. Terry, “Big Data Proxies and Health Privacy Exceptionalism,” 24 Health Matrix (2014) at p. 66, n. 3).
        11 US Published Application 20180018590 to Szeto et al., entitled “Distributed Machine Learning Systems, Apparatus, and Methods,” filed 17 July 2017 [hereinafter PGPUB].
        12 Stojanovic cites to Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality,” NIPS (2013).
        13 Merriam-Webster Online Dictionary, “transmit,” at <https://www.merriam-webster.com/dictionary/transmit>.