DETAILED ACTION
This office action is in response to Applicant’s submission filed on 2/17/2020. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Huang Li et al. ("Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records") (herein "Huang") in view of Feng et al. (US20170220949A1) (herein "Feng").

Regarding claims 1, 13, and 18, Huang teaches receiving a plurality of local machine learning model data objects from a plurality of model data object provider agents, wherein (i) each local machine learning model data object is received from a corresponding model data object provider agent of the plurality of model data object provider agents, (Huang, ABS:” we introduced a community-based federated machine learning [CBFL] algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community.”, and Page 2, Introduction: “we proposed a community-based federated learning [CBFL] algorithm that clustered EMR data into several communities and simultaneously trained one model per community, so that the learning process became markedly more efficient than FL. “, and Section 1, page 1: “These concerns can be addressed by FL that keeps both data and computation local in distributed silos and aggregates locally computational results to train a global predictive model”, and section 2.1, Page 2:” CBFL was developed based on the eICU collaborative research database [35], which contains highly granular critical care data of 200,859 patients admitted to 208 hospitals from across the United States. ", and Section 2.2, page 3:” During encoder training, each client [that is, hospital] learnt a denoising autoencoder fautoencoder initialized with Wo,autoencoder for E1 epochs and returned only the trained weights of encoder Wc 1,encoder to the server for average.”) Note: each local machine learning model data [trained weights] shared or received by the server.
(ii) each local machine learning model data object is associated with an inference identifier of one or more inference identifiers, and (Huang, Title:” Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records", and section 2.1, 2.2, 3.3, 3.4 "mortality, ICU stay time, drug features", Section 2.1:” Mortality: unit discharge status that specifies patients’ survival/mortality [0 for alive and 1 for expired]. ICU stay time: unit discharge offset that records the number of days from unit admission to discharge [with an average of 64 h, that is, 2.7 days]. This would be the dependent variable in the unit-stay time prediction task. drug features: drugs administered on patients during the first 48 h of ICU stay [1399 binary drug features in total] … These drug features would be used as independent variables in the prediction tasks of our study. The proposed CBFL algorithm and the baseline FL algorithm would use drug features as predictors to forecast mortality and ICU stay time of critical care patients. We chose medication as predictors rather than other variables [such as age, gender and diagnosis] because the eICU database contained highly dimensional drug information.”, section 3.3, Page 7: “Mortality was predicted based on patients’ prescribed drug features. The training dataset was formed by randomly selecting 400 patients from each of the 50 hospitals, and thus had a size of 20,000 examples; the test dataset contained the remaining 160 patients from each hospital, totaling 8000 examples. All patients were labeled with their unit discharge status [1 for mortality and 0 for alive]. 
Evaluation metrics included not only predictive accuracy on the mortality prediction task [that is, forecasting the probability of mortality base on drugs prescribed to patients] in terms of ROC AUC and PR AUC, but also the number of communication rounds between the server and hospitals to complete the training process.”, and Section 2.3:” Each one of the community models f1, f2, … ,fK [and the baseline FL model]…)
(iii) each local machine learning model data object is associated with at least one of one or more model provider demographic profiles; (Huang, ABS: “Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community.”, Section 3.6: “The abovementioned evaluation results reveal that CBFL had better predictive accuracy in fewer communication rounds than FL in both mortality and stay time prediction tasks. Communities tended to accommodate patients of similar diagnoses and geographical locations, making individual community models on average easier to learn than one model for all patients. In this section, we took CBFL with five communities for mortality prediction as an example to investigate and illustrate the performance differences of each community model.”, and Section 4:” Among these dimensions, CBFL focused primarily on admission diagnoses for patients’ unit stay and also on geographical locations of hospitals. By clustering patients of common features into the same community and learning separate models for individual communities, the algorithm converged to higher predictive accuracy in fewer communication rounds than the baseline FL model in both mortality and stay time prediction tasks. Clustering also made prediction results interpretable: analyzing the distances between communities could help explain why prediction on some examples was more reliable than on others [refer to Table 8 for an example]. 
Moreover, unlike other optimization algorithms for federated learning on non-IID data that needed a fraction of all data to be shared across the clients, CBFL did not require hospitals to share their patient data with each other or the server at all, thereby keeping privacy intact.”, and section 1: “ we proposed a community-based federated learning (CBFL) algorithm that clustered EMR data into several communities and simultaneously trained one model per community, so that the learning process became markedly more efficient than FL. Success of data clustering [albeit being centralized analyses] has been reported in previous medical studies …. we demonstrate the application of decentralized clustering together with federated machine learning to make predictions on ICU EMRs.”, and section 2.2:” During community-based learning, the server initialized a series of K neural network models f1, f2, …, fK with the same weights w0; each client received all K models from the server and learnt each model on its full data for E2. Meanwhile, fencoder and fkmeans were used to determine which cluster each example belonged to. The size of clusters was denoted mc1, mc2, …, mck and returned together with the learnt weights to the server, where each model was updated by taking the weighted average of s based on mc1, mc2, …, mck. The updated models were sent to each client for the next round of training. This community based learning process was repeated until the algorithm converged. The convergence condition was that the weights of the server-side global model converged to specific values, or that the number of maximum communication rounds was reached. Given a test example, CBFL would firstly convert its features into encodings by fencoder, then define its community by fkmeans and finally use the corresponding community model to make prediction. These three procedures of CBFL are visualized in Fig. 
3.”, and table 4, demonstrate details of each communities.”, and section 3.2:” Patient clustering was a key step in our algorithm: since patients with similar features were grouped together, community-based learning [that is, learning an independent model on each community] would be easier than learning one whole model on all patients[...] geographical bias could be found … Moreover, geographical bias could be found: Community 1 had 15 hospitals located in the Midwest [nine], the South [five] and the …”).
for each inference-profile pair of a plurality of inference-provider pairs that is associated with a corresponding inference identifier and a corresponding model profile of the one or more model provider demographic profiles, generating a global machine learning model data object of a plurality of global machine learning model data objects based at least in part on a related local model subset of the plurality of local machine learning model data objects for the inference-profile pair, (Huang, section 1:” These concerns can be addressed by FL that keeps both data and computation local in distributed silos and aggregates locally computational results to train a global predictive model.”, and section 2.2: “This community-based learning process was repeated until the algorithm converged. The convergence condition was that the weights of the server-side global model converged to specific values, or that the number of maximum communication rounds was reached.")
wherein the related local model subset for an inference-profile pair comprises local machine learning model data objects that are associated with the corresponding inference identifier for the inference-profile pair and the corresponding model provider profile for the inference-profile pair; and (Huang, section 3.2:” Patient clustering was a key step in our algorithm: since patients with similar features were grouped together, community-based learning [that is, learning an independent model on each community] would be easier than learning one whole model on all patients. To illustrate what common features were shared among patients in the same community, we clustered the 28,000 patients into five communities and carried out enrichment analysis of diagnoses in them. Table 4 lists the number of patients and overrepresented diagnoses with adjusted p-values within each community. It can be noted that every community exhibited a different focus:”, and Section 3.6: “The abovementioned evaluation results reveal that CBFL had better predictive accuracy in fewer communication rounds than FL in both mortality and stay time prediction tasks. Communities tended to accommodate patients of similar diagnoses and geographical locations, making individual community models on average easier to learn than one model for all patients. In this section, we took CBFL with five communities for mortality prediction as an example to investigate and illustrate the performance differences of each community model.”, and section 4: “Patients admitted to ICUs come from diverse ethnic and age groups, exhibit various levels of vital sign measurements and illness severity, and receive different diagnoses and treatment [35]. Among these dimensions, CBFL focused primarily on admission diagnoses for patients’ unit stay and also on geographical locations of hospitals. 
By clustering patients of common features into the same community and learning separate models for individual communities, the algorithm converged to higher predictive accuracy in fewer communication rounds than the baseline FL model in both mortality and stay time prediction tasks. Clustering also made prediction results interpretable: analyzing the distances between communities could help explain why prediction on some examples was more reliable than on others [refer to Table 8 for an example]. Moreover, unlike other optimization algorithms for federated learning on non-IID data that needed a fraction of all data to be shared across the clients, CBFL did not require hospitals to share their patient data with each other or the server at all, thereby keeping privacy intact.”)
providing, based at least in part on the plurality of global machine learning model data objects, a demographic-aware predictive data analysis [[application programming interface (API), wherein the demographic-aware predictive data analysis API]] is accessible by the plurality of model data object provider agents. (Huang, section 2.2, page 3: “This community-based learning process was repeated until the algorithm converged. The convergence condition was that the weights of the server-side global model converged to specific values, or that the number of maximum communication rounds was reached. Given a test example, CBFL would firstly convert its features into encodings by fencoder, then define its community by fkmeans and finally use the corresponding community model to make prediction. These three procedures of CBFL are visualized in Fig. 3.”, and section 3.7:” The aforementioned evaluation was mainly based on ROC AUCs that required computation of TPR and FPR at various thresholds between 0 and 1. To enable clinical use of CBFL, a single threshold should be chosen and two methods of defining the appropriate value are recommended. One is related to prevalence of mortality [p] in the population. Firstly, p should be estimated from training examples, on which the CBFL model is learnt. Then, CBFL is used to generate the training examples’ prediction scores, the [(1-p) × 100]th percentile of which should be selected as the threshold. In our data, 1395 out of the 28,000 critical care patients expired and so 1 - p approximately equaled 95%. The 95th percentile of the prediction scores given by CBFL was 0.1741. If a patient’s score surpassed this threshold, he/she would have a mortality prediction, or otherwise a survival prediction. Table 10 shows the confusion matrix that summarizes predictions of the 8000 test examples.”) Note section 3.7 has further detailed out predictive analysis to evaluate performance.
Huang fails to explicitly disclose an application programming interface (API). However, Feng teaches data analysis using an application programming interface (API) (Feng, Par. 0025: “FIG. 12 illustrates an exemplary application program interface [API] for distributed machine learning, according to an embodiment of the present teaching”, and Par. 0085:” In another embodiment, a user may write a program based on an application program interface [API] associated with the system, and utilize the program to define configuration parameters for distributed machine learning. FIG. 12 illustrates an exemplary API 1210 for distributed machine learning, according to an embodiment of the present teaching. The API 1210 is an example of an interface that includes a set of routines, protocols, and tools for building software applications related to machine learning based on the Caffe library in Spark framework. FIG. 13 illustrates an exemplary program 1310 written by a user for distributed machine learning, according to an embodiment of the present teaching.”)
Feng further teaches [A computer-implemented method for performing demographic-aware federated machine learning, the computer-implemented method comprising:- claim 1] and [An apparatus for performing demographic-aware federated machine learning, the apparatus comprising at least one processor and at least one memory including program code, the at least one memory and the program code configured to, with the processor, cause the apparatus to at least:- claim 13] and [A computer program product for performing demographic-aware federated machine learning, the computer program product comprising at least one non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions configured to:- claim 18] (Feng, Par. 0091: “The computer 1500 also includes a central processing unit [CPU] 1520, in the form of one or more processors, for executing program instructions. … “, and Par. 0092: “Program aspects of the technology may be thought of as ‘products’ or ‘articles of manufacture’ typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Tangible non-transitory ‘storage’ type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, …”, and Par. 0094: “… or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Huang in view of Feng to perform data analysis using an application programming interface, and to perform demographic-aware federated machine learning as a computer-implemented method, an apparatus, and a computer program product, in order to provide the trained model to the user as a response to the user request, as evidenced by Feng (see Par. 0060).
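
For illustration, the server-side update quoted above from Huang, Section 2.2 (each community model "updated by taking the weighted average" of the returned weights based on the reported cluster sizes) can be sketched as follows. This is a minimal sketch under stated assumptions, not Huang's implementation: the function name `weighted_average`, the flat-list weight representation, and the sample values are hypothetical.

```python
def weighted_average(client_weights, cluster_sizes):
    """Average per-client weight vectors for one community model.

    client_weights: one flat list of weights per client (hypothetical layout).
    cluster_sizes:  number of examples each client assigned to this community.
    """
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("at least one client must contribute examples")
    dim = len(client_weights[0])
    # Each coordinate is the size-weighted mean across clients.
    return [
        sum(w[i] * m for w, m in zip(client_weights, cluster_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical values: two clients return weights for the same community model,
# with 1 and 3 examples assigned to that community, respectively.
updated = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Under this sketch, a client that assigned more examples to a community contributes proportionally more to that community's updated model.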

Regarding claims 2 and 14, Huang teaches wherein determining the one of one or more model provider demographic profiles comprises: identifying demographic data for the plurality of model data object provider agents, wherein the demographic data comprises demographic schema data and demographic attribute value data; determining, based at least in part on the demographic schema data, one or more profiled demographic attributes for the plurality of model data object provider agents; and determining, based at least in part on the demographic attribute value data and the one or more profiled demographic attributes, the one of one or more model provider demographic profiles, wherein each model provider demographic profile is associated with a profiled demographic attribute value range for each profiled demographic attribute of one or more profiled demographic attributes. (Huang, Section 2.2:” During encoder training, each client [that is, hospital] learnt a denoising autoencoder fautoencoder initialized with Wo,autoencoder for E1 epochs and returned only the trained weights of encoder Wc 1,encoder to the server for average. Here, N, c, nc, and fencoder denoted the total number of examples, the client index, the size of each client, and the averaged encoder, respectively.”, and section 3.2:” Patient clustering was a key step in our algorithm: since patients with similar features were grouped together, community-based learning [that is, learning an independent model on each community] would be easier than learning one whole model on all patients. To illustrate what common features were shared among patients in the same community, we clustered the 28,000 patients into five communities and carried out enrichment analysis of diagnoses in them. Table 4 lists the number of patients and overrepresented diagnoses with adjusted p-values within each community. It can be noted that every community exhibited a different focus.
In addition to the above cohort and community analyses considering the characteristics of patients, we further performed clustering at the hospital level to reveal distinctions between hospital communities. Fig. 5 visualizes the 50 hospitals [labeled with their eICU IDs] clustered into five communities on a PCA plot. Separation between communities can be easily recognized, and Communities 1 and 5 had a larger size than the rest three. Moreover, geographical bias could be found: Community 1 had 15 hospitals located in the Midwest [nine], the South [five] and the West [one] of the United States; Community 2 had seven hospitals, all situated in the South; Community 3 had eight hospitals, seven of which came from the West and one with unknown location; Community 4 had three Midwestern hospitals and two Western hospitals; Community 5 had 15 hospitals, one with unknown location and the others residing in the Northeast [three], the Midwest [five], the South [four], and the West [two]. In summary, Community 2 seemed to capture Southern hospitals only and Community 3 tended to accommodate hospitals from the West, while no notable bias was observed in the other communities. Supplementary Table 1 contains full information of each hospital. Combining patient clusters from individual hospitals could be achieved at the server level to further enhance the performance of CBFL. For instance, if Cluster 1 from Hospital 66 was closer to Cluster 2 from Hospital A than to other Hospital B’s clusters, then the two clusters could be combined into a single cluster; the new cluster would be used to train the server-side Community Models 1 and 2 so that the training data size was increased for both models. This could be implemented by comparing the centroids of each hospital’s clusters, which would require each hospital to send to the server not only the parameters of its community models but also its cluster centroids. 
Although additional hospital-to-server communication cost and extra computational cost at the server would be incurred, we regard this optimized clustering method as a valuable continuation of our study in the future.”)

Regarding claim 3, Huang teaches wherein: each model data object provider agent of the one or more model data object provider agents is associated with at least one of the one or more model provider demographic profiles; and the one or more model provider demographic profiles comprise a universal model provider demographic profile that is associated with each model data object provider agent of the one or more model data object provider agents. (Huang, Section 2.2:” During encoder training, each client [that is, hospital] learnt a denoising autoencoder fautoencoder initialized with Wo,autoencoder for E1 epochs and returned only the trained weights of encoder Wc 1,encoder to the server for average. Here, N, c, nc, and fencoder denoted the total number of examples, the client index, the size of each client, and the averaged encoder, respectively.”, and Section 2.2 page 3:” During community-based learning, the server initialized a series of K neural network models f1, f2, …, fK with the same weights w0; each client received all K models from the server and learnt each model on its full data for E2. Meanwhile, fencoder and fkmeans were used to determine which cluster each example belonged to. The size of clusters was denoted mc1, mc2, …, mck and returned together with the learnt weights to the server, where each model was updated by taking the weighted average of s based on mc1, mc2, …, mck. The updated models were sent to each client for the next round of training. This community based learning process was repeated until the algorithm converged. The convergence condition was that the weights of the server-side global model converged to specific values, or that the number of maximum communication rounds was reached. Given a test example, CBFL would firstly convert its features into encodings by fencoder , then define its community by fkmeans and finally use the corresponding community model to make prediction. These three procedures of CBFL are visualized in Fig. 
3.”, and Section 3.2, page 5:” Patient clustering was a key step in our algorithm: since patients with similar features were grouped together, community-based learning [that is, learning an independent model on each community] would be easier than learning one whole model on all patients. To illustrate what common features were shared among patients in the same community, we clustered the 28,000 patients into five communities and carried out enrichment analysis of diagnoses in them. Table 4 lists the number of patients and overrepresented diagnoses with adjusted p-values within each community. It can be noted that every community exhibited a different focus: In addition to the above cohort and community analyses considering the characteristics of patients, we further performed clustering at the hospital level to reveal distinctions between hospital communities. Fig. 5 visualizes the 50 hospitals [labeled with their eICU IDs] clustered into five communities on a PCA plot. Separation between communities can be easily recognized, and Communities 1 and 5 had a larger size than the rest three. Moreover, geographical bias could be found: Community 1 had 15 hospitals located in the Midwest [nine], the South [five] and the West [one] of the United States; Community 2 had seven hospitals, all situated in the South; Community 3 had eight hospitals, seven of which came from the West and one with unknown location; Community 4 had three Midwestern hospitals and two Western hospitals; Community 5 had 15 hospitals, one with unknown location and the others residing in the Northeast [three], the Midwest [five], the South [four], and the West [two]. In summary, Community 2 seemed to capture Southern hospitals only and Community3 tended to accommodate hospitals from the West, while no notable bias was observed in the other communities. Supplementary Table 1 contains full information of each hospital. 
Combining patient clusters from individual hospitals could be achieved at the server level to further enhance the performance of CBFL. For instance, if Cluster 1 from Hospital 66 was closer to Cluster 2 from Hospital A than to other Hospital B’s clusters, then the two clusters could be combined into a single cluster; the new cluster would be used to train the server-side Community Models 1 and 2 so that the training data size was increased for both models. This could be implemented by comparing the centroids of each hospital’s clusters, which would require each hospital to send to the server not only the parameters of its community models but also its cluster centroids. Although additional hospital-to-server communication cost and extra computational cost at the server would be incurred, we regard this optimized clustering method as a valuable continuation of our study in the future.”).
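
For illustration, the three-step prediction procedure quoted above from Huang, Section 2.2 (encode the test example, determine its community, then apply the corresponding community model) can be sketched as follows. This is a sketch under stated assumptions: the nearest-centroid rule stands in for the trained k-means model, the community model is assumed to take the original features as input, and the names `nearest_community`, `predict`, and all sample values are hypothetical.

```python
import math


def nearest_community(encoding, centroids):
    """Return the index of the centroid closest to the encoding (Euclidean)."""
    return min(range(len(centroids)), key=lambda k: math.dist(encoding, centroids[k]))


def predict(features, encoder, centroids, community_models):
    """Encode the example, pick its community, and apply that community's model."""
    encoding = encoder(features)
    k = nearest_community(encoding, centroids)
    # Assumption: the community model consumes the original features.
    return k, community_models[k](features)


# Hypothetical stand-ins for the trained encoder, centroids, and community models.
encoder = lambda x: x                           # identity in place of a trained encoder
centroids = [[0.0, 0.0], [10.0, 10.0]]          # two community centroids
community_models = [lambda x: 0.1, lambda x: 0.9]

community, score = predict([9.0, 8.0], encoder, centroids, community_models)
```

In this sketch each test example is routed to exactly one community model, consistent with the quoted description of one model being learnt per community.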

Regarding claims 4, 15, and 19, Huang teaches wherein generating the plurality of global machine learning model data objects comprises: for each inference identifier of the one or more inference identifiers: generating a per-inference universal machine learning model data object for the inference identifier and a universal model provider demographic profile of the one or more model provider demographic profiles, (Huang, ABS: “… we introduced a community-based federated machine learning [CBFL] algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community.”)
determining a per-inference statistical relevance measure for the per-inference universal machine learning model data object, determining whether the per-inference statistical relevance measure satisfies a statistical relevance measure threshold, (Huang, Section 2.4: “The ROC curve was produced by plotting the true positive rate [TPR or sensitivity] and the false positive rate [FPR or 1-specificity] at thresholds ranging from 0 to 1. The prediction scores [that is, the predicted probabilities of patient mortality] were compared with each threshold: if a patient’s score is above the threshold, he/she is predicted to be mortal, or otherwise alive. The confusion matrix is used to summarize the comparison results [see Table 2]. Among predicted mortalities, those patients who died fall into the true mortality category and those who survived are false mortalities. Among predicted survivals, those who died are false survivals and those who survived are true survivals…. Each point on the ROC curve corresponds to a pair of TPR and FPR for a threshold. Fig. 4 shows five example points for thresholds of 0, 0.01, 0.05, 0.1, and 1. The values of TPR and FPR are firstly computed from the confusion matrix for each threshold and then visualized on the ROC plot. The area under the ROC curve, ROC AUC, is the probability that the classifier will rank a random positive example over a random negative one in terms of prediction scores and regardless of what threshold is chosen. A perfect classifier has an AUC of 1.0, meaning that at any threshold the classifier will always distinguish positive examples from negative ones; a random classifier has an AUC of 0.5, indicating a random guess on which example is positive.)
in response to determining that the per-inference statistical relevance measure satisfies the statistical relevance measure threshold: (i) generating one or more per-inference-per- profile global machine learning model data objects each associated with the inference identifier and a corresponding non-universal model provider demographic profile of one or more non- universal model provider demographic profiles of the model provider demographic profiles, and  (Huang, Section 3.5:” To examine statistical significance of results in the four prediction tasks, we repeated experiments on FL for five times [with different random seeds for partitioning training/test data]. As for CBFL, because the five-community setting stroke a good balance between predictive performance and computational cost, we decided to carry out repeated experiments for CBFL with five communities only. Fig. 10 shows the 95% confidence intervals [dashed lines] and the average [solid lines] of ROC AUC values for FL and CBFL in the mortality and ICU-stay-time prediction tasks. CBFL consistently converged to higher AUCs at fewer communication rounds than FL. In addition, we performed paired t-test for equal means on the average AUCs of CBFL and FL, and obtained p-values less than 2.2e-16 for all four plots. By observing the figure and from the statistical tests, it can be concluded that CBFL statistically significantly outperformed the baseline.”, and Section 3.6:” … Communities tended to accommodate patients of similar diagnoses and geographical locations, making individual community models on average easier to learn than one model for all patients. In this section, we took CBFL with five communities for mortality prediction as an example to investigate and illustrate the performance differences of each community model. 
As shown in Table 9, Community 1 exhibited the highest ROC AUC of 0.7561 and Community 4 yielded the highest PR AUC of 0.2155, while Community 2, the only one underperforming FL, obtained the worst performance with a ROC AUC of 0.6179 and a PR AUC of 0.0773. This can be explained by the average distance of each community centroid to other community centroids on the PCA plot [see the third column of Table 9]: Community 2 was the furthest apart from the rest, with an average distance of 2.562. In addition, ROC curves for the community models were plotted in Fig. 11.”)
(ii) adopting the one or more per-inference-per-profile global machine learning model data objects and the per-inference universal machine learning data object as global machine learning model data objects for the inference identifier; and (Huang, Section 3.7: “To enable clinical use of CBFL, a single threshold should be chosen and two methods of defining the appropriate value are recommended. One is related to prevalence of mortality [p] in the population. Firstly, p should be estimated from training examples, on which the CBFL model is learnt. Then, CBFL is used to generate the training examples’ prediction scores, the [[1 – p] × 100]th percentile of which should be selected as the threshold. In our data, 1395 out of the 28,000 critical care patients expired and so [1 – p] approximately equaled 95%. The 95th percentile of the prediction scores given by CBFL was 0.1741. If a patient’s score surpassed this threshold, he/she would have a mortality prediction, or otherwise a survival prediction.”)
in response to determining that the per-inference statistical relevance measure fails to satisfy the statistical relevance measure threshold, adopting the per-inference universal machine learning model data object as a sole global machine learning model data object for the inference identifier among the plurality of global machine learning model data objects. (Huang, Section 3.5:” To examine statistical significance of results in the four prediction tasks, we repeated experiments on FL for five times [with different random seeds for partitioning training/test data]. As for CBFL, because the five-community setting stroke a good balance between predictive performance and computational cost, we decided to carry out repeated experiments for CBFL with five communities only. Fig. 10 shows the 95% confidence intervals [dashed lines] and the average [solid lines] of ROC AUC values for FL and CBFL in the mortality and ICU-stay-time prediction tasks. CBFL consistently converged to higher AUCs at fewer communication rounds than FL. In addition, we performed paired t-test for equal means on the average AUCs of CBFL and FL, and obtained p-values less than 2.2e-16 for all four plots. By observing the figure and from the statistical tests, it can be concluded that CBFL statistically significantly outperformed the baseline.”, and Fig. 10:” All results indicate that CBFL significantly outperformed FL by converging to higher AUCs at fewer communication rounds.”) Note: Since CBFL significantly outperforms FL, it is practical to adopt CBFL (the universal machine learning model) as the sole global machine learning model.

Regarding claim 5, Huang teaches wherein the per-inference statistical relevance measure for a per-inference universal machine learning model data object that is associated with an inference identifier of the one or more inference identifiers is determined based at least in part on an estimated attribute size of training data used to generate the per- inference universal machine learning model data object. (Huang, Section 3.2:” Combining patient clusters from individual hospitals could be achieved at the server level to further enhance the performance of CBFL. For instance, if Cluster 1 from Hospital 66 was closer to Cluster 2 from Hospital A than to other Hospital B’s clusters, then the two clusters could be combined into a single cluster; the new cluster would be used to train the server-side Community Models 1 and 2 so that the training data size was increased for both models. This could be implemented by comparing the centroids of each hospital’s clusters, which would require each hospital to send to the server not only the parameters of its community models but also its cluster centroids.”)

Regarding claim 6, Huang teaches wherein the per-inference statistical relevance measure for a per-inference universal machine learning model data object that is associated with an inference identifier of the one or more inference identifiers is determined based at least in part on an estimated attribute size of training data used to generate the per- inference universal machine learning model data object. (Huang, Section 4: “Patients admitted to ICUs come from diverse ethnic and age groups, exhibit various levels of vital sign measurements and illness severity, and receive different diagnoses and treatment [35]. Among these dimensions, CBFL focused primarily on admission diagnoses for patients’ unit stay and also on geographical locations of hospitals. By clustering patients of common features into the same community and learning separate models for individual communities, the algorithm converged to higher predictive accuracy in fewer communication rounds than the baseline FL model in both mortality and stay time prediction tasks. Clustering also made prediction results interpretable: analyzing the distances between communities could help explain why prediction on some examples was more reliable than on others [refer to Table 8 for an example]. Moreover, unlike other optimization algorithms for federated learning on non-IID data that needed a fraction of all data to be shared across the clients, CBFL did not require hospitals to share their patient data with each other or the server at all, thereby keeping privacy intact.”)

Regarding claims 9, 16, and 20, Huang teaches wherein the demographic-aware [[predictive data analysis API]] is configured to: receive a predictive inference query from a first model data object provider agent of the plurality of model data object provider agents; determining a query-related subset of plurality of global machine learning model data objects that are associated with the predictive inference query; process the predictive inference query in accordance with the query-related subset to generate one or more per-model data object predictive inference outputs; and provide the one or more per-model data object predictive inference outputs to the first model data object provider agent. (Huang, Section 3.7: “To enable clinical use of CBFL, a single threshold should be chosen and two methods of defining the appropriate value are recommended. One is related to prevalence of mortality [p] in the population. Firstly, p should be estimated from training examples, on which the CBFL model is learnt. Then, CBFL is used to generate the training examples’ prediction scores, the [[1 – p] × 100]th percentile of which should be selected as the threshold. In our data, 1395 out of the 28,000 critical care patients expired and so [1 – p] approximately equaled 95%. The 95th percentile of the prediction scores given by CBFL was 0.1741. If a patient’s score surpassed this threshold, he/she would have a mortality prediction, or otherwise a survival prediction.”) Note: Once a machine learning model is trained, it can be applied to other input data, and the resulting output can be provided to the agent requesting it.
Huang fails to explicitly disclose the predictive data analysis API; however, Feng teaches data analysis using an application programming interface (API). (Feng, Par. 0025: “FIG. 12 illustrates an exemplary application program interface [API] for distributed machine learning, according to an embodiment of the present teaching”, and Par. 0085:” In another embodiment, a user may write a program based on an application program interface [API] associated with the system, and utilize the program to define configuration parameters for distributed machine learning. FIG. 12 illustrates an exemplary API 1210 for distributed machine learning, according to an embodiment of the present teaching. The API 1210 is an example of an interface that includes a set of routines, protocols, and tools for building software applications related to machine learning based on the Caffe library in Spark framework. FIG. 13 illustrates an exemplary program 1310 written by a user for distributed machine learning, according to an embodiment of the present teaching.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Huang in view of Feng to perform data analysis using an application program interface, in order to provide the trained model to the user in response to the user's request, as evidenced by Feng (see Par. 0060).

Regarding claims 10 and 17, Huang teaches determining an inferred inference identifier of the one or more inferred inference identifiers that is associated with the predictive inference query; determining one or more related model provider demographic profiles of the one or more model provider demographic profiles that are associated with the first model data object provider agent; and determining the query-related subset based at least in part on each global machine learning model data object of the plurality of global machine learning model data objects that is associated with the inferred inference identifier and a related model provider demographic profile of the one or more related model provider demographic profiles. (Huang, Section 4: “Patients admitted to ICUs come from diverse ethnic and age groups, exhibit various levels of vital sign measurements and illness severity, and receive different diagnoses and treatment [35]. Among these dimensions, CBFL focused primarily on admission diagnoses for patients’ unit stay and also on geographical locations of hospitals. By clustering patients of common features into the same community and learning separate models for individual communities, the algorithm converged to higher predictive accuracy in fewer communication rounds than the baseline FL model in both mortality and stay time prediction tasks. Clustering also made prediction results interpretable: analyzing the distances between communities could help explain why prediction on some examples was more reliable than on others [refer to Table 8 for an example]. Moreover, unlike other optimization algorithms for federated learning on non-IID data that needed a fraction of all data to be shared across the clients, CBFL did not require hospitals to share their patient data with each other or the server at all, thereby keeping privacy intact.”)

Regarding claim 11, Huang teaches wherein: each model data object provider agent of the one or more model data object provider agents is associated with a medical service provider agent of one or more medical service provider agents. (Huang, section 2.1:” CBFL was developed based on the eICU collaborative research database [35], which contains highly granular critical care data of 200,859 patients admitted to 208 hospitals [agents] from across the United States. ", and Section 2.2, page 3:” During encoder training, each client [that is, hospital] learnt a denoising autoencoder fautoencoder initialized with Wo,autoencoder for E1 epochs and returned only the trained weights of encoder Wc 1,encoder to the server for average”)

Regarding claim 12, Huang teaches wherein each inference identifier of the one or more inference identifiers is associated with a medical predictive inference subject of one or more medical predictive inference subjects. (Huang, Section 2.1:” Our study mainly concerned with three variables. Mortality: unit discharge status that specifies patients’ survival/mortality [0 for alive and 1 for expired]. This would be the dependent variable in the mortality prediction task. ICU stay time: unit discharge offset that records the number of days from unit admission to discharge [with an average of 64 h, that is, 2.7 days]. This would be the dependent variable in the unit-stay-time prediction task. drug features: drugs administered on patients during the first 48 h of ICU stay [1399 binary drug features in total]. Table 1 shows the first three drug features: 2 mL of Metoclopramide HCL 5 mg/ml given in injection solution, 3 mL vial of insulin regular human 100 unit/ml given in injection solution, and Metoprolol Succinate ER 50 mg taken orally once per 24 h. If Drug i was prescribed to Patient j, Cell [i, j] in the table would become 1, and 0 otherwise. For instance, Patient 141,194 who received all three drugs had a feature vector of [1, 1, 1], whereas Patient 141,203 who took Metoprolol Succinate ER only had a feature vector of [0, 0, 1]. These drug features would be used as independent variables in the prediction tasks of our study. The proposed CBFL algorithm and the baseline FL algorithm would use drug features as predictors to forecast mortality and ICU stay time of critical care patients. We chose medication as predictors rather than other variables [such as age, gender and diagnosis] because the eICU database contained highly dimensional drug information. In contrast, both age and gender were a single dimensional variable and diagnosis had only dozens of dimensions. 
Prediction on medication most closely resemble the real case scenario of federated learning, since the technology was devised to tackle the challenge of big data in large volume and with high dimensionality.”)


Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Huang and Feng, and further in view of Yurochkin et al., “Bayesian Nonparametric Federated Learning of Neural Networks” (herein "Yurochkin").

Regarding claim 7, Huang fails to explicitly disclose, however, Yurochkin teaches wherein generating each global machine learning model data object of the plurality of global machine learning model data objects based at least in part on the related local model subset of the plurality of local machine learning model data objects that is associated with the inference-profile pair of the plurality of inference-profile pairs for the global machine learning model data object comprises: processing the related local model subset in accordance with a Bayesian non-parametric model aggregation model to generate the global machine learning model data object. (Yurochkin, ABS:” In federated learning problems, data is scattered across different servers and exchanging or pooling it is often impractical or prohibited. We develop a Bayesian nonparametric framework for federated learning with neural networks. Each data server is assumed to provide local neural network weights, which are modeled through our framework. We then develop an inference approach that allows us to synthesize a more expressive global network without additional supervision, data pooling and with as few as a single communication round. We then demonstrate the efficacy of our approach on federated learning problems simulated from two popular image classification datasets.”, and Section 1:” The matching, to be formally defined later, is governed by the posterior of a Beta-Bernoulli process [BBP] [Thibaux & Jordan, 2007], a Bayesian nonparametric [BNP] model that allows the local parameters to either match existing global ones or to create new global parameters if existing ones are poor matches. Our construction provides several advantages over existing approaches.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Huang and Feng in view of Yurochkin to process the related local model subset in accordance with a Bayesian non-parametric model aggregation model to generate the global machine learning model data object, in order to effectively learn compressed federated networks from pretrained local networks, as evidenced by Yurochkin (see Section 5).

Regarding claim 8, Huang fails to explicitly disclose, however, Yurochkin teaches wherein the Bayesian non-parametric model aggregation model comprises a Beta-Bernoulli processing routine. (Yurochkin, Section 1:” The matching, to be formally defined later, is governed by the posterior of a Beta-Bernoulli process [BBP] [Thibaux & Jordan, 2007], a Bayesian nonparametric [BNP] model that allows the local parameters to either match existing global ones or to create new global parameters if existing ones are poor matches. Our construction provides several advantages over existing approaches. First, it decouples the learning of local models from their amalgamation into a global federated model. This decoupling allows us to remain agnostic about the local learning algorithms, which may be adapted as necessary, with each data source potentially even using a different learning algorithm. Moreover, given only pretrained models, our BBP informed matching procedure is able to combine them into a federated global model without requiring additional data or knowledge of the learning algorithms used to generate the pre-trained models. This is in sharp contrast with existing work on federated learning of neural networks [McMahan et al., 2017], which require strong assumptions about the local learners, for instance, that they share the same random initialization, and are not applicable for combining pre-trained models.”, and Section 3.1:" We now present the key building block of our framework, a Beta Bernoulli Process [Thibaux & Jordan, 2007] based model of MLP weight parameters.")
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Huang and Feng in view of Yurochkin such that the Bayesian non-parametric model aggregation model comprises a Beta-Bernoulli processing routine, in order to effectively learn compressed federated networks from pretrained local networks, as evidenced by Yurochkin (see Section 5).




Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Pothula et al. (US-20210174257A1) teaches Par.  0109: “FIG. 4 is a block diagram of a logical architecture in which the present techniques may be implemented in some embodiments. The architecture may include a data atlas 8000, which in some cases may house some or all of the OOM module 22 (rather than or in addition to the AI atlas 20), the above-described AI pillars 8002, the above described AI atlas 8004, and user experience application program interfaces 8006, which may be used to configure or update the federated machine learning model 19 discussed above. Each module depicted may have some or all of the illustrated functionality.”
Rangarajan et al. (US-20200160171A1) teaches Par. 0054: “In some configurations, the program code 416 is implemented as an application programming interface [“API”] compatible with pre-existing communication libraries. For example, and without limitation, the resulting program code 416 can be implemented as an NVIDIA Collective Communications Library [“NCCL”]-compatible API and can be seamlessly plugged into distributed machine learning [“ML”] frameworks such as, but not limited to, TENSORFLOW, PYTORCH, CNTK, and MXNET. Program code 416 compatible with other types of APIs and machine learning frameworks can be generated in other configurations. This ensures that existing programs 418 can execute the program code 416 to utilize the technologies disclosed herein with little or no modification.”
Das et al. (US-20180322606A1) teaches Par. 0023:” FIGS. 18-21 illustrate flow diagrams that describe operations to enable distributed machine learning via the MLSL API.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656