DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 8, 9, 11-16 and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Miotto et al. (US 2020/0327404 A1)(hereinafter Miotto).
Re claim 1, Miotto discloses a method for preprocessing biomedical data for a predictive model, the method comprising: receiving data from a data source (see fig. 1 ¶ 41 for receiving data from a data source (i.e. input data representing a plurality of entities 58 as described in fig. 2 paragraph 38)); using at least one machine learning (ML) algorithm from a plurality of ML algorithms to obtain at least one combination of preprocessing steps (see figs. 1-2 ¶s 53, 76, 81 for using at least one machine learning (ML) algorithm from a plurality of ML algorithms (i.e. a plurality of ML algorithms as described in paragraphs 74, 80, 92, 96) to obtain at least one combination of preprocessing steps (i.e. EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80)); and computing an accuracy score for each of the at least one combination based on accuracy of prediction of the predictive model (see fig. 7 ¶s 95, 108 for computing an accuracy score for each of the at least one combination based on accuracy of prediction of the predictive model (i.e. for each disease, the scores obtained by all patients in the test set (i.e., 76,214 patients) was taken and used to measure the area under the receiver operating characteristic curve (i.e., AUC-ROC), accuracy, and F-score as shown in fig. 8 paragraph 107). Also, see figs. 1-2, 5 paragraphs 53, 76, 80-81)
Re claim 2, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature wherein the data source comprises at least one of a local data storage, a database, and a cloud data storage (see ¶ 41 for the data source (i.e. input data representing a plurality of entities 58 as described in fig. 2 paragraph 38) comprises at least one of a local data storage, a database, and a cloud data storage as shown in fig. 1 paragraph 28)
Re claim 3, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature wherein the data is one of Magnetic Resonance Imaging (MRI) data, functional Magnetic Resonance Imaging (fMRI) data, Electroencephalogram (EEG) data, Electrocardiogram (EKG/ECG) data, genetics data, proteomics data, data from wearable devices, Electronic Health Record (EHR) data, Electronic Medical Record (EMR) data, Chemical structure data, Images (PNG, JPEG), including from pathology or other applications of microscopy, and other healthcare and medical research or healthcare related data (see ¶s 46-47, 72-73, 83 for the data is one of Magnetic Resonance Imaging (MRI) data, functional Magnetic Resonance Imaging (fMRI) data, Electroencephalogram (EEG) data, Electrocardiogram (EKG/ECG) data, genetics data, proteomics data, data from wearable devices, Electronic Health Record (EHR) data, Electronic Medical Record (EMR) data, Chemical structure data, Images (PNG, JPEG), including from pathology or other applications of microscopy, and other healthcare and medical research or healthcare related data).
Re claim 4, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature wherein the method further comprises selecting one combination of preprocessing steps from the at least one plurality of combination of preprocessing steps, wherein the accuracy score of the selected at least one combination of preprocessing steps is greater than a predefined threshold rank (see fig. 7 ¶s 95, 108 for selecting one combination of preprocessing steps from the at least one plurality of combination of preprocessing steps (i.e. EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80), wherein the accuracy score of the selected at least one combination of preprocessing steps is greater than a predefined threshold rank (i.e. for each disease, the scores obtained by all patients in the test set (i.e., 76,214 patients) was taken and used to measure the area under the receiver operating characteristic curve (i.e., AUC-ROC), accuracy, and F-score. Accuracy and F-score require a threshold to discriminate between positive and negative predictions, for this example, this threshold was set to 0.6, with this value optimizing the tradeoff between precision and recall for all representations in the validation set by reducing the number of false positive predictions as shown in fig. 8 paragraph 107). Also, see figs. 1-2 paragraphs 53, 76, 81)
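For illustration of the limitation addressed in claim 4, the following is a minimal sketch, assuming a toy data set, two hypothetical preprocessing steps (`clip`, `scale`), and a fixed stand-in predictor. None of these names or values come from Miotto or from the application; the sketch only shows one way that each combination of preprocessing steps could be scored by prediction accuracy and then selected against a predefined threshold.

```python
from itertools import combinations

# Hypothetical preprocessing steps (illustrative only, not taken from any reference).
def clip(rows):
    # Limit each feature value to the range [0, 10].
    return [(min(max(x, 0.0), 10.0), y) for x, y in rows]

def scale(rows):
    # Divide each feature value by the largest value in the data set.
    top = max(x for x, _ in rows) or 1.0
    return [(x / top, y) for x, y in rows]

STEPS = {"clip": clip, "scale": scale}

def predict(x):
    # Stand-in predictive model: a fixed cut-off at 0.5 (an assumption).
    return 1 if x > 0.5 else 0

def accuracy_score(rows):
    # Fraction of rows whose predicted label matches the true label.
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)

def score_combinations(rows):
    # Apply every non-empty combination of steps, in dictionary order,
    # and record the accuracy obtained on the preprocessed data.
    scores = {}
    names = list(STEPS)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            processed = rows
            for name in combo:
                processed = STEPS[name](processed)
            scores[combo] = accuracy_score(processed)
    return scores

def select_above(scores, threshold):
    # Keep only the combinations whose score exceeds the threshold.
    return [combo for combo, score in scores.items() if score > threshold]

# Toy (feature, label) pairs; the value 50.0 acts as an outlier.
DATA = [(2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1), (50.0, 1), (1.0, 0)]
SCORES = score_combinations(DATA)
SELECTED = select_above(SCORES, threshold=0.9)
```

In this toy setting only the combination that clips and then scales separates the two classes perfectly, so it is the only one retained at a threshold of 0.9.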
Re claim 5, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature wherein the accuracy score is calculated by an evaluation metric, and wherein the evaluation metric comprises at least one of a classification accuracy, a logarithmic loss, a confusion matrix, an area under curve (AUC), an F1 score, a mean absolute error, a mean squared error, or a performance evaluation metric (see figs. 1, 7 ¶s 95, 108 for the accuracy score is calculated by an evaluation metric, and wherein the evaluation metric comprises at least one of a classification accuracy, a logarithmic loss, a confusion matrix, an area under curve (AUC), an F1 score, a mean absolute error, a mean squared error, or a performance evaluation metric (i.e. for each disease, the scores obtained by all patients in the test set (i.e., 76,214 patients) was taken and used to measure the area under the receiver operating characteristic curve (i.e., AUC-ROC), accuracy, and F-score as shown in fig. 8 paragraph 107). Also, see figs. 9-12 paragraphs 109-112)
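For illustration of the evaluation metrics named in claim 5, the following is a minimal sketch, assuming toy prediction scores and labels. The 0.6 decision threshold is the value quoted from paragraph 107 of Miotto in the discussion of claim 4 above; the function names, the toy data, and the choice to treat a score equal to the threshold as a positive prediction are assumptions, and the code is not Miotto's implementation.

```python
def auc_roc(scores, labels):
    # AUC-ROC as the probability that a randomly chosen positive example
    # is scored higher than a randomly chosen negative one (ties count half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_f_score(scores, labels, threshold=0.6):
    # Accuracy and F-score need a threshold to turn scores into predictions.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return accuracy, f_score

# Toy per-patient scores for one disease (hypothetical values).
TOY_SCORES = [0.9, 0.7, 0.55, 0.4, 0.65, 0.2]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

AUC = auc_roc(TOY_SCORES, TOY_LABELS)
ACCURACY, F_SCORE = accuracy_and_f_score(TOY_SCORES, TOY_LABELS)
```

The AUC-ROC is computed without any threshold, whereas accuracy and F-score change with the chosen threshold, which is why the quoted passage fixes one.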
Re claim 6, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature further comprising distributing the plurality of ML algorithms over a cluster of computers or processors in a single computer (see figs. 1-2, 5 ¶s 69, 81 for distributing the plurality of ML algorithms (i.e. a plurality of ML algorithms as described in paragraphs 74, 80, 92, 96) over a cluster of computers or processors in a single computer (i.e. analysis computer system 100 comprises one or more computers. For purposes of illustration in FIG. 1, the analysis computer system 100 is represented as a single computer that includes all of the functionality of the disclosed analysis computer system 100. The functionality of the analysis computer system 100 may be spread across any number of networked computers and/or reside on each of several networked computers as described in paragraph 27))
Re claim 8, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature further comprising using the selected at least one combination of preprocessing steps to generate data for the predictive model (see figs. 1-2 ¶s 53, 76, 81 for using the selected at least one combination of preprocessing steps (i.e. EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80) to generate data for the predictive model (i.e. classification models were trained over 200,000 patients and 78 diseases, while the evaluation included 76,214 different patients. The deep feature model was then applied to train and test sets for supervised evaluation; hence each patient in these datasets was represented by a dense vector of 500 features as described in fig. 7 paragraph 95). Also, see paragraphs 96, 99, 114)
Re claim 9, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature further comprising: detecting a bias in the data, wherein the bias comprises at least one of a selection bias, a reporting bias, a recall bias, an exclusion bias, an information bias, or a statistical bias (see ¶s 57-58, 62, 73 for detecting a bias in the data, wherein the bias comprises at least one of a selection bias, a reporting bias, a recall bias, an exclusion bias, an information bias, or a statistical bias (i.e. Highly frequent (i.e., appearing in more than 80% of patients) and rare descriptors (i.e., present in less than five patients) were removed from the dataset to avoid biases and noise in the learning process leading to a final vocabulary of 41,072 features (i.e., each patient of all datasets was represented by a sparse vector of 41,072 entries) as described in fig. 2 paragraph 92)); and correcting the bias using at least one suitable preprocessing algorithm (see ¶s 57-58, 62, 73 for correcting the bias (i.e. Highly frequent (i.e., appearing in more than 80% of patients) and rare descriptors (i.e., present in less than five patients) were removed from the dataset to avoid biases and noise in the learning process leading to a final vocabulary of 41,072 features (i.e., each patient of all datasets was represented by a sparse vector of 41,072 entries) as described in fig. 2 paragraph 92) using at least one suitable preprocessing algorithm (i.e. a plurality of ML algorithms as described in paragraphs 53, 74, 76, 80-81, 96))
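For illustration of the descriptor filtering quoted from paragraph 92 of Miotto, the following is a minimal sketch, assuming each patient is represented as a list of descriptor strings. The 80% and five-patient limits are the values quoted above; the function name, parameter names, and toy data are hypothetical, and the code is not Miotto's implementation.

```python
from collections import Counter

def filter_descriptors(patients, max_fraction=0.8, min_patients=5):
    # Count, for each descriptor, the number of patients in which it appears.
    counts = Counter()
    for descriptors in patients:
        counts.update(set(descriptors))
    total = len(patients)
    # Remove descriptors appearing in more than max_fraction of patients
    # and descriptors present in fewer than min_patients patients.
    vocabulary = {
        d for d, c in counts.items()
        if c >= min_patients and c / total <= max_fraction
    }
    filtered = [[d for d in descriptors if d in vocabulary]
                for descriptors in patients]
    return filtered, vocabulary

# Toy data: "common" appears in 9 of 10 patients, "useful" in 6, "rare" in 2.
PATIENTS = []
for i in range(10):
    record = []
    if i < 9:
        record.append("common")
    if i < 6:
        record.append("useful")
    if i < 2:
        record.append("rare")
    PATIENTS.append(record)

FILTERED, VOCAB = filter_descriptors(PATIENTS)
```

Here only the descriptor named "useful" survives, since "common" exceeds the 80% limit and "rare" falls below the five-patient minimum.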
Re claim 11, Miotto discloses a system for preprocessing biomedical data for a predictive model, the system comprising: an iterative data preprocessing device that includes at least one processor and a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform at least the following operations (see ¶s 27-28 for an iterative data preprocessing device (i.e. analysis computer system 100 as shown in fig. 1) that includes at least one processor and a computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform at least the following operations (i.e. the computing system comprises one or more processors and memory storing one or more programs for execution by the one or more processors as described in fig. 1 paragraph 25)): receiving data from a data source (see fig. 1 ¶ 41 for receiving data from a data source (i.e. input data representing a plurality of entities 58 as described in fig. 2 paragraph 38)); using at least one ML algorithm from a plurality of ML algorithms to obtain at least one combination of preprocessing steps (see figs. 1-2 ¶s 53, 76, 81 for using at least one machine learning (ML) algorithm from a plurality of ML algorithms (i.e. a plurality of ML algorithms as described in paragraphs 74, 80, 92, 96) to obtain at least one combination of preprocessing steps (i.e. EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80)); and computing an accuracy score for each of the at least one combination based on accuracy of prediction of the predictive model (see fig. 7 ¶s 95, 108 for computing an accuracy score for each of the at least one combination based on accuracy of prediction of the predictive model (i.e. for each disease, the scores obtained by all patients in the test set (i.e., 76,214 patients) was taken and used to measure the area under the receiver operating characteristic curve (i.e., AUC-ROC), accuracy, and F-score as shown in fig. 8 paragraph 107). Also, see figs. 1-2, 5 paragraphs 53, 76, 80-81).
Re claim 12, Miotto as discussed in claim 2 above discloses all the claimed limitations of claim 12.
Re claim 13, Miotto as discussed in claim 3 above discloses all the claimed limitations of claim 13.
Re claim 14, Miotto as discussed in claim 4 above discloses all the claimed limitations of claim 14.
Re claim 15, Miotto as discussed in claim 5 above discloses all the claimed limitations of claim 15.
Re claim 16, Miotto as discussed in claims 6 and 11 above discloses all the claimed limitations of claim 16.
Re claim 18, Miotto as discussed in claims 8 and 11 above discloses all the claimed limitations of claim 18.
Re claim 19, Miotto as discussed in claim 9 above discloses all the claimed limitations of claim 19.
Re claim 20, Miotto as discussed in claim 11 above discloses all the claim limitations with additional claimed feature wherein a user specifies a sequence of operations, a criteria for a success, and a deployment of pre- and post-processing using a graphic user interface (GUI) (see ¶s 25, 29, 38, 50 for a user specifies a sequence of operations, a criteria for a success, and a deployment of pre- and post-processing (i.e. these medical records are pre-processed using the Open Biomedical Annotator to obtain harmonized codes for procedures and lab tests, normalized medications based on brand name and dosages, and to extract clinical concepts from the free-text notes as described in fig. 2 paragraph 53, furthermore, EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80) using a graphic user interface (GUI) (i.e. a user interface (e.g., including a display 82 and keyboard 80 or other form of input device) as described in fig. 1 paragraph 28)).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Miotto et al. (US 2020/0327404 A1)(hereinafter Miotto) as applied to claims 1-6, 8, 9, 11-16 and 18-20 above, and further in view of Aliper et al. (US 2019/0272890 A1)(hereinafter Aliper).
Re claim 7, Miotto as discussed in claim 1 above discloses all the claim limitations with additional claimed feature wherein the preprocessing steps (see figs. 1-2 ¶s 53, 76, 81 for the preprocessing steps (i.e. EHRs are first extracted from the clinical data warehouse, pre-processed to identify and normalize clinically relevant phenotypes, and grouped in patient vectors (e.g., raw representation) as described in fig. 5 paragraph 80)) 
Miotto fails to explicitly teach that the preprocessing steps comprise at least one of a pixel threshold determination, linear regression computation, non-linear regression computation, volume threshold determination, or a smoothing method. However, the reference of Aliper explicitly teaches preprocessing steps that comprise at least one of a pixel threshold determination, linear regression computation, non-linear regression computation, volume threshold determination, or a smoothing method (see figs. 3-4 ¶s 123, 241-242 for at least one of a pixel threshold determination, linear regression computation, non-linear regression computation, volume threshold determination, or a smoothing method. Also, see fig. 9 paragraphs 249, 297).
Therefore, taking the combined teachings of Miotto and Aliper as a whole, it would have been obvious before the effective filing date of the claimed invention to incorporate this feature (regression) into the system of Miotto as taught by Aliper.

Re claim 17, the combination of Miotto and Aliper as discussed in claim 7 above discloses all the claimed limitations of claim 17.
Claims 10 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Miotto et al. (US 2020/0327404 A1)(hereinafter Miotto) as applied to claims 1-6, 8, 9, 11-16 and 18-20 above, and further in view of Frenkel et al. (US 2020/0098443 A1)(hereinafter Frenkel).
Re claim 10, Miotto as discussed in claim 1 above discloses all the claimed limitations but fails to explicitly teach further comprising visualizing an output by using one of a Seaborn package, a Matplotlib package, or a data visualization package. However, the reference of Frenkel explicitly teaches further comprising visualizing an output by using one of a Seaborn package, a Matplotlib package, or a data visualization package (see ¶ 252 for visualizing an output by using one of a Seaborn package, a Matplotlib package, or a data visualization package)
Therefore, taking the combined teachings of Miotto and Frenkel as a whole, it would have been obvious before the effective filing date of the claimed invention to incorporate this feature (package) into the system of Miotto as taught by Frenkel.
One would have been motivated to incorporate the above feature into the system of Miotto as taught by Frenkel for the benefit of performing principal component analysis (PCA) on log-transformed expressions of 19,308 genes, wherein gene expressions were transformed into 10 components space using scipy package, wherein plots were created using matplotlib and seaborn, wherein gene expression ratios were calculated using pandas and numpy packages, wherein logistic regression models were constructed and evaluated by scikitlearn package, wherein Kaplan-Meier curves were produced by lifelines in order to improve efficiency when creating plots using matplotlib and seaborn (see ¶ 252).
Re claim 21, the combination of Miotto and Frenkel as discussed in claim 10 above discloses all the claimed limitations of claim 21.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSE M MESA whose telephone number is (571) 270-1706. The examiner can normally be reached on Monday-Friday 8:30AM-6:00PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

8/5/2021
/JOSE M. MESA/
Examiner
Art Unit 2484



/THAI Q TRAN/
Supervisory Patent Examiner, Art Unit 2484