Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Applicant’s amendment filed 2/28/22 (hereinafter “Response”) has been entered. Examiner notes that claims 1, 9, 17, and 20 have been amended. Claims 1-20 are currently pending in the application and, for the reasons outlined in detail below, are hereby allowed.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given via email from Attorney of Record Rachel Pearlman dated 5/3/22 and was the product of the interview conducted on 4/28/22.
Claim 1 has been amended as follows:
1.	(Currently Amended) A computer-implemented method, comprising:
obtaining, by one or more processors in a distributed computing environment, one or more data sets related to a patient population diagnosed with a medical condition from one or more databases;
based on a frequency of features in the one or more data sets, identifying, by the one or more processors, common features in the one or more data sets and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information;
utilizing, by the one or more processors, the mutual information to generate a patient definition, wherein generating the patient definition comprises: 
truncating, by the one or more processors, the common features based on identifying one or more common features of the common features with mutual information values above a predefined threshold, wherein the mutual information value for each common feature of the common features comprises a weighted value for the common feature based on the frequency of occurrence of the common feature in the portion of the data, wherein the truncating comprises ranking the weighted values and selecting one or more common features from the common features wherein the selected common features comprise the one or more features with the mutual information values above the predefined threshold; and
selecting, by the one or more processors, from the one or more common features with the mutual information values above the predefined threshold, a portion of the common features, wherein the portion of the common features comprises a smallest subset of common features from the one or more common features with the mutual information values above the predefined threshold comprising a majority of common features of the one or more features with the mutual information values above the predefined threshold;

generating, by the one or more processors, one or more machine learning algorithms based on the patient definition;
utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the medical condition, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the training data and processing and responding to the queries;
tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data;
integrating, by the one or more processors, the one or more machine learning algorithms into a graphical user interface, wherein the graphical user interface provides an input to enable a user to provide data related to the undiagnosed patient;
obtaining, by the one or more processors, via the graphical user interface, data related to the undiagnosed patient; 
applying, by the one or more processors, the one or more machine learning algorithms to the data related to the undiagnosed patient;
determining, by the one or more processors, based on applying the one or more machine learning algorithms to the data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the patient definition; and
displaying, by the one or more processors, the probability to the user, through the graphical user interface, as a score.
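
For illustration only, the final determining and displaying steps can be read as an overlap calculation. The sketch below is not the claimed machine learning algorithms and is not taken from the application: it assumes that the patient definition is a plain list of feature names and that "percentage of commonality" means the share of definition features found in the undiagnosed patient's data. The function name `commonality_score` and the feature names are hypothetical.

```python
def commonality_score(patient_features, patient_definition):
    """Return the percentage of patient-definition features that also
    appear in the undiagnosed patient's data (assumed reading)."""
    definition = set(patient_definition)
    if not definition:
        raise ValueError("patient definition must contain at least one feature")
    shared = definition & set(patient_features)
    return 100.0 * len(shared) / len(definition)


# Hypothetical feature names, used only to exercise the function.
definition = ["fatigue", "joint pain", "elevated ESR", "rash"]
undiagnosed_patient = ["fatigue", "rash", "headache"]

score = commonality_score(undiagnosed_patient, definition)
print(f"Score: {score:.1f}%")
```

Under these assumptions, two of the four definition features are present, so the displayed score is 50.0%. A trained model would normally replace this simple overlap.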

Claim 9 has been amended as follows:
9.	(Currently Amended) A computer program product comprising: 
a computer readable storage medium readable by one or more processors in a distributed computing environment, and storing instructions for execution by the one or more processors for performing a method comprising: 
obtaining, by the one or more processors in a distributed computing environment, one or more data sets related to a patient population diagnosed with a medical condition from one or more databases;
based on a frequency of features in the one or more data sets, identifying, by the one or more processors, common features in the one or more data sets and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information;
utilizing, by the one or more processors, the mutual information to generate a patient definition, wherein generating the patient definition comprises: 
truncating, by the one or more processors, the common features based on identifying one or more common features of the common features with mutual information values above a predefined threshold, wherein the mutual information value for each common feature of the common features comprises a weighted value for the common feature based on the frequency of occurrence of the common feature in the portion of the data, wherein the truncating comprises ranking the weighted values and selecting one or more common features from the common features wherein the selected common features comprise the one or more features with the mutual information values above the predefined threshold; and
selecting, by the one or more processors, from the one or more common features with the mutual information values above the predefined threshold, a portion of the common features, wherein the portion of the common features comprises a smallest subset of common features from the one or more common features with the mutual information values above the predefined threshold comprising a majority of common features of the one or more features with the mutual information values above the predefined threshold;

generating, by the one or more processors, one or more machine learning algorithms based on the patient definition;
utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the medical condition, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the training data and processing and responding to the queries;
tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data;
integrating, by the one or more processors, the one or more machine learning algorithms into a graphical user interface, wherein the graphical user interface provides an input to enable a user to provide data related to the undiagnosed patient;
obtaining, by the one or more processors, via the graphical user interface, data related to the undiagnosed patient; 
applying, by the one or more processors, the one or more machine learning algorithms to the data related to the undiagnosed patient;
determining, by the one or more processors, based on applying the one or more machine learning algorithms to the data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the patient definition; and
displaying, by the one or more processors, the probability to the user, through the graphical user interface, as a score.

Claim 17 has been amended as follows:
17.	(Currently Amended) A system comprising: 
one or more memory; 
one or more processors in communication with the memory; and 
program instructions executable by the one or more processors in a distributed computed environment via the one or more memory to perform a method, the method comprising: 
obtaining, by the one or more processors in a distributed computing environment, one or more data sets related to a patient population diagnosed with a medical condition from one or more databases;
based on a frequency of features in the one or more data sets, identifying, by the one or more processors, common features in the one or more data sets and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information;
utilizing, by the one or more processors, the mutual information to generate a patient definition, wherein generating the patient definition comprises: 
truncating, by the one or more processors, the common features based on identifying one or more common features of the common features with mutual information values above a predefined threshold, wherein the mutual information value for each common feature of the common features comprises a weighted value for the common feature based on the frequency of occurrence of the common feature in the portion of the data, wherein the truncating comprises ranking the weighted values and selecting one or more common features from the common features wherein the selected common features comprise the one or more features with the mutual information values above the predefined threshold; and
selecting, by the one or more processors, from the one or more common features with the mutual information values above the predefined threshold, a portion of the common features, wherein the portion of the common features comprises a smallest subset of common features from the one or more common features with the mutual information values above the predefined threshold comprising a majority of common features of the one or more features with the mutual information values above the predefined threshold;

generating, by the one or more processors, one or more machine learning algorithms based on the patient definition;
utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the medical condition, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the training data and processing and responding to the queries;
tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data;
integrating, by the one or more processors, the one or more machine learning algorithms into a graphical user interface, wherein the graphical user interface provides an input to enable a user to provide data related to the undiagnosed patient;
obtaining, by the one or more processors, via the graphical user interface, data related to the undiagnosed patient; 
applying, by the one or more processors, the one or more machine learning algorithms to the data related to the undiagnosed patient;
determining, by the one or more processors, based on applying the one or more machine learning algorithms to the data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the patient definition; and
displaying, by the one or more processors, the probability to the user, through the graphical user interface, as a score.

Allowable Subject Matter
Claims 1-20 are allowed.

Reasons for Allowance
The following is the Examiner’s statement of reasons for allowance:
Based on the amendments filed in the Response, the provisional non-statutory double patenting rejection raised in the non-final office action mailed 11/26/21 over copending Application No. 17/197704 in view of US 2019/0043618 to Vaughan et al. (hereinafter Vaughan) is withdrawn.

Based on the amendments filed in the Response, the 112(b) rejection of claims 1, 9, and 17 and their dependents is withdrawn, as the antecedent basis issue has been corrected.

As discussed in the Office Action mailed on November 26, 2021, claims 1-20 are interpreted as being patent eligible under 35 USC 101 when considered in view of the 2019 Revised Patent Subject Matter Eligibility Guidance. Office Action pp. 8-9.

Regarding 35 USC 102 and 103, the following represents the closest prior art references to the present claims, as well as reasons explaining why the present claims are distinguished from the closest prior art references:
As discussed in the Office Action mailed on November 26, 2021, Vaughan teaches a method for evaluating a subject for a condition, including generating one or more machine learning algorithms based on identified patterns from a dataset related to a patient population diagnosed with a medical condition, performing statistical sampling to compile a training dataset, tuning the one or more machine learning algorithms using the training dataset, and applying data of an undiagnosed patient to the one or more machine learning algorithms to determine a probability relating to a medical condition of the undiagnosed patient. However, Vaughan does not specifically disclose identifying common features based on the frequency of the features in the dataset and weighting the common features, which include mutual information. Furthermore, Vaughan does not teach generating a patient definition by truncating common features with mutual information values over a threshold and, additionally, selecting a portion of the common features that is the smallest subset of features containing a majority of the mutual information.
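
The distinguishing limitation summarized above can be sketched in code. This is a minimal illustration under stated assumptions, not the applicant's implementation: each feature's mutual information value is stood in for by its frequency-based weight (the fraction of patient records containing the feature), and "a majority of the mutual information" is read as more than half of the summed weights of the features that survive truncation. The function names and the feature labels are hypothetical.

```python
from collections import Counter


def feature_weights(records):
    """Weight each feature by the fraction of records in which it occurs
    (an assumed stand-in for the mutual information value)."""
    counts = Counter(feature for record in records for feature in set(record))
    return {feature: count / len(records) for feature, count in counts.items()}


def patient_definition(weights, threshold):
    """Truncate to features above the threshold, rank them, and keep the
    smallest top-ranked subset holding a majority of the total weight."""
    # Truncation: keep only features whose weight exceeds the threshold,
    # ranked from highest to lowest weight (ties broken by name).
    ranked = sorted(
        ((weight, feature) for feature, weight in weights.items() if weight > threshold),
        key=lambda pair: (-pair[0], pair[1]),
    )
    total = sum(weight for weight, _ in ranked)

    # Selection: taking features in descending order reaches a majority of
    # the total weight with the fewest possible features.
    selected, running = [], 0.0
    for weight, feature in ranked:
        selected.append(feature)
        running += weight
        if running > total / 2:
            break
    return selected


# Hypothetical records for a diagnosed population; letters are feature labels.
records = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "d"}]
weights = feature_weights(records)
definition = patient_definition(weights, threshold=0.3)
print(definition)
```

With these sample records the weights are a = 1.0, b = 0.5, c = 0.5, and d = 0.25. The threshold of 0.3 removes d, and features a and b together hold 1.5 of the remaining 2.0, so the sketch returns `['a', 'b']`.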
US 2019/0214141 to Chatterjee et al. (hereinafter Chatterjee) teaches identifying and weighting common features in a dataset based on frequency of occurrence of the features and further determining a smallest number of principal components that yield a predetermined level of validation accuracy. However, Chatterjee does not make up for all of the deficiencies of Vaughan, and at a minimum does not disclose generating a patient definition by truncating common features with mutual information values over a threshold and, additionally, selecting a portion of the common features that is the smallest subset of features containing a majority of the mutual information.
US 8,655,695 to Qu et al. (hereinafter Qu) teaches performing a feature selection step to select only the most discriminating features by filtering out features that do not meet a threshold, noting that the feature selection method includes mutual information. However, Qu does not make up for all of the deficiencies of Vaughan or Chatterjee.
Even if each and every element of the present invention were taught individually by the aforementioned references, combining the references as an ordered combination would not have been obvious to one ordinarily skilled in the art because doing so would require improper hindsight reasoning in view of the present Specification, and furthermore there is no teaching, suggestion, or motivation to combine the aforementioned references present in the aforementioned references themselves or in knowledge generally available to one of ordinary skill in the art.
For at least these reasons, the rejection of claims 1-20 under 35 USC 102/103 is withdrawn.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER B WEHRLY whose telephone number is (303) 297-4433. The examiner can normally be reached Monday - Friday, 8:30 - 4:30 MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason B. Dunham, can be reached at (571) 272-8109. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHRISTOPHER B WEHRLY/Examiner, Art Unit 3686
/JOHN P GO/Primary Examiner, Art Unit 3686