DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Applicant’s amendments and arguments, filed 8/15/2022, have been entered and carefully considered but are not completely persuasive.
	Claims 1-4, 11-15, 17-21 and 24-25 are under examination.  Claims 5-10, 16, 22-23 stand withdrawn as being drawn to non-elected species.
CLAIM INTERPRETATION
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  
The claims have been amended to change “machine learning classifiers” to “decision tree classifiers,” which provides the requisite structure for the recited functions; the claims therefore do not require rejection under 35 USC 112, sixth paragraph, or 112(f) considerations.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-4, 11-15, 17-21 and 24-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
In claim 1, the metes and bounds of the claim are unclear with respect to how each classifier is trained.  It is unclear what RNA-based data is applied to the initial decision tree classifier to generate each separate RNA-based decision tree classifier, which then processes three separate sets of RNA expression data.  The training of the classifiers is not particularly set forth.  It could be assumed that some random set of RNA expression data not from the sample is used to train all classifiers.  Alternatively, particular separate RNA expression datasets (or subsets of a larger set) could be used to train each classifier.  For example, it is unclear whether RNA-based decision tree classifiers #1, #2 and #3 are all trained on the same RNA expression dataset, which is not the same as the dataset from the sample; or whether classifier #1 is trained on one set of data, classifier #2 on a second, distinct set of data, and classifier #3 on a third, distinct set of data.  One of skill would not be apprised as to what the base training sets are intended to comprise.  It is unclear whether they comprise some sort of “healthy” subject RNA expression dataset for the same type of subject (e.g., mouse or human), or whether some standard RNA expression dataset is intended to be used.  As such, the metes and bounds of each trained classifier are unclear.
Further in claim 1, and throughout the examined claims, Applicant has changed “machine learning classifiers” to “decision tree classifiers”.  The metes and bounds of “decision tree classifiers” are unclear.  The Examiner reviewed the disclosure to determine whether it provides a definition of this term, and how it is applied to the subject data at hand.  At nearly every recitation, the phrase setting forth types of classifiers to be applied to RNA- or DNA-based expression datasets is actually “gradient boosted decision tree classifiers.”  At page 7 of the specification, the first recitation the Examiner noted for decision tree classifiers is for DNA-based machine learning classifiers, and specifies a “gradient-boosted decision tree classifier” at line 17.  At page 8 of the specification, the RNA-based machine learning classifier is a “gradient-boosted decision tree classifier” at line 10.  In the section of the specification directed to machine learning classifiers:
“As described above, a hierarchy of machine learning classifiers includes multiple machine learning classifiers used to process features obtained from expression data obtained from the biological sample. FIG. 2C shows an illustrative diagram 250 of a two-class classifier, optionally a multi-class classifier, according to some embodiments of the technology described herein.
In some embodiments, a machine learning classifier included in the hierarchy of machine learning classifiers (e.g., machine learning classifier B 224b) can include for example, a gradient boosted tree, a neural network, a logistic regression model, a support vector machine, a Bayesian classifier, a random forest classifier, or any suitable type of machine learning classifier, as aspects of the technology described herein are not limited to any particular type of machine learning classifier…”

The only recitation of a tree is a “gradient boosted tree.”  This continues throughout the specification. It is unclear what other decision tree structures can be used in the claimed invention, and it is unclear how to select, train and use any other type of decision tree classifier to “identify at least one candidate molecular category for a biological sample” as required by the preamble.  Gradient boosting is a particular type of technique, as defined by Sagi et al (of record): 
“In Gradient boosting machines (GBM; Friedman, 2001), the training of each inducer is dependent on inducers that have already been trained. The main difference between GBM and other techniques is that in GBM optimization is applied in the function space. It includes a learning procedure in which the goal is to construct the base learners so that they are maximally correlated with the negative gradient of the loss function, associated with the whole ensemble (Natekin & Knoll, 2013).” (p10)

	It is further unclear whether the recitation of “decision tree classifiers” sets forth all the necessary and sufficient limitations to carry out the ensemble classification.  Even if one reads into the term “decision tree classifiers” the addition of “gradient boosted,” it appears critical information is missing from claim 1.  In Sagi, gradient boosted decision tree (GBDT, or GBM) methods are described as follows:
“More specifically, in GBM a sequence of regression trees is computed, where each successive tree predicts the pseudo-residuals of the preceding trees given an arbitrary differentiable loss function. An arbitrary loss function requires the specification of the loss function by the user, in addition to the function that calculates the corresponding negative gradient. Predictions are aggregated in an additive manner in which each added model is trained so it will minimize the loss function. It is important to note that a GBM model usually has many shallow trees, as opposed to random forest which has fewer (but deeper) trees. Choosing the right number of trees (i.e., number of iterations) is very important when training a gradient boosting model. Setting it too high can lead to overfitting, while setting it too low may result in underfitting. The selection of the most suitable number of iterations is usually done by using a validation set to evaluate the overall predictive performance. Overfitting can be reduced by applying a stochastic gradient boosting method (Friedman, 2002) in which trees are  consecutively trained with small subsets, sampled from the original dataset.” (p10)

Independent claim 1 sets forth the linear processing of three RNA expression datasets, followed by “identifying using at least some of the RNA-based decision tree classifier outputs, including the first output and the second output, at least one candidate molecular category for the sample.”  These processing steps merely use “RNA-based decision tree classifiers” but do not provide a loss function, nor do they provide the function that calculates the corresponding negative gradient.  The training of the “RNA-based decision tree classifiers” does not clearly minimize a loss function, nor is the resulting ensemble clearly validated in any way.  As such, it is further unclear how to utilize “decision tree classifiers,” even if they are assumed to be “gradient boosting decision tree classifiers,” to achieve the required results of molecular categorization or classification.  The specification, at the section for machine learning classifiers, p. 53, sets forth that the training also includes two specific categories which are different for each classifier, i.e. subtype A or not subtype A.  However, this may not accurately describe the data, as discussed at page 54:
“Consider, as an example, a tumor sample obtained from the liver that contains normal liver tissue. Since liver neoplasm originates in the liver, the normal tissue from the liver and tumor tissue belonging to the liver neoplasm molecular category may share similar RNA expression profiles. Therefore, a machine learning classifier that receives the tumor sample and is not trained to distinguish between tissue belonging to the liver neoplasm molecular category and normal liver tissue may inaccurately predict a high probability for the liver neoplasm molecular category, even when that is not the case.
To mitigate these biases, in some embodiments, the machine learning classifier B 224b may comprise a multi-class classifier trained to distinguish between three classes: normal tissue 258a (e.g., tissue from the sample site that is not diseased), molecular category B 224c, and molecular category B 254a. In this embodiment, the machine learning classifier B 224b may be trained to determine probability 356d that the biological sample belongs to the normal tissue corresponding to the molecular category B…
In some embodiments, classifier B 224b, classifier C 225b, and classifier D 226b are each associated with molecular categories represented by nodes that descend from parent node 223, representing molecular category A. Therefore, classifiers B-D are positioned at a same level (e.g., level N) of the hierarchy of machine learning classifiers as one another.”

While breadth of a claim is not the same as indefiniteness, one of skill would not be apprised of other types of decision tree classifiers applicant intends to apply to the RNA expression datasets, to achieve the required classification.  One of skill would not be apprised as to what loss functions, functions, or steps for validation would be applicable to any other type of decision tree classifier other than the GBDT. One of skill would not be apprised as to the particular training processes as they relate to the molecular categories.  These elements each appear to be critical to the training and use of each classifier and cannot be dismissed as incidental to the claimed invention.
These rejections apply particularly to claims 2-4, where up to 6 different RNA-expression datasets and 6 decision tree classifiers are indicated; claims 12-13, with respect to the addition of DNA expression data and DNA-based decision tree classifiers; claim 14, with respect to the DNA features; claim 18, which recites 10 RNA-based decision tree classifiers; and claims 19-21, where claim 19 specifies the GB decision tree but not the rest of the required information, claim 20 identifies a number of genes but no other required information, and claim 21 specifies that the sample is of a cancer of unknown primary tumor but provides no other required information.  Claims 24-25 recite the same limitations as claim 1 and are similarly indefinite.
Applicant’s arguments:
Applicant’s arguments were directed to limitations previously interpreted under 35 USC 112(f) and to related issues under 112(a) and (b).  While the claims no longer recite means-plus-function limitations, the claims still encompass a plurality of terms and concepts which are indefinite as set forth above.

Claims 1-4, 11-15, 17-21 and 24-25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the enablement requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. 
In In re Wands (8 USPQ2d 1400 (CAFC 1988)) the CAFC considered the issue of enablement in molecular biology. The CAFC summarized eight factors to be considered in a determination of "undue experimentation". These factors include: (a) the quantity of experimentation necessary; (b) the amount of direction or guidance presented; (c) the presence or absence of working examples; (d) the nature of the invention; (e) the state of the prior art; (f) the relative skill of those in the art; (g) the predictability of the art; and (h) the breadth of the claims.
In considering the factors for the instant claims:
(a) Based on the total analysis of each factor as set forth below, an undue amount of experimentation would be required to carry out the invention of the rejected claims.  
(h) With respect to the breadth of the claims: in order to practice the claimed invention one of skill in the art must identify and obtain at least three different RNA expression datasets, obtain at least three RNA-based machine learning classifiers, and use the classifiers to process the datasets, and arrive at a candidate molecular category for the sample.  
(f) With respect to the level of the skill and knowledge of one skilled in the art: the level of skill in bioinformatics is high.  Bioinformatics combines biological and technical knowledge with skills related to computers and sophisticated data analysis.  In particular, the prior art of record relating to the claimed sample classification methods using many separate types of machine learning technology illustrates the unpredictable nature of the technology.
(b), (d), (g) With respect to the nature of the invention, the amount of direction or guidance presented, and the predictability of the art: Considering independent claims 1, 24 and 25, the claimed invention is drawn to identifying at least one candidate molecular category (unspecified) for a biological sample of a subject, utilizing at least three separate RNA expression data sets for three different sets of genes (unspecified), and three (generically recited) RNA-based decision tree classifiers.
Guidance for the nature of the decision tree classifiers in the specification: the recitation of “decision tree classifiers” in the independent claims is broad.  This category comprises multiple different types of classifiers.  When considering the newly added limitation “decision tree classifiers” in the claims, the specification provides guidance for “gradient-boosted decision tree classifiers” at nearly every instance of decision tree classifiers.  At page 7 of the specification, the first recitation the Examiner noted for decision tree classifiers is for DNA-based machine learning classifiers, and specifies a “gradient-boosted decision tree classifier” at line 17.  At page 8 of the specification, the RNA-based machine learning classifier is a “gradient-boosted decision tree classifier” at line 10.  In the section of the specification at pages 53-54 directed to machine learning classifiers:
“As described above, a hierarchy of machine learning classifiers includes multiple machine learning classifiers used to process features obtained from expression data obtained from the biological sample. FIG. 2C shows an illustrative diagram 250 of a two-class classifier, optionally a multi-class classifier, according to some embodiments of the technology described herein.
In some embodiments, a machine learning classifier included in the hierarchy of machine learning classifiers (e.g., machine learning classifier B 224b) can include for example, a gradient boosted tree, a neural network, a logistic regression model, a support vector machine, a Bayesian classifier, a random forest classifier, or any suitable type of machine learning classifier, as aspects of the technology described herein are not limited to any particular type of machine learning classifier…”

Gradient-boosted decision tree classifiers require certain types of data, loss functions, functions providing the corresponding negative gradient, and some sort of validation.  Each different decision tree classifier appears to require different sets of RNA-expression data for different sets of genes.  Each different decision tree classifier also appears to require further information as to its training.  The training of the “RNA-based decision tree classifiers” in the claims does not clearly minimize a loss function, nor is the resulting ensemble clearly validated in any way.  As such, the specification does not provide guidance for carrying out the claimed invention using generic “decision tree classifiers,” and the claims do not provide the requisite information to carry out such methods, even if the classifiers are assumed to be “gradient boosting decision tree classifiers,” to achieve the required results of molecular categorization or classification.  The specification, at the section for machine learning classifiers, p. 53, sets forth that the training also includes information about at least two specific molecular categories which are different for each classifier, i.e. subtype A or not subtype A.  However, this may not accurately describe the data from the sample from the subject, as discussed at page 54:
“Consider, as an example, a tumor sample obtained from the liver that contains normal liver tissue. Since liver neoplasm originates in the liver, the normal tissue from the liver and tumor tissue belonging to the liver neoplasm molecular category may share similar RNA expression profiles. Therefore, a machine learning classifier that receives the tumor sample and is not trained to distinguish between tissue belonging to the liver neoplasm molecular category and normal liver tissue may inaccurately predict a high probability for the liver neoplasm molecular category, even when that is not the case.
To mitigate these biases, in some embodiments, the machine learning classifier B 224b may comprise a multi-class classifier trained to distinguish between three classes: normal tissue 258a (e.g., tissue from the sample site that is not diseased), molecular category B 224c, and molecular category B 254a. In this embodiment, the machine learning classifier B 224b may be trained to determine probability 356d that the biological sample belongs to the normal tissue corresponding to the molecular category B…
In some embodiments, classifier B 224b, classifier C 225b, and classifier D 226b are each associated with molecular categories represented by nodes that descend from parent node 223, representing molecular category A. Therefore, classifiers B-D are positioned at a same level (e.g., level N) of the hierarchy of machine learning classifiers as one another.”

The independent claims do not appear to set forth all the necessary and sufficient steps and information to carry out the listed goals.  
With respect to the predictability in the art of classifying samples from a subject to a molecular classification or subclassification, the claimed technology of utilizing ensemble classifiers to predict a molecular category of a dataset from a sample from a subject, particularly a cancer type or subtype category, is considered in the art to be unpredictable.  
(c) With respect to the examples of the disclosure: Figures 2-6 set forth several flow charts for certain aspects or subroutines of the invention.  Figure 7 sets forth a hierarchical structure for certain possible candidate molecular categories.  Figure 8 sets forth two separate flowcharts, 8A for particular gene set data, and 8B for DNA-expression data.  Pages 1-85 describe the steps of each flowchart and structure, and describe a variety of steps not present in the claims.  Beginning at page 85, in the section regarding molecular category identification performance, which is essentially a working example on actual data, the example methods comprise steps and algorithms, not clearly required in the independent claims, that are used to achieve the goals and the exemplified accuracy of 92% set forth at Figure 9 and its discussion at pages 85-87.  These figures, and their accompanying description in the specification up to page 87, recite at least the following features not required by the independent claims: a normal category for each molecular category; outputs calculated as a probability of belonging to each molecular category; training datasets; steps of training decision tree classifiers; how conflicting outputs are to be resolved to a single candidate molecular category; sample size; sample purity; and DNA and RNA features associated with the categories, such as mutational burden, hotspot information, and pathogenic mutation information.  Many other possibly important steps are also disclosed.  Figure 8 and its discussion set forth steps for treating the names and expression levels of the genes, including rank transformation, choosing hyperparameters, fitting a model, calculating gene importance values, and discarding a subset of unimportant gene data, which is repeated for feature data.  None of these specific steps are set forth in the independent claims.
Tables 3-5 of the specification list potential information which may be required to carry out the claimed invention.  Table 3, at pages 87-179, lists some genes associated with a specific molecular category by name, NCBI accession number and gene ID.  Table 4, at pages 179-180, lists DNA features along with a feature description.  Table 5, at pages 180-247, lists DNA features possibly associated with a candidate molecular category of a particular cancer, by name and additionally by gene ID and NCBI accession number when applicable.  None of these specific categories, genes, or features are required by the claims.
The specification exemplifies RNA and DNA expression datasets at pages 254-56.  The RNA expression datasets indicate expression levels for a plurality of genes or gene sets.  DNA expression data refers to “a level of DNA (e.g. copy number of a chromosome, gene or other genomic region)…” and it can relate to sequencing data for DNA.
(e) With respect to the state of the art, and the indication of unpredictability of the subject matter, Sagi, Yang and Dong each discuss the state of the art of ensemble learning with biological data.
For a definition of gradient boosting decision trees, Sagi (2018, of record) provides: 
“In Gradient boosting machines (GBM; Friedman, 2001), the training of each inducer is dependent on inducers that have already been trained. The main difference between GBM and other techniques is that in GBM optimization is applied in the function space. It includes a learning procedure in which the goal is to construct the base learners so that they are maximally correlated with the negative gradient of the loss function, associated with the whole ensemble (Natekin & Knoll, 2013).” (p10)

	Sagi sets forth required elements of GBM not present in the independent claims: 
“More specifically, in GBM a sequence of regression trees is computed, where each successive tree predicts the pseudo-residuals of the preceding trees given an arbitrary differentiable loss function. An arbitrary loss function requires the specification of the loss function by the user, in addition to the function that calculates the corresponding negative gradient. Predictions are aggregated in an additive manner in which each added model is trained so it will minimize the loss function. It is important to note that a GBM model usually has many shallow trees, as opposed to random forest which has fewer (but deeper) trees. Choosing the right number of trees (i.e., number of iterations) is very important when training a gradient boosting model. Setting it too high can lead to overfitting, while setting it too low may result in underfitting. The selection of the most suitable number of iterations is usually done by using a validation set to evaluate the overall predictive performance. Overfitting can be reduced by applying a stochastic gradient boosting method (Friedman, 2002) in which trees are  consecutively trained with small subsets, sampled from the original dataset.” (p10)
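
For illustration only, the elements named in the passage quoted above (a user-specified loss function, the function giving its negative gradient, successive trees fitted to pseudo-residuals, additive aggregation, and a validation set used to choose the number of trees) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the specification or from Sagi: it assumes a squared-error loss, a single numeric feature, one-split regression trees, and a fixed learning rate. All function names and data values are hypothetical.

```python
# Illustrative sketch only; every name and data value below is hypothetical.

def squared_loss(y, f):
    """User-specified differentiable loss for one sample (assumed here)."""
    return 0.5 * (y - f) ** 2

def negative_gradient(y, f):
    """Negative gradient of squared_loss with respect to f (the pseudo-residual)."""
    return y - f

def fit_stump(xs, targets):
    """Fit a one-split regression tree to the targets by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x value: fall back to a constant
        mean = sum(targets) / len(targets)
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gbm(xs, ys, n_trees, rate=0.5):
    """Each successive tree is fitted to the pseudo-residuals of the trees before it."""
    init = sum(ys) / len(ys)
    preds = [init] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [negative_gradient(y, f) for y, f in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Additive aggregation: add the scaled output of the new tree.
        preds = [f + rate * predict_stump(stump, x) for x, f in zip(xs, preds)]
    return (init, rate, stumps)

def predict(model, x, n_trees=None):
    """Predict using only the first n_trees trees (all trees if None)."""
    init, rate, stumps = model
    if n_trees is None:
        n_trees = len(stumps)
    return init + rate * sum(predict_stump(s, x) for s in stumps[:n_trees])

def mean_loss(model, xs, ys, n_trees):
    total = sum(squared_loss(y, predict(model, x, n_trees)) for x, y in zip(xs, ys))
    return total / len(xs)

def best_n_trees(model, xs_val, ys_val):
    """Choose the number of trees that minimizes loss on a validation set."""
    candidates = range(1, len(model[2]) + 1)
    return min(candidates, key=lambda n: mean_loss(model, xs_val, ys_val, n))

# Hypothetical toy data: one feature, two expression-like levels.
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
valid_x = [1.5, 4.5, 6.5]
valid_y = [1.0, 3.0, 3.0]

model = fit_gbm(train_x, train_y, n_trees=20)
chosen = best_n_trees(model, valid_x, valid_y)
print("number of trees chosen on the validation set:", chosen)
```

Even in this reduced form, the loss function, its negative gradient, the learning rate, the training data, and the validation data each have to be specified before the classifier can be trained or used.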

XGBoost and LightGBM are prior art examples of GBM.  
“XGBoost (Chen & Guestrin, 2016) is a scalable machine learning system for tree boosting that gained popularity among machine learning practitioners in recent years as depicted in Figure 4… XGBoost added several algorithmic optimizations and refinements to GBM in order to increase scalability. First, it features split finding algorithms that (a) handle sparse data with nodes’ default directions, (b) address weighted data using merge and prune operations, and (c) enumerate efficiently over all possible splits so the splitting threshold is optimized. Another important refinement in XGBoost is that it adds a regularization component to the loss function presented in GBM, aimed at creating ensembles that are simpler and more generative. Finally, XGBoost can run faster than other models as it supports distributed platforms such as Apache Hadoop and can be distributed across multiple machines. LightGBM is another gradient boosting method recently developed by Microsoft (developed by Microsoft Research by Guolin Ke et al.). It uses histogram based algorithms to reduce duration and memory consumption when training the model. It also leverages network communication algorithms to optimize parallel learning (Zhu et al., 2017).” P10.

Sagi emphasizes that each level of model creation, training and implementation requires careful selection and modification based on the answers required: providing the particular training and sample data required, creating sufficient and separate training datasets, selecting the base classifier type, deciding how much to iterate in training to avoid overfitting, how many classifiers should be used, how the inputs and outputs should flow through the ensemble, how the errors are determined, and how to use the output to make a prediction or ultimate classification.  All of these elements are critical, and require disclosure, when claiming ensemble methods such as in the claims.
Dong provides a very recent survey of ensemble learning (2020, of record).  Dong emphasizes:
“traditional machine learning methods may fail to obtain satisfactory performances when dealing with complex data, such as imbalanced, high-dimensional, noisy data, etc. The reason behind is that it is difficult for these methods to capture multiple characteristics and underlying structure of data. In this context, it becomes an important topic in the data mining field that how to effectively construct an efficient knowledge discovery and mining model. Ensemble learning, as one research hot spot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning firstly extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are utilized to produce weak predictive results. Finally, ensemble learning fuses the informative knowledge from the above results obtained to achieve knowledge discovery and better predictive performance via voting schemes in an adaptive way.” (abstract).

Dong’s review identifies four main categories of ensemble learning, with their pros and cons (Figs 2, 3; p242).  Pages 242-243 and Figs 4-6 discuss typical structures for ensemble classification, AdaBoost, and Gradient Boosting.
“Table 6 shows Gradient Boosting randomly samples to get sample subsets and then each learner is constructed and trained to reduce the residuals generated by the previous learner. Consequently Gradient Boosting can make the sum of the final residuals from the integrated models small enough, thereby forcing the prediction close to the actual value…basic models in the gradient boosting are trained in a tandem way.” P243.  

Aspects of feature selection, feature subset selection, feature extraction, and redundancy removal can all act in dimension reduction and reduction of errors.  Training ensemble classifiers for prediction adds other challenges (p244).  Dong notes that not all basic classifiers are beneficial to the final result, leading to several differing approaches to removal of the negative classifiers, including considering sample space, feature space, cost functions, noisy data, confidence level calculations, and types of voting (p244).  Issues in integrating results from basic models, and in ensembles, are discussed.  Some address these by pruning, others by clustering, others by Bayesian network merge processes.  Refining basic models and ensemble models is also performed in any of a variety of ways depending on all of the data and selections in the previous steps.  Other considerations in the ensemble model include the result of the ensemble when the classifiers work in parallel versus when a successive classifier approach is used.  The minimal efficient number of classifiers needs to be estimated or determined, the stability of the model should be considered, the issue of sufficient diversity in the classifiers should be addressed, and errors assessed.  Dong summarizes the challenges in ensemble classification and prediction, including the use of gradient boosting models, as follows:
	“Specifically, although most methods mainly consider improving the accuracy of the model at the architecture level of ensemble models, there are quite a few researches on  determining the appropriate model size and reducing the complexity of the model to increase the training speed. Moreover, ensemble classification models contain many characteristics like the diversity, accuracy, generalization and so on, and these characteristics are conflicting in improving performances of the models in certain cases. Hence, there are many existing methods exploring to combine these characteristics using mixture models or multi-objective functions to optimize them simultaneously, but there is a lack of researches on theoretically analyzing relationships among these characteristics. Moreover, some researches have proved that performances of ensemble classification models can be further improved by taking the interconnection and feedback between different levels such as sample level, feature level, etc. into account and optimizing these levels simultaneously, which needs more researches. Besides, it is also a feasible research issue of performing optimization at a higher level such as the  classifier collection level of ensemble model. Finally, it is necessary to expand the practical applications of ensemble classification to handle multiple-type data that may be semi-structured and unstructured, or continuous and discrete.” P246.

Yang (2010 of record) provides a review of ensemble methods in bioinformatics, and as applied to biological data.  Yang identifies the most widely used methods and their variants used in bioinformatics applications, and the rationale of each approach (introduction). “The aim of designing/using ensemble methods is to achieve more accurate classification (on training data) as well as better generalization (on unseen data). However this is often achieved at the expense of increased model complexity (decreased model interpretability).” (p296, Fig 1). Yang goes on to set forth a generalization: the “ensemble approach is often explained using the classic bias-variance decomposition analysis. Specifically previous studies found that methods like bagging improve generalization by decreasing variance, while methods similar to boosting achieve this by decreasing bias.” (Figure 1b, and p297). Yang discusses pros and cons, including that “the training datasets are often compounded by small sample size, high dimensionality, and high noise-to-signal ratio etc. Therefore, obtaining the best classification hypothesis is often nontrivial because there are a large number of suboptimal hypotheses in the hypothesis space (denoted as H in Fig. 2) which can fit the training data but do not generalize well on unseen data.” (p297, Fig 2). Such an ensemble is “approximating the best classification rule by using multiple rules.” (p297). Specific to boosting models, Yang states: “As for boosting (Fig. 1(b)), diversity is obtained by increasing the weights of misclassified samples in an iterative manner. Each base classifier is trained and combined from the samples with different classification weights, and therefore, different hypotheses. By default, these three methods use decision trees as base classifiers because decision trees are sensitive to small changes on the training set [8], and thus suited for the perturbation procedure applied to the training data.” (p297-298).
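The reweighting step that Yang describes for boosting can be illustrated with a minimal sketch.  The sketch assumes the classic AdaBoost update with one-feature decision stumps as base classifiers; the function names (train_stump, adaboost, predict) and the toy data are hypothetical and are taken neither from Yang nor from the instant specification.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best depth-1 tree."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Train `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        ensemble.append((alpha, thr, polarity))
        # Increase weights of misclassified samples, decrease the rest.
        updated = []
        for w, x, y in zip(weights, xs, ys):
            pred = polarity if x >= thr else -polarity
            updated.append(w * math.exp(-alpha * y * pred))
        total = sum(updated)
        weights = [w / total for w in updated]   # renormalize to sum to 1
    return ensemble, weights

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble, final_weights = adaboost(xs, ys, rounds=1)
```

After a single round on this toy data, the two samples that the first stump misclassifies carry more weight than the four it classifies correctly, so the next base classifier is trained on a differently weighted sample.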
In further analysis of the use of boosting methods on gene expression data, Yang reports that using the LogitBoost algorithm to replace the loss function of AdaBoost provides a more accurate classification of gene expression data.  With respect to analyzing MS-based datasets for proteins, Yang discusses: “Gertheiss and Tutz [37] designed a block-wise boosting algorithm to integrate feature selection and sample classification of mass spectrometry data. Based on LogitBoost, their method addresses the horizontal variability of the m/z values by dividing the m/z values into small subsets called blocks. Finally, the boosting ensemble has also been adopted as the classification and biomarker discovery component in the proteomic data analysis framework proposed by Yasui et al. [38].”
Yang discusses how to analyze the output of the various ensemble schemes, for instance to focus on more objective comparisons, or to add known data from other sources such as pathway databases (like KEGG, which provide gene names, known functions, and other related data) (p300). Adding genome-related datasets such as SNP, or copy number differences obtained from GWAS analysis of the sample, can help identify susceptibility to disease (p300-301, Figs 6 and 7).  Yang reviews the pros and cons of extension of the ensemble methods to meta ensembles, use of different classification algorithms, or different feature selection processes (p302-304, Fig 8).  At each level of these ensembles, the type of classifier, the specific data employed for test and training, the structure of the ensemble, the structure of the feature categories (including molecular categories, and DNA mutation data features), a clear disclosure of how they all act together, error assessments, and selectivity and specificity for the final output must be carefully selected and tested by the user.  Yang concludes that “in classification and prediction, a carefully engineered ensemble algorithm generally offer higher accuracy and stability than a single algorithm can achieve.”  Yang also points out that increased model complexity can lead to decreased ability to interpret the results of the model, and can require much more computing power than single algorithms.  (p305)
(a) The skilled practitioner would first turn to the instant specification to identify the specific algorithms or steps required to carry out the classification of sample data utilizing decision tree classifiers. However, the instant disclosure provides only gradient boosting decision tree classifiers, and the variety of steps, data and outcomes that are possible from those specific decision tree classifiers.  As such, the skilled practitioner would turn to the prior art for guidance as to how to determine a candidate molecular classification, utilizing 3 separate RNA-based decision tree classifiers and 3 separate RNA-based expression datasets from the sample, to identify that molecular candidate category. However, the prior art shows that even within the definition of gradient-boosted decision trees, additional information, steps, direction of information flow and error analysis are required, but these are neither clearly required by the claims nor disclosed in the specification.  As such, performing the invention of the claims would require undue experimentation.
Applicant’s arguments:
Applicant’s arguments were directed to previously interpreted limitations under 35 USC 112(f) and related issues under 112(a) and (b).  While the claims no longer recite means plus function limitations, the claims recite an invention which is not enabled, as set forth above.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-4, 12-15, 17, 19-20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Pan et al (US 2020/0199671 A1, published 6/25/2020; PTO-1449).
	Pan, W. et al. Methods for detecting disease using analysis of RNA. US 2020/0199671 A1, 6/25/2020, having an effective filing date of at least 12/2019.  
	The claims are drawn to methods, systems and computer program products for identifying at least one candidate molecular category for a biological sample of a subject, through receiving RNA expression data from the sample, in at least three different subsets; processing the data using an ensemble of at least three RNA-based decision tree classifiers, the classifiers correspond to a hierarchy of molecular categories to which a sample may belong; where a first set of RNA expression data is applied to a first RNA based decision tree classifier to obtain a first output indicative of whether the first molecular category is a candidate category; repeating this for every different set of RNA expression data, then using at least some of the output to identify the candidate molecular category.
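	For orientation only, the data flow summarized above (one classifier per molecular category, each reading its own gene set, with the outputs combined to name candidate categories) can be sketched as follows.  This is a minimal sketch under stated assumptions: hand-written single-threshold rules stand in for trained decision tree classifiers, the gene names, thresholds, category names and 0.5 cutoff are hypothetical, and the rule that a child category counts only when its parent also passes is one assumed way of using the hierarchy, not a limitation drawn from the claims or from any cited reference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Node:
    category: str                                     # molecular category
    genes: List[str]                                  # gene set this classifier reads
    classifier: Callable[[Dict[str, float]], float]   # returns a score in [0, 1]
    parent: Optional[str] = None                      # parent category, if any

def stump(gene: str, threshold: float) -> Callable[[Dict[str, float]], float]:
    """Depth-1 decision rule standing in for a trained tree classifier."""
    def classify(expr: Dict[str, float]) -> float:
        return 1.0 if expr[gene] >= threshold else 0.0
    return classify

# Hypothetical three-node hierarchy: one parent and two child categories.
HIERARCHY = [
    Node("parent", ["GENE_A"], stump("GENE_A", 5.0)),
    Node("child_1", ["GENE_B"], stump("GENE_B", 2.0), parent="parent"),
    Node("child_2", ["GENE_C"], stump("GENE_C", 7.0), parent="parent"),
]

def identify_candidates(
    expression: Dict[str, float],
    hierarchy: List[Node] = HIERARCHY,
    cutoff: float = 0.5,
) -> Tuple[Dict[str, float], List[str]]:
    """Apply each classifier to its own gene subset, then combine outputs."""
    outputs: Dict[str, float] = {}
    for node in hierarchy:
        subset = {g: expression[g] for g in node.genes}  # per-classifier data
        outputs[node.category] = node.classifier(subset)
    candidates = []
    for node in hierarchy:
        # Assumed combination rule: a child needs its parent to pass as well.
        parent_ok = node.parent is None or outputs[node.parent] >= cutoff
        if outputs[node.category] >= cutoff and parent_ok:
            candidates.append(node.category)
    return outputs, candidates

sample = {"GENE_A": 6.0, "GENE_B": 1.0, "GENE_C": 9.0}
outputs, candidates = identify_candidates(sample)
```

	In this toy sample the parent category and the second child category are returned as candidates, while the first child category is not.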
	Pan discloses: A method for identifying at least one candidate molecular category for a biological sample obtained from a subject (see PP12, 13; or claim 7), the method comprising: 
obtaining RNA expression data previously obtained by processing the biological sample obtained from the subject (PP 12, 13 and claim 7), 
wherein the RNA expression data comprises at least three sets of RNA expression data for three sets of genes (see PP 13: “determine an expression level of a plurality of target RNA molecules in the biological test sample; compare the expression level of each of the plurality of target RNA molecules to an RNA tissue score matrix to determine a cancer indicator score for each of the plurality of target RNA molecule”). See also [0138], where expression data are provided for molecules identified in Tables 1-7, each table separately or in combination. Paragraphs [0139-0150] provide the tables, and representative gene names or transcript names.
processing the RNA expression data using a hierarchy of RNA-based decision tree classifiers corresponding to a hierarchy of molecular categories to obtain RNA-based machine learning classifier outputs including a first output and a second output and a third, et al (see PP 13; see also the section “classification model” at 0157-0161 for the machine learning classifiers based on a predicted “score or probability”). Paragraph 0158 discloses decision tree classifiers and gradient-boosted decision tree classifiers. This section also addresses training for each classifier. 
the hierarchy of molecular categories including a parent molecular category and first and second molecular categories that are children of the parent molecular category in the hierarchy of molecular categories (see e.g. the cancer “type or subtype” in PP 12); paragraphs 0157-0161 also provide candidate molecular categories and hierarchies including normal, category A and not category A. Beginning at [0279] Example 1, and Table 9 set forth the parent molecular category cancer, and the subtypes or subcategories of types of cancer.  [0287] addresses further subtyping of particular cancers, i.e. triple negative breast cancer, Her2+, and her2+.HER- subtypes of breast cancer.
the hierarchy of RNA-based machine learning classifiers comprising at least three RNA-based decision tree classifiers corresponding to the at least three molecular categories, the processing comprising: 
processing the first/ second/ third RNA expression data using the first/ second/ third RNA-based machine learning classifier to obtain the first/ second/ third output indicative of whether the first/ second/ third molecular category is a candidate molecular category for the biological sample ("the cancer indicator score for the biological test sample exceeds a threshold value” in pp 13, “for each of the plurality of target RNA molecules"); 
identifying, using at least some of the RNA-based machine learning classifier outputs including the first output and the second output, at least one candidate molecular category for the biological sample (in pp 13: “detecting the presence of the cancer”; see also pp 15, Figs 17A-B and Fig 19).  See also the classification model [0157-0161] and Example 1.
These methods are all computer implemented using a computer system comprising inputs, outputs, processors, storage elements, memory elements, et al.  As such, claims 1, 24 and 25 are anticipated.
With respect to claims 2-4, Pan discusses utilizing a multiplicity of RNA datasets each comprising differing RNA expression data, which can include 3, 4, 5 or more.  (pp0010-0021)
	With respect to claim 12, Pan discloses using DNA expression data, at pp0167-0170; and the classification model at paragraphs 0157-0161.  
	With respect to claim 13, the sequencing techniques set forth at 0162-0170 discuss DNA features.  The classification model is disclosed at 0157-0161.
	With respect to claim 14, each feature is set forth in an ultimate and/or, such that only one feature type is required.  Pan discloses “one or more features indicating whether the NA expression data indicates the presence of one or more protein coding genes.” DNA features such as mutations, and genes associated with types of cancer or other categories, are provided. (throughout, embodiments where the expressed gene transcript only comprises exons, known annotated genes etc.)
	With respect to claim 15, purity of the samples is discussed at [0073-0080] in the Examples, and the following materials and methods [0270-0324].
	With respect to claim 17, a GUI is presented in the figures, and description of the computing elements.  Fig 13 shows one way to visualize categories (Cancer type); paragraphs 0171-0189 discuss computing elements, including display and GUI elements.
	With respect to claim 19, gradient boosting, logistic regression, random forest, neural networks and/or multinomial regression are each disclosed. [end of 0222, embodiment 86, 0253 et al.]
	With respect to claim 20, each table comprises at least 2-500 genes, including at least 10 and fewer than 300, paragraph 0139, 0140, et al.
Applicant’s arguments
	Applicant’s arguments that Pan fails to provide at least three datasets, and at least three RNA-based classifiers for the identification of candidate molecular categories are not persuasive.  Pan clearly provides decision trees, and gradient boosted decision trees as cited above.  Pan clearly sets forth a multiplicity of possible separate sets of data, from Tables 1-7, each separate, or in various subcombinations.  Pan discloses the use of these ensembles to identify subcategories of certain cancers, such as breast cancer.

Claim(s) 1-4, 11-15, 17-21, 24-25 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Michuda (US 2020/0365268 A1: PTO-1999).
	Michuda, J et al. Systems and methods for multi-label cancer classification. US 2020/0365268 A1, 11/19/2020 (having priority to at least May 2020, possibly as early as February 2020).
	With respect to claim 1, Michuda discloses: A method for identifying at least one candidate molecular category for a biological sample obtained from a subject (see 0015), the method comprising: 
obtaining RNA expression data previously obtained by processing the biological sample obtained from the subject (0015-0018), (0017, 0132 provides DNA expression data) 
wherein the RNA expression data comprises first/ second/ third RNA expression data for a first/ second/ third set of genes different from the first set of genes (see 0016, 0125- 0126 adds a second and additional pluralities) 
processing the RNA expression data using a hierarchy of RNA-based decision tree classifiers corresponding to a hierarchy of molecular categories to obtain RNA-based machine learning classifier outputs including a first output and a second output (see 0015-0035, identifying categories, features, and classifiers, also 0125-0126).  Paragraphs [0164, 0283, 0305, 0364] specifically recite XGBoost and LightGBM, which are both decision tree classifiers, particularly gradient boosted decision tree classifiers.
the hierarchy of molecular categories including a parent molecular category and first and second (and third et al) molecular categories that are children of the parent molecular category in the hierarchy of molecular categories (see e.g. the cancer “type or subtype” in 0016-0035, 0125-0126), 
the hierarchy of RNA-based machine learning classifiers comprising first and second and third et al RNA-based machine learning classifiers corresponding to the first/ second/ third molecular categories, the processing comprising: 
processing the first RNA expression data using the first RNA-based machine learning classifier to obtain the first output indicative of whether the first molecular category is a candidate molecular category for the biological sample (see 0067, 0110-0112); 
processing the second and third et al RNA expression data using the second/ third RNA-based decision tree classifier to obtain the second/ third output indicative of whether the second/ third molecular category is a candidate molecular category for the biological sample (see 0067, 0110-0112); and 
identifying, using at least some of the RNA-based decision tree classifier outputs including the first output and the second output, at least one candidate molecular category for the biological sample (0127, Figs 14-A-B, 0186-0228).  
These methods are all computer implemented using a computer system comprising inputs, outputs, processors, storage elements, memory elements, et al.  As such, claims 1, 24 and 25 are anticipated.
With respect to claims 2-4, Michuda discusses utilizing a multiplicity of RNA datasets each comprising differing RNA expression data, which can include 3 or more, et al.  (abstract, 0031-33, 0067, 0164-0171, 0110-0112, Figs 15A-C)
With respect to claim 11, Michuda provides ICD codes for certain categories, subcategories or diagnoses at [0287], particularly the ICD-10.  Certain codes associated with certain diagnoses are provided in this section. 
With respect to claim 12, Michuda discloses using DNA expression data, at 0017, 0031-0033, 0125-0127, 0132, 0300-0317; and the decision tree model at paragraphs 0157-0161, 0164-0171.  [0164, 0283, 0305, 0364] specifically recite XGBoost and LightGBM which are both decision tree classifiers, particularly gradient boosted decision tree classifiers.
With respect to claim 13, paragraphs 0125, 0127, Figs 14A, B, 0204-0206, 0210, 0300 discuss DNA features.  The decision tree model is disclosed at 0157-0161, 0317. [0164, 0283, 0305, 0364] specifically recite XGBoost and LightGBM which are both decision tree classifiers, particularly gradient boosted decision tree classifiers.
	With respect to claim 14, each feature is set forth in an ultimate and/or, such that only one feature type is required.  Michuda discloses copy number variations (0251), tumor mutational burden (0153-0154, 0156, 0251, et al.), mutations (0251), encoded genes (0253), and microsatellite instability (0251 et al.).
	With respect to claim 15, purity of the samples is discussed at [0073, 0133, 0136-0137, 0176, 0180, et al.]
	With respect to claim 17, a GUI is presented in the figures, and description of the computing elements, [0107].  The digital pathology reports represent GUI visualizing at least one category.
	With respect to claim 18, at least 10 classifiers is encompassed by the recitation of “at least two”, throughout, and at least 9 in 0210 et al.
With respect to claim 19, gradient boosting (0283), logistic regression (0165, 0172), random forest (0172, 0283), neural networks (0172, 0283) and/or multinomial distributions (0283) are each disclosed.  [0164, 0283, 0305, 0364] specifically recite XGBoost and LightGBM which are both decision tree classifiers, particularly gradient boosted decision tree classifiers.
	With respect to claim 20, Table 2 comprises at least 2 genes (0009, 0098, 0124), including up to and including 300, paragraph 0146.  
	Applicant’s arguments:
	Applicant’s arguments with respect to Michuda have been carefully considered but are not persuasive.  Michuda clearly utilizes multiple pluralities of datasets, including 3 or more, multiple decision tree classifiers, including at least 9, where the classifiers are decision tree classifiers (XGBoost and LightGBM as discussed above), and various subcategories of diseases such as cancers. 
New Grounds of rejection
Claim(s) 1-4, 11-15, 17, 19-20 and 24-25 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ma et al (16 April 2020).
	Ma et al. (published online 16 April 2020) Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Computers in Biology and Medicine, vol 121: 10 pages.
	With respect to claim 1, Ma discloses identifying a candidate molecular category for a sample from a subject, wherein different subtypes of carcinomas can be those categories (Table 1).  “The data used in this research were collected from The Cancer Genome Atlas (TCGA) project. We focused on four cancer types: kidney renal clear cell carcinoma (KIRC) with 537 samples, kidney renal papillary cell carcinoma (KIRP) with 291 samples, lung squamous cell carcinoma (LUSC) with 504 samples and head and neck squamous cell carcinoma (HNSC) with 528 samples. Each sample has mRNA expression (Illumina mRNA-seq; Level 3), miRNA-seq data (Illumina HiSeq, miRNASeq; Level 3), DNA methylation data (Illumina Infinium Human DNA Methylation 450 K; Level 3) and clinical information.” (p2).  Early and late stages can be further subcategories. (Table 1). “Finally, 12 core datasets were obtained for the down-stream analyses. Table 1 shows the number of patients in KIRC, KIRP, LUSC, and HNSC datasets, and the summary of DNA methylation, mRNA expression and miRNA expression data in this study.”
	Ma processes each set of RNA expression data using a separate decision tree classifier, XGBoost, section 2.3.1.  “XGBoost is a regression tree that has the same decision rules as decision tree. It supports both regression and classification. This algorithm is an efficient and scalable variant of the gradient boosting machine (GBM), which has been widely applied to computer visioning, data mining and other fields [16,17]. Recently, as a type of gradient boosting machine, XGBoost is mainly improved in two aspects: speeding up the tree construction and proposing a new distributed algorithm for tree searching [18]. Optimizing the value of the objective function is the core of XGBoost. Given a dataset D = {(x_i, y_i)} where x_i denotes the gene expression profile (or DNA methylation probe) of tumor, y_i is the corresponding binary label (early stage or late stage).”
	Each RNA-decision tree classifier is trained on differing RNA expression datasets, section 2.3.3 Model optimization.  
	Section 3.1 provides tables of predictive performance information for XGBoost in the classification of a cancer type or subtype.  XGBoost outperformed the other tested classifiers at cancer stage prediction in 9 out of 12 datasets.
	Ma provides evaluation and validation steps in section 2.4-2.5.
	The methods of Ma are all computer implemented, using computers with inputs, processors, memory and outputs.  As such claims 1, 24 and 25 are anticipated.
	With respect to claims 2-4, Ma states up to 4 mRNA expression level datasets were used in this paper. Additionally, 4 DNA feature datasets (methylation profiles) and 4 micro RNA-expression datasets were used. Combining the mRNA and miRNA datasets, up to 8 RNA-expression datasets are provided, and applied to up to 8 RNA-based classifiers.  
	With respect to claim 11, TCGA can provide any related ICD codes, as set forth in the registration and lab test orders. (found on TCGA website, linked out to a clinical data page: Clinical data forms used by the TSS: “Patient Referrals for Collections; Patients referred to any of our Laboratory Service Centers should proceed to registration and present the written provider order for laboratory testing. In some cases, such as genetic testing, a consent form may be required. A laboratory test order is required (i.e., paper requisition, electronic medical record, ChildLinkTM provider). The laboratory test request must provide the following information: Ordering provider's full name, address, phone number, and provider signature, Patient’s name and date of birth, Test(s) requested, Diagnosis and/or ICD-10 Codes, Date and time of order”)
	With respect to claims 12-13, DNA methylation datasets can be obtained and processed using DNA-based classifiers, wherein some of the DNA data is used to identify the candidate molecular category. See Table 1, Methylation probe datasets, and section 2.4, classification model based on multi-omics.  Gene names, as a DNA feature, can be provided, as well as pathway information for gene pathways linked to each type of molecular cancer category, see Fig 3.  
	With respect to claim 14, the DNA features that correspond highest to a cancer type in Fig 3 are in blue, indicating pathogenic mutations, or pathogenic levels of expression.
	With respect to claim 15, tumor purity is a datapoint which can be retrieved from TCGA for each dataset.
	With respect to claim 17, the graphs and displays are representative graphical user interface outputs.
	With respect to claim 19, XGBoost is a gradient boosted decision tree classifier.  
	With respect to claim 20, Table 1 discloses the mRNA probes, which are equivalent to genes.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Wang 2017 LightGBM: an effective miRNA classification method in Breast cancer patients. ICCB, 2017, Oct 18-20 2017, Newark NJ, USA. P7-12.  Wang applies decision tree classifiers to microRNA molecules to classify the presence of breast cancer.
Ramroach et al. 2019 The efficacy of various machine learning models for multi-class classification of RNA-seq expression data.  in Arai et al (eds) CompCom 2019, AISC 997, pp918-928, 2019.  Gradient boosted decision trees are used to classify subtypes of cancers, using RNA expression data, and gene features.
	Li et al. (2019) Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics vol 20: 1021. 12 pages.  Li uses XGBoost on data from TCGA on 33 tumor types to predict tumor purity using RNA-seq data.
	Hemphill et al. (2014) Feature selection and classifier performance on diverse biological datasets.  BMC bioinformatics, 15(Suppl 13):54, 14 pages.  Hemphill utilizes decision tree classifiers and gradient boosted decision tree classifiers to assess identification of a tissue of origin of a tumor sample. Gene and protein expression data, SNP data, and miRNA data were utilized.
	Chen et al. (2018) EGBMMDA: extreme gradient boosting machine for miRNA association prediction.  Cell Death and Disease, vol 9 issue 3, 16 pages.  Chen applies extreme gradient boosting decision trees to microRNA expression data, to predict a cancer type such as lymphoma, prostate neoplasms, breast neoplasms, etc.
___________________
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARY K ZEMAN whose telephone number is 5712720723.  The examiner can normally be reached on 8am-2pm M-F.  Email may be sent to mary.zeman@uspto.gov if the appropriate permissions have been filed.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karl Skowronek, can be reached on 571 272 9047.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARY K ZEMAN/            Primary Examiner, Art Unit 1631