DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The following claim(s) is/are pending in this Office action: 1, 3-11, 13-28
The following claim(s) is/are amended: 1, 3-11, 14-20
The following claim(s) is/are new: 21-28
The following claim(s) is/are cancelled: 2 and 12
Claim(s) rejected: 1, 3-11, 13-28

Previous Rejections Withdrawn
Rejections of claims 3, 4, 13, and 14 under 35 U.S.C. 112(b) are withdrawn based on the amendments.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-11, and 13-28 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (“The Human Splicing Code Reveals New Insights into the genetic determinants of disease.” Hereinafter “Xiong”) in view of Tabibiazar et al. (US 2008/0300797A1, hereinafter “Tabibiazar”).

Regarding claim 1, Xiong teaches a method for training a neural network comprising: (Section 4 Para 1: “An ensemble of neural network models that relates the RNA features and the observed Ψ values was fitted to the exons in the training dataset.” Section 4 Last Para: “Next, we describe the targets used in training these two computational models.”)
providing the neural network, wherein the neural network is configured to process a biological sequence (Section “A computational model of splicing” Para 1: “Misregulation of splicing contributes substantially to human disease (10), so we developed a computational model of splicing regulation that can be applied to any sequence containing a triplet of exons (Fig. 1B). The method extracts DNA sequence features (or cis elements)… RNA sequencing (RNA-seq) data from the Illumina Human Body Map 2.0 project (NCBI GSE30611) were used to estimate Ψ for each exon in each of 16 human tissues.” A biological sequence comprising DNA and RNA features is fed into the NN for processing (in other words, the NN model was applied to a biological sequence comprising DNA and RNA features). Spec Para 0056 mentions that a biological sequence comprises DNA and RNA.)
(Section “A computational model of splicing”: “The method extracts DNA sequence features (or cis elements) and, for a given cell type, uses them to predict the percentage of transcripts with the central exon spliced in (Ψ), along with a Bayesian confidence estimate.” “Unlike existing methods (3, 11, 12), our computational model was derived using human data, incorporates over 300 new sequence features, and outputs real-valued absolute Ψ values for individual tissues.” Spec Para 0072 states: “The associated label, set of labels, or more structured information may be determined from a discrete molecular phenotype or a continuous molecular phenotype, such as the percent of transcripts with an exon spliced in…” This implies that the percent of transcripts with an exon spliced in represents a molecular phenotype. In the above paragraph, Xiong also teaches that the predicted value includes a numerical value that quantifies individual tissues (in other words, cells).)
(ii) a conservation value corresponding to the biological sequence (Section Discussion Para 5: “Furthermore, when we examined 15,386 disease variants and 1519 common SNPs within intronic regions with moderate to high conservation across vertebrates (PhastCons score > 0.5), we found that our method more accurately detects disease variants (P < 1 × 10^-320, KS test, 60.1%) than scoring them using conservation (P = 2.2 × 10^-166, KS test, 38.2%).” Section 5.4 Para 2: “We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region.” The conservation score or value of a biological sequence is an important feature for predicting a molecular phenotype associated with a disease. Once a relevant molecular phenotype associated with a disease is predicted, its corresponding conservation scores will also be identified, which indicates which conservation scores are more likely associated with a phenotype of interest or with causing a disease. Xiong found that, with moderate to high conservation across vertebrates, the method is more likely to detect disease variants.)
(b) providing a training data set comprising (i) a set of inputs comprising biological sequences (Section “A computational model of splicing” Para 1: “To train the model, we mined 10,689 exons that displayed evidence of alternative splicing and extracted 1393 sequence features from each exon and its neighboring introns and exons.” “The training of our regulatory model was based on 75bp single-end RNA-seq data from the Illumina Human BodyMap 2.0 project (NCBI GEO accession GSE30611), which was derived using poly-A selected mRNA from sixteen diverse human sources, including adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph node, ovary, prostate, skeletal muscle, testis and thyroid tissues, plus white blood cells.” The RNA-seq data, which represent biological sequences, are included in the training data.)
(ii) for each input in the set of one or more inputs, a set of one or more molecular phenotypes corresponding to the input (Section “A computational model of splicing” Para 1: “To train the model, we mined 10,689 exons that displayed evidence of alternative splicing and extracted 1393 sequence features from each exon and its neighboring introns and exons.” Section 4.3 Para 1: “This model was trained using the same objective function, RNA features, splicing patterns and dataset partitions as the Bayesian neural network model described above.” A molecular phenotype comprises a splicing pattern per Spec Para 0072 and an alternative splice site per Spec Para 0014. The training data in Xiong includes these data.)
(iii) a set of conservation values corresponding to each of at least a portion of the set of inputs (Section 5.4 Para 2: “To compute the model predictions for the knockdown, we set the 24 MBNL-related features to the average value found in the training dataset… We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region.” This implies that a conservation score or value was also among the features included in the training dataset.)
(d) outputting the one or more set of parameters of the trained neural network (Section “A computational model of splicing” Para 1: “Unlike existing methods (3, 11, 12), our computational model was derived using human data, incorporates over 300 new sequence features, and outputs real-valued absolute Ψ values for individual tissues.”)
Xiong does not explicitly teach configuring a set of parameters of the neural network based on the training data set to minimize a total loss of the training data set, thereby training the neural network.
Tabibiazar, however, teaches configuring a set of parameters of the neural network based on the training data set to minimize a total loss of the training data set, thereby training the neural network (Para 0034: "Within such a model, parameters may be appropriately selected so as to provide for a desired balance of sensitivity and selectivity." Para 0156: "These outputs are then compared to the target values; any difference corresponds to an error. This error or criterion function is some scalar function of the weights and is minimized when the network outputs match the desired outputs. Thus, the weights are adjusted to reduce this measure of error." Parameters of NN can be adjusted using weight values to minimize output error.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the neural network and phenotype prediction method of Xiong with the loss-minimizing method of Tabibiazar to improve the predictive ability of the neural network (Tabibiazar, Para 0269).
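For illustration only, the weight adjustment described in the Tabibiazar passage quoted above (outputs compared to target values, with the weights adjusted to reduce a scalar error) can be sketched as plain gradient descent on a sum of squared errors. This is a minimal sketch, not code from either reference: the single linear unit, the toy data, the learning rate, and the function names are all hypothetical assumptions.

```python
# Illustrative sketch only: a single linear unit trained by gradient descent.
# The data, learning rate, and function names are hypothetical and do not
# come from Xiong or Tabibiazar.

def total_loss(weights, inputs, targets):
    """Sum of squared differences between the unit's outputs and the targets."""
    loss = 0.0
    for x, t in zip(inputs, targets):
        y = sum(w * xi for w, xi in zip(weights, x))
        loss += (y - t) ** 2
    return loss

def train(inputs, targets, lr=0.05, steps=200):
    """Repeatedly adjust the weights against the gradient of the total loss."""
    weights = [0.0] * len(inputs[0])
    history = [total_loss(weights, inputs, targets)]
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, t in zip(inputs, targets):
            y = sum(w * xi for w, xi in zip(weights, x))
            for j, xj in enumerate(x):
                grads[j] += 2.0 * (y - t) * xj  # d(loss)/d(weight j)
        weights = [w - lr * g for w, g in zip(weights, grads)]
        history.append(total_loss(weights, inputs, targets))
    return weights, history

# Hypothetical toy training set: three inputs with two features each.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.2, 0.7, 0.9]
weights, history = train(inputs, targets)
```

In this toy setting the recorded loss falls from its initial value toward zero as the outputs approach the targets, which is the behavior the quoted passage describes.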

Regarding claim 3, Xiong and Tabibiazar teach the method of claim 1.
Xiong also teaches wherein the total loss of the training data set is minimized at least in part by minimizing (i) a total loss of the set of molecular phenotypes or (ii) a total loss of the set of conservation values (Section 3 Para 1: “The current RNA feature set is based on our previous one (3), but includes ~40% more features and some modifications of previously defined features, which were found to improve prediction performance significantly. The following notes describe these changes and additions.” Section 4 Para 3: “We also found that accuracy was improved by the addition of many of the new features, including frequencies and locations of Alu elements and nucleosome positioning features.” By modifying the inclusion of RNA features (which includes conservation), prediction performance is improved. In other words, loss in prediction of molecular phenotype is minimized.).

Regarding claim 4, Xiong and Tabibiazar teach the method of claim 1.
(Section 3 Para 1: “The current RNA feature set is based on our previous one (3), but includes ~40% more features and some modifications of previously defined features, which were found to improve prediction performance significantly. The following notes describe these changes and additions.” Section 4 Para 3: “We also found that accuracy was improved by the addition of many of the new features, including frequencies and locations of Alu elements and nucleosome positioning features.” By modifying the inclusion of RNA features (which includes conservation), prediction performance is improved. In other words, loss in prediction of molecular phenotype is minimized.).

Regarding claim 5, Xiong and Tabibiazar teach the method of claim 1.
Xiong also teaches further comprising providing a test biological sequence, and processing the test biological sequence using the trained neural network to determine a molecular phenotype corresponding to the test biological sequence (Fig. 1B shows a test sequence being fed into the computational model (or NN model) to predict percent spliced in (which represents a molecular phenotype per Spec Para 0072).).

Regarding claim 6, Xiong and Tabibiazar teach the method of claim 5.
Xiong also teaches processing the test biological sequence using the trained neural network to determine a conservation value corresponding to the test biological sequence (Section Discussion Para 5: “Furthermore, when we examined 15,386 disease variants and 1519 common SNPs within intronic regions with moderate to high conservation across vertebrates (PhastCons score > 0.5), we found that our method more accurately detects disease variants (P < 1 × 10^-320, KS test, 60.1%) than scoring them using conservation (P = 2.2 × 10^-166, KS test, 38.2%).” Per Fig 1B, a test sequence is used in the model to predict percent spliced in (which represents a molecular phenotype). Once a relevant molecular phenotype is predicted, its corresponding conservation scores will also be identified, which indicates which conservation scores are more likely associated with a phenotype of interest or with causing a disease. Xiong found that, with moderate to high conservation across vertebrates, the method is more likely to detect disease variants).

Regarding claim 7, Xiong and Tabibiazar teach the method of claim 6.
Xiong also teaches wherein the trained neural network comprises a single intermediate layer configured to determine the molecular phenotype and the conservation value corresponding to the test biological sequence (Page 7 Para 2: “The structure of a single model in the ensemble is a two-layer neural network with sigmoidal hidden units shared across tissues. It is capable of modeling complex nonlinear and context-dependent interactions between the RNA features and the splicing patterns ... In this model, there are in total 41,820 potential input-to-hidden parameters and 960 hidden-to-output parameters.” Fig. 1B shows the features related to the test sequence. Page 5 Section 6.1: “To ascertain the quantitative effect that a single feature F is predicted to have on Ψ for a given cis- and trans-context, we define the exon-specific feature sensitivity, denoted by ΔΨ/ΔF, which is an estimate of the partial derivative of predicted Ψ with respect to a feature F ... Because our regulatory model can make non-linear predictions with interactive features, both the magnitude and sign of ΔΨ/ΔF can be different for different exons depending on their cis-element contexts.” The two-layer neural network (which uses one intermediate layer) is used to determine the RNA features (which include molecular phenotype and conservation values) related to the biological sequence by scoring how effective each of them is in predicting the output) and the conservation value corresponding to the test biological sequence (Page 14, Section 5.4 Para 2: “We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region.” Fig 1B shows these features are related to a test sequence. As discussed for claim 1, conservation is associated with the molecular phenotype. Once the molecular phenotype is predicted, the associated conservation values will also be known.).
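For illustration only, the structure quoted above (a single layer of sigmoidal hidden units shared by several per-tissue outputs) and the quoted notion of a feature sensitivity (an estimate of the partial derivative of the prediction with respect to one feature) can be sketched as follows. This is a minimal sketch under stated assumptions: the layer sizes, weight values, and function names are hypothetical, the logistic output unit is chosen only for brevity, and the central finite difference is a stand-in estimator, not necessarily the one used in Xiong.

```python
import math

# Illustrative sketch only: sizes, weights, and names are hypothetical and
# are not taken from Xiong.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, w_hidden, w_out):
    """Forward pass: shared sigmoidal hidden units, then one output per tissue."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)))
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]

def sensitivity(features, index, w_hidden, w_out, eps=1e-6):
    """Central finite-difference estimate of d(output)/d(features[index])."""
    up = list(features)
    up[index] += eps
    down = list(features)
    down[index] -= eps
    p_up = predict(up, w_hidden, w_out)
    p_down = predict(down, w_hidden, w_out)
    return [(a - b) / (2 * eps) for a, b in zip(p_up, p_down)]

# Hypothetical toy sizes: 3 input features, 2 hidden units, 2 tissue outputs.
w_hidden = [[0.5, -1.0, 0.3], [-0.2, 0.8, 0.1]]
w_out = [[1.2, -0.7], [-0.4, 0.9]]
features = [1.0, 0.5, -1.0]

psi = predict(features, w_hidden, w_out)           # one prediction per tissue
d_psi = sensitivity(features, 0, w_hidden, w_out)  # effect of feature 0
```

With these toy weights, increasing feature 0 raises the first output and lowers the second, so the sign of the estimated sensitivity depends on which output is examined.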

Regarding claim 8, Xiong and Tabibiazar teach the method of claim 6.
Xiong also teaches wherein the trained neural network comprises a plurality of intermediate layers, wherein a last layer of the plurality of intermediate layers is configured to determine the molecular phenotype and the conservation value corresponding to the test biological sequence (Page 144 Section Rationale: “We used "deep learning" computer algorithms to derive a computational model that takes as input DNA sequences and applies general rules to predict splicing in human tissues.” Page 7 Para 2: “Subsequently, these nonlinear hidden variables are combined by a softmax function to produce the prediction. The tissues were trained jointly as separate output units and shared the same set of hidden variables, enabling information about RNA feature usage to be combined across tissues.” Page 14 Section 5.4 Para 2: “To compute the model predictions for the knockdown, we set the 24 MBNL-related features to the average value found in the training dataset. By comparing the knockdown prediction to the original prediction, we computed a MBNL regulatory score for each exon, similar to the regulatory score for SNVs (see Sec. S7.1). We found that the exons affected by MBNL knockdown had significantly higher predicted MBNL regulatory scores as described in the main text.” A deep learning NN (which by definition has more than one intermediate or hidden layer) is also used. RNA features are combined in the hidden layers. So the last layer of the NN can be configured to ascertain which specific RNA features are more effective in predicting the output by scoring them.)
and the conservation value corresponding to the test biological sequence (Page 14, Section 5.4 Para 2: “We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region.” Section “A computational model of splicing”: “To train the model, we mined 10,689 exons that displayed evidence of alternative splicing and extracted 1393 sequence features from each exon and its neighboring introns and exons.” Fig 1B shows these features are related to a test sequence. As discussed for claim 1, conservation is associated with the molecular phenotype. Also, Spec Para 0028 mentions: “In some embodiments, the molecular phenotype corresponding to the test biological sequence is determined based on the conservation value corresponding to the test biological sequence.” Therefore, once the molecular phenotype is predicted, the associated conservation values will also be identified, because conservation is part of the features used in training the neural network. A neural network can give a set of features as output during a feature extraction process per Spec Para 0074: “Unsupervised learning may be used to train a machine learning model to take a biological sequence as input and output a set of features that are useful in describing the input. This is called feature extraction.” Therefore, during the feature extraction process in Xiong (as conservation is part of the features), conservation can be an output from the neural network.).

Regarding claim 9, Xiong and Tabibiazar teach the method of claim 6.
Xiong also teaches wherein the trained neural network comprises a plurality of intermediate layers (Page 144 Section Rationale: “We used "deep learning" computer algorithms to derive a computational model that takes as input DNA sequences”)
wherein a first layer of the plurality of intermediate layers is configured to determine the molecular phenotype corresponding to the test biological sequence, and wherein a second layer of the plurality of intermediate layers is configured to determine the conservation value corresponding to the test biological sequence (Section “Rationale” Para 1: “We used "deep learning" computer algorithms to derive a computational model that takes as input DNA sequences and applies general rules to predict splicing in human tissues. Given a test variant, which may be up to 300 nucleotides into an intron, our model can be used to compute a score for how much the variant alters splicing.” Splicing refers to a molecular phenotype. Section 5.4 Para 2: “We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region, which corresponds to a p-value of 3.1e-11.” The model was also used to identify the predictive feature values associated with the molecular phenotype. Since a deep NN is used, a second layer can be used for this determination.).

Regarding claim 10, Xiong and Tabibiazar teach the method of claim 6.
Xiong also teaches wherein the molecular phenotype corresponding to the test biological sequence is determined based at least in part on the conservation value corresponding to the test biological sequence (Section “Discussion” Para 5: “Furthermore, when we examined 15,386 disease variants and 1519 common SNPs within intronic regions with moderate to high conservation across vertebrates (PhastCons score > 0.5), we found that our method more accurately detects disease variants.” Section 5.4 Para 2: “For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region, which corresponds to a p-value of 3.1e-11, and combining all MBNL features produce a p-value of 2.5e-11.” Conservation is part of the features used in predicting the molecular phenotype (or splicing percentage per Fig 1B).).

Regarding claim 11, it is substantially similar to claim 1, and is rejected in the same manner, the same art, and reasoning applying.

Regarding claims 13-20, they are substantially similar to claims 3-10, respectively, and are rejected in the same manner, the same art and reasoning applying.

Regarding claim 21, Xiong and Tabibiazar teach the method of claim 1.
(Conclusion: “Our computational model was trained to predict splicing from DNA sequence alone, without using disease annotations or population data.” Section: “A computational model of splicing” Para 1: “RNA sequencing (RNA-seq) data from the Illumina Human Body Map 2.0 project (NCBI GSE30611) were used to estimate Ψ for each exon in each of 16 human tissues.”).

Regarding claim 22, Xiong and Tabibiazar teach the method of claim 1.
Xiong also teaches wherein the biological sequences comprise a genetic variant as compared to a reference genome, wherein the genetic variant comprises a substitution, an insertion, a deletion, or a combination thereof (Page 20 Last Para: “After an extensive literature survey, we identified over 300 variations, including substitutions, insertions and deletions. For all of these, we used our regulatory model to predict the mutation-induced ∆Ψ.” Page 2 Para 2: “To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing.” Fig 1B shows biological sequence fed into the model which detects genetic variants.).

Regarding claim 23, Xiong and Tabibiazar teach the method of claim 22.
Xiong also teaches wherein the genetic variant is selected from the group consisting of a nucleotide variant, a single base substitution, a copy number variation (CNV), a single nucleotide variant (SNV), an insertion or deletion (indel), a fusion, a transversion, a (Page 3 Section Genome-wide analysis of splicing misregulation and disease: “To assess the implications of genetic variation for splicing regulation, we mapped 658,420 single-nucleotide variations (SNVs) to exonic and intronic sequences containing the regulatory code for ~120,000 exons in ~16,000 genes (13).” Section 10.1 Para 1: “Genetic variations that occur in known genes…Other information, such as small insertions/deletions (indels), CNVs, structural variants (SVs) were also provided.” Section 10 Para 1: “They are all Caucasians without any other known cytogenetic findings for autism (e.g., chromosome 15q duplication.” Xiong teaches the relevant terminology with reference to genetic variants.).

Regarding claim 24, Xiong and Tabibiazar teach the method of claim 1.
Xiong also teaches wherein the set of molecular phenotypes comprises a level or a percentage of transcripts that include an exon, a level or a percentage of transcripts that use an alternative splice site, a level or a percentage of transcripts that use an alternative polyadenylation site, an affinity of an RNA-protein interaction, an affinity of a DNA-protein interaction, a specificity of an RNA-binding protein, a specificity of a DNA-binding protein, a specificity of a microRNA-RNA interaction, a level of protein phosphorylation, a phosphorylation pattern, a distribution of proteins along a strand of DNA containing a gene, a number of copies of gene transcripts, a distribution of proteins along a transcript, a number of proteins, or a combination thereof (Section Discussion Para 3: “To compare our method with using functional genome annotations, we removed missense exonic SNVs that may affect phenotype without changing splicing regulation” Section “A computational model of splicing”: Para 1: “DNA sequence features (or cis elements) and, for a given cell type, uses them to predict the percentage of transcripts with the central exon spliced in (Ψ).” Section 5.3 Para 1: “To see if our model can effectively account for information obtained from independent measurement on binding affinities of RNA-binding proteins (RBPs),” Section Discussion Last Para: “mRNA turnover, protein synthesis, and protein stabilization.” Section 8 Para 1: “and 1 μg of each RNA sample was used per 20-μl reaction for first-strand cDNA synthesis.” Xiong mentions relevant terminologies with references to phenotype.).

Regarding claims 25-28, they are substantially similar to claims 21-24, respectively, and are rejected in the same manner, the same art and reasoning applying.

Response to Arguments
Applicant’s arguments filed on 04/12/2021 with respect to the 35 U.S.C. 103 rejections have been fully considered. Claims 1, 3-11, 13-20 have been amended by the applicant, and new claims 21-28 were added by applicant. The amendments and new claims have been addressed in the 103 rejections, and relevant citations have been provided. Applicant’s arguments are addressed below:

Applicant’s Argument 1: However, nothing in Tabibiazar teaches or discloses, or even suggests, “a neural network ... configured to process a biological sequence to determine (i) a molecular phenotype corresponding to the biological sequence, wherein the molecular 
(emphases added)

Examiner’s Response 1: These are new limitations added to claims 1 and 11. They are taught by Xiong. Relevant citations have been provided in the 103 rejection section.

Applicant’s Argument 2: However, nothing in Xiong teaches or discloses, or even suggests, “a neural network...configured to process a biological sequence to determine ... a conservation value corresponding to the biological sequence,” as recited in claims 1 and 11 (emphases added).

Examiner’s Response 2: Relevant citations from Xiong that teach the above limitations have been added in the 103 rejection section. Xiong teaches the percentage of transcripts with the central exon spliced in (Section “A computational model of splicing”). Spec Para 0072 states: “The associated label, set of labels, or more structured information may be determined from a discrete molecular phenotype or a continuous molecular phenotype, such as the percent of transcripts with an exon spliced in…” This implies that the percent of transcripts with an exon spliced in represents a molecular phenotype. In the above paragraph, Xiong also teaches that the predicted value includes a numerical value that quantifies individual tissues (in other words, cells).
Per Fig 1B, Biological sequence comprising DNA and RNA features are fed in NN to process (in other words NN model was applied on biological sequence comprising DNA and RNA 
Xiong also teaches conservation values. Section 5.4 Para 2: “We also found that the MBNL-related features themselves were also predictive. For example, the single most predictive feature is the conservation weighted MBNL motif count in the 12 5' region.” Section “A computational model of splicing”: “To train the model, we mined 10,689 exons that displayed evidence of alternative splicing and extracted 1393 sequence features from each exon and its neighboring introns and exons.” Fig 1B shows these features are related to a test sequence. As Spec Para 0028 mentions: “In some embodiments, the molecular phenotype corresponding to the test biological sequence is determined based on the conservation value corresponding to the test biological sequence.” Therefore, once the molecular phenotype is predicted, the associated conservation values will also be identified, because conservation is part of the features used in training the neural network. A neural network can give a set of features as output during a feature extraction process per Spec Para 0074: “Unsupervised learning may be used to train a machine learning model to take a biological sequence as input and output a set of features that are useful in describing the input. This is called feature extraction.” Therefore, during the feature extraction process in Xiong (as conservation is part of the features), conservation can be an output from the neural network.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
An inquiry concerning this communication or earlier communication from the examiner should be directed to QAMAR IQBAL whose telephone number is 571-272-2563. The examiner can normally be reached on M-F 10-6pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR 

/Q.I/ 
Examiner 
Art unit 2123
06/12/2021

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123