abstract |
Cancer is a genetic disease initiated by somatic mutations and advanced by the accumulation of genomic aberrations. Differentiating cancer driver somatic mutations from passenger and benign mutations is a critical step toward a better understanding of cancer biology, and it also provides important insights for cancer detection and prognosis monitoring. Provided herein are machine learning methods that utilize a deep-learning framework to predict mutation-associated pathogenicity, including the cancer-related pathogenicity risk of somatic mutations. The methods incorporate not only an annotation comprising functional features, genomic features, epigenetic features, and other annotated features related to the mutation, but also a separate annotation comprising the sequence content surrounding the test mutation. From these two or more annotation sets, the methods can provide a quantitative score reflecting the pathogenic risk of a mutation, including mutations involved in carcinogenesis and cancer progression. |