Authors: Assistant Professor Pushpa Rajita G, Research Scholar Anitha Udayakumar
Abstract: With the introduction of high-throughput genomic sequencing technologies, the volume of available genetic data has grown explosively. However, translating this wealth of data into clinically useful insights remains a major challenge. Predicting disease at an early stage from genomic analysis would allow preventive action before symptoms manifest. In this paper, we propose a complete deep learning-based bioinformatics framework for disease prediction from DNA sequencing data. The framework has three key components: (1) a hybrid CNN-RNN architecture that performs variant calling and extracts informative features from the sequencing data, (2) a Graph Attention Network (GAT) that models gene-gene interaction networks, and (3) a multi-modal fusion layer that combines genomic, epigenomic, and clinical information. Evaluated on three large-scale datasets (TCGA for cancer, UK Biobank for cardiovascular disease, and ADNI for Alzheimer's disease), the framework achieved AUC values of 0.956, 0.934, and 0.921, respectively, substantially higher than those of the traditional GWAS approach and several deep learning baselines. In addition, we introduce attention maps that provide biological insight into pathogenic variants and their interaction networks. Finally, we validated the model prospectively on 500 high-risk patients.
DOI: http://doi.org/
International Journal of Science, Engineering and Technology
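
The abstract reports performance as AUC (area under the ROC curve). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only; they are not the authors' evaluation code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Split predicted scores by true class (1 = disease, 0 = no disease).
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    # Count pairwise "wins" for the positive class; ties count as half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented purely for illustration.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.91, 0.20, 0.65, 0.40, 0.55, 0.10]
print(f"AUC = {roc_auc(labels, scores):.3f}")  # prints "AUC = 0.889"
```

Under this reading, an AUC of 0.956 would mean that a randomly chosen patient with the disease receives a higher predicted risk than a randomly chosen patient without it in about 95.6% of such pairs. The pairwise loop is quadratic in the number of samples; for large cohorts a rank-based computation gives the same value more efficiently.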