Personalized Medicine And Genomic Wide Association Study Based On Innovative Big Data Analytic And Data Mining Paradigm
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2014, Issue 1, Nov 2014, HBPP0679
Abstract
Personalized medicine uses information about an individual's genes, proteins, environment, and phenotype to prevent, diagnose, and treat disease. The discovery of innovative biomarkers, a key to personalized medicine across multiple tumor types, has unlocked new information about cancer biology by providing critical insights into biological, pathogenic, and pharmacologic responses to treatment. The completion of the Human Genome Project, when a complete sequence was published for the first time, created the potential to identify a large set of single nucleotide polymorphisms (SNPs) across the entire genome. This has opened the door to great improvements in diagnosis and therapeutics. In addition, the availability of massive amounts of genome-wide association study (GWAS) data has necessitated the development of new data mining and machine learning methods for quality control, imputation, and analysis, including multiple testing, predictive modeling for chronic diseases, and the discovery of variants that could lead to a particular trait or disease. Currently, personalized medicine faces multiple issues when trying to predict complex diseases such as cardiovascular disease, cancer, and asthma. Disease prediction is still based on SNPs and a few environmental factors, whereas complex diseases are usually affected by gene-to-gene interactions and by many environmental factors that have a significant impact on the predicted outcomes. The current challenge is therefore to develop a personalized medicine system that can discover which tumors have unique pathologic and molecular characteristics that may warrant different treatment strategies. This research is motivated by the announcement of the Qatar national genome project, which aims to map the genome of the entire population of Qatar in order to deliver personalized medicine.
The goal is to develop a genomics data hub and to establish advanced big data analytics, combining modern data mining and predictive modeling with high-performance computing, for memory-intensive genomic/variant analysis and data-intensive clinical analytics over petabytes of phenotype and omics databases. By understanding specific differences in tumor biology, researchers are identifying biomarkers for many tumor types, which helps them develop treatments that target the underlying disease pathways. With these targeted therapies, clinicians can develop a more specific, and potentially more effective, treatment strategy for some individuals based on the individual's tumor characteristics. The experimental and simulated genome-wide SNP data provided by Genetic Analysis Workshops 16 and 17 will be used to investigate the new machine learning technique. These data have afforded an opportunity to analyze the applicability and benefit of current machine learning methods, namely penalized regression, ensemble learning methods, and network analyses, which resulted in several new findings while also identifying known and simulated genetic risk variants. Strategies for integrating the phenotype and omics databases, their implementation, and the learning processes are briefly proposed. The motivation of this research is to identify and discuss the GWAS challenges that will require breakthrough, innovative big data analytics and advanced predictive modeling frameworks to handle massive GWAS data and bring personalized medicine to the bedside. The ultimate goal is to deliver the right treatments to the right patients at the right time.