Bioinformatic parallel processing tools development for mutation identification from whole exome data following homozygosity mapping for autosomal recessive disorders
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Volume 2012, Issue 1, Oct 2012, BMP91
Abstract
Eight consanguineous Arab families with novel autosomal recessive disorders were mapped with Illumina 700K SNP arrays. All relevant positional candidate genes were screened for pathogenic mutations, but none were identified. Because no significant LOD scores could be obtained, multiple homozygosity intervals were retained for each family. Whole exome sequencing was performed on the ABI SOLiD4 platform for one affected individual from each family, and mapping and annotation were performed with LifeScope software. Data were validated manually for each linkage interval by visual inspection of read depth and bead number coverage. On average, 30,000 sequence variations were detected in each sample, including novel variants, known polymorphisms and exome sequencing errors. For each chromosome with a linkage interval, the data were exported to Excel spreadsheets and filtered by visual inspection to exclude variants outside the linkage intervals. Each family had between 400 and 1,300 variants within its linkage intervals; of these, 50 to 300 were homozygous, of which 15 to 30 were novel. Whether each variant was homozygous or heterozygous, and novel or annotated, was determined manually by visual inspection of the spreadsheets. For each novel variant, it was manually determined whether it was exonic, at a splice site or intronic. For each annotated variant, it was manually determined whether it is associated with a disease phenotype relevant to the family's disorder, and the minor genotype frequency was investigated for annotated variants that represent disease states. All novel exonic variants were tested in silico with PolyPhen and SIFT prediction software to assess their effect on protein function. All damaging variants (novel or annotated exonic variants, and splice-site variants) were validated by Sanger sequencing and tested for co-segregation with the disease. An identical approach is required to assess the pathogenic effects of insertion/deletion variants within each linkage interval.
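The manual spreadsheet triage described above (restrict to linkage intervals, keep homozygous calls, separate novel from annotated variants, and keep novel variants only if exonic or at a splice site) lends itself to automation. The abstract does not describe an implementation, so the following is only a minimal Python sketch under stated assumptions: the `Variant` record, its field names (`zygosity`, `dbsnp_id`, `consequence`), the consequence labels and the helper functions are all hypothetical, intervals on a chromosome are assumed not to overlap, and the toy data are made up.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) in base pairs


@dataclass(frozen=True)
class Variant:
    # Hypothetical fields; a real LifeScope export has its own column names.
    chrom: str
    pos: int
    zygosity: str            # "hom" or "het"
    dbsnp_id: Optional[str]  # None means the variant is novel
    consequence: str         # e.g. "exonic", "splice_site", "intronic"


def build_index(linkage: Dict[str, List[Interval]]) -> Dict[str, Tuple[List[int], List[Interval]]]:
    """Sort each chromosome's intervals and keep their starts for bisection.

    Assumes the intervals on a chromosome do not overlap.
    """
    index = {}
    for chrom, intervals in linkage.items():
        ordered = sorted(intervals)
        index[chrom] = ([start for start, _ in ordered], ordered)
    return index


def in_linkage(v: Variant, index) -> bool:
    """True if the variant lies inside one of its chromosome's linkage intervals."""
    if v.chrom not in index:
        return False
    starts, intervals = index[v.chrom]
    i = bisect_right(starts, v.pos) - 1  # last interval starting at or before pos
    return i >= 0 and intervals[i][0] <= v.pos <= intervals[i][1]


def candidate_variants(variants: List[Variant], linkage: Dict[str, List[Interval]]):
    """Split homozygous in-interval variants into (novel, annotated) lists.

    Novel variants are kept only if exonic or at a splice site, mirroring the
    manual triage; annotated variants are all returned so their disease
    associations can be reviewed.
    """
    index = build_index(linkage)
    novel, annotated = [], []
    for v in variants:
        if v.zygosity != "hom" or not in_linkage(v, index):
            continue
        if v.dbsnp_id is None:
            if v.consequence in {"exonic", "splice_site"}:
                novel.append(v)
        else:
            annotated.append(v)
    return novel, annotated


# Toy data for illustration only (positions and the rs ID are made up).
variants = [
    Variant("chr2", 150, "hom", None, "exonic"),
    Variant("chr2", 160, "het", None, "exonic"),
    Variant("chr2", 170, "hom", "rs0001", "intronic"),
    Variant("chr2", 200, "hom", None, "splice_site"),
    Variant("chr2", 900, "hom", None, "exonic"),
    Variant("chr5", 50, "hom", None, "intronic"),
]
linkage = {"chr2": [(100, 200)], "chr5": [(10, 100)]}
novel, annotated = candidate_variants(variants, linkage)
print([v.pos for v in novel], [v.pos for v in annotated])  # prints [150, 200] [170]
```

Binary search over sorted interval starts keeps each lookup logarithmic in the number of intervals, which matters little for a handful of homozygosity intervals but scales to whole-exome call sets without linkage data.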
This manual approach is tedious, labor-intensive and prone to oversight errors. Developing software tools that automate next-generation sequencing data analysis is essential to eliminate manual work and to identify pathogenic mutations among the plethora of existing variants. Such automation is also applicable to cases without linkage intervals, where it can limit the number of variants under consideration.
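Because the data are processed chromosome by chromosome, one natural unit of parallel work is a single chromosome with its linkage intervals. The sketch below is a hypothetical illustration, not the authors' tool: the `(chrom, pos, zygosity)` record layout and the function names are assumptions, and it uses Python's standard `concurrent.futures` executors to fan out one job per chromosome.

```python
from collections import defaultdict
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int, str]  # hypothetical (chrom, pos, zygosity) layout
Interval = Tuple[int, int]     # inclusive (start, end)


def filter_chromosome(job: Tuple[str, List[Record], List[Interval]]) -> Tuple[str, List[Record]]:
    """Worker: keep homozygous records inside any interval on one chromosome.

    Defined at module level so it can also be pickled for a process pool.
    """
    chrom, records, intervals = job
    kept = [
        r for r in records
        if r[2] == "hom" and any(start <= r[1] <= end for start, end in intervals)
    ]
    return chrom, kept


def filter_in_parallel(
    records: List[Record],
    linkage: Dict[str, List[Interval]],
    executor_factory: Callable[..., Executor] = ThreadPoolExecutor,
    max_workers: int = 4,
) -> Dict[str, List[Record]]:
    """Submit one job per linkage chromosome to a pool and collect the results."""
    by_chrom: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        if r[0] in linkage:  # chromosomes without a linkage interval are dropped up front
            by_chrom[r[0]].append(r)
    jobs = [(chrom, recs, linkage[chrom]) for chrom, recs in by_chrom.items()]
    with executor_factory(max_workers=max_workers) as pool:
        # pool.map yields results in job order; each result is a (chrom, kept) pair.
        return dict(pool.map(filter_chromosome, jobs))
```

The default thread pool suits I/O-bound work such as reading per-chromosome export files. CPython threads share the global interpreter lock, so for CPU-bound filtering one would instead pass `executor_factory=ProcessPoolExecutor` and call `filter_in_parallel` from a script guarded by `if __name__ == "__main__":`.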