Computational methods for efficient exome sequencing-based genetic testing
Exome sequencing, the process of sequencing the set of all known exons simultaneously using next-generation sequencing technology, has dramatically changed the landscape of genetic research and genetic testing. The incredible volume of data produced by these experiments creates challenges in: 1) annotating the effects of observed variants, 2) filtering to remove noise, 3) identifying plausible disease-causing variants, and 4) validating experimental results. Here we will present a series of bioinformatic tools and techniques intended to address these challenges with exome sequencing and associated validation experiments.
First, we will present the Automated Sequence Analysis Pipeline (ASAP), a tool for the efficient and automated management of Sanger sequencing-based genetic testing and variant validation, including the detection and annotation of variants. This pipeline is extended to annotate variants derived from exome sequencing.
Exome sequencing experiments produce a great number of variants that do not cause a patient's disease. One of the biggest challenges in exome sequencing experiments is sorting through these false positives to discover the true disease-causing variants. We have developed several techniques to aid in the reduction of these errors. The techniques described include: 1) the construction of a catalog of systematic errors by reprocessing thousands of publicly available exomes, 2) a tool for the filtering of variants based on family structure and disease assumptions, and 3) a tool for discovering regions of autozygosity from the exomes of several affected patients in consanguineous pedigrees.
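As an illustration of how a systematic-error catalog and a family-based filter might be combined, the following is a minimal sketch and not the implementation of the tools described here. It assumes an autosomal recessive disease model, genotypes coded as alternate-allele counts, and a catalog held as a set of variant keys. The names fits_recessive, filter_variants, known_artifacts, and the sample identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate. A sample absent from the dict has no call.
VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def fits_recessive(genotypes: Dict[str, int],
                   affected: List[str],
                   unaffected: List[str]) -> bool:
    """True if the genotypes match a simple autosomal recessive model:
    every affected sample is homozygous alternate and no unaffected
    sample is. Samples with a missing call fail the check."""
    return (all(genotypes.get(s) == 2 for s in affected)
            and all(genotypes.get(s) in (0, 1) for s in unaffected))


def filter_variants(variants: List[dict],
                    affected: List[str],
                    unaffected: List[str],
                    known_artifacts: Set[VariantKey]) -> List[dict]:
    """Drop variants listed in the (hypothetical) systematic-error
    catalog, then keep only those consistent with the disease model."""
    kept = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        if key in known_artifacts:
            continue  # recurrent artifact seen across unrelated exomes
        if fits_recessive(v["genotypes"], affected, unaffected):
            kept.append(v)
    return kept


# Toy data for one affected child and two unaffected parents.
variants = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G",
     "genotypes": {"child": 2, "mother": 1, "father": 1}},  # in catalog
    {"chrom": "2", "pos": 200, "ref": "C", "alt": "T",
     "genotypes": {"child": 2, "mother": 1, "father": 1}},  # fits model
    {"chrom": "3", "pos": 300, "ref": "G", "alt": "A",
     "genotypes": {"child": 1, "mother": 0, "father": 1}},  # child is het
]
known_artifacts = {("1", 100, "A", "G")}

candidates = filter_variants(variants, ["child"], ["mother", "father"],
                             known_artifacts)
```

Under these assumptions, only the variant on chromosome 2 survives both filters. A dominant or de novo model would swap in a different predicate in place of fits_recessive while leaving the catalog check unchanged.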
Classes of variants that are undiscoverable using current analysis techniques give rise to false negatives in exome sequencing experiments. We will present a tool, the Retrotransposon Insertion Detector for Exomes (RIDE), that uses the characteristic anomalies present in sequence alignments to detect the insertion of repetitive elements.
The process of identifying the cause of a patient's disease using exome sequencing data has been equated to finding a needle in a stack of needles. Only through the proper annotation of variants and the reduction of the error rates associated with exome sequencing experiments can this task be achieved in an efficient manner.