PhD (Doctor of Philosophy)
John M. Logsdon, Jr.
The transcriptome represents the entirety of RNA molecules within a cell or tissue at a given time. Recent advances have enabled large-scale, global interrogation of transcriptomes, revealing that genomes are extensively transcribed and contain diverse classes of RNAs (Dinger et al., 2009). Data generated by high-throughput analyses of mRNA transcription start sites (TSSs), such as CAGE (Cap Analysis of Gene Expression), indicate that eukaryotic genomes have complex landscapes of transcription initiation. The TSS is important for annotating cis-regulatory sequences because it links the mRNA transcript to its promoter. However, the patterns of TSS distribution observed in mRNA 5'-end profiling studies hinder straightforward annotation of putative promoters.
To address this challenge, we developed a method to identify, on a genome-wide basis, putative promoters, which we define by their TSS distributions and designate transcription start regions (TSRs). We applied a clustering method to a full-length cDNA dataset (Miura et al., 2006) to identify and annotate TSRs in the budding yeast Saccharomyces cerevisiae. To validate these TSR annotations, we performed an integrative genomic analysis using multiple datasets. Our method identified TSRs at positions consistent with bona fide promoters in S. cerevisiae. In addition, 5' RACE showed overall agreement between computationally defined TSRs and experimentally identified TSSs. From this analysis, we find that a significant proportion of genes exhibiting alternative promoter usage during sporulation are associated with respiration, suggesting that promoter usage at these genes is regulated on a condition-specific basis in budding yeast.
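The abstract does not state the clustering procedure or its parameters. As a rough illustration of the general idea of grouping nearby TSSs into TSRs, the following is a minimal sketch of one common approach, distance-based (single-linkage) clustering of TSS positions on each chromosome and strand. It is not the procedure used in this dissertation or in TSRchitect; the function `cluster_tss`, the `TSR` record, and the parameters `max_gap` (maximum spacing in bp between adjacent TSSs in one cluster) and `min_tags` (minimum total tag count to report a cluster) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical input record: (chrom, strand, position, tag_count)
Site = Tuple[str, str, int, int]


@dataclass
class TSR:
    """Hypothetical transcription start region: a cluster of nearby TSSs."""
    chrom: str
    strand: str
    start: int  # leftmost TSS position in the cluster
    end: int    # rightmost TSS position in the cluster
    tags: int   # total tag count supporting the cluster
    peak: int   # position with the highest tag count (leftmost on ties)


def cluster_tss(tss: Iterable[Site], max_gap: int = 20, min_tags: int = 3) -> List[TSR]:
    """Group TSSs into TSRs by single-linkage distance clustering (sketch only)."""
    # Sort so that sites on the same chromosome and strand are adjacent by position.
    ordered = sorted(tss, key=lambda s: (s[0], s[1], s[2]))
    tsrs: List[TSR] = []
    current: List[Site] = []

    def flush() -> None:
        # Close the open cluster; keep it only if it has enough supporting tags.
        if not current:
            return
        total = sum(s[3] for s in current)
        if total >= min_tags:
            peak = max(current, key=lambda s: s[3])[2]
            tsrs.append(TSR(current[0][0], current[0][1],
                            current[0][2], current[-1][2], total, peak))
        current.clear()

    for site in ordered:
        if current:
            prev = current[-1]
            same_locus = site[0] == prev[0] and site[1] == prev[1]
            # Start a new cluster on a new chromosome/strand or a gap wider than max_gap.
            if not same_locus or site[2] - prev[2] > max_gap:
                flush()
        current.append(site)
    flush()
    return tsrs


if __name__ == "__main__":
    sites = [("chrI", "+", 100, 5), ("chrI", "+", 110, 2),
             ("chrI", "+", 300, 1), ("chrI", "-", 105, 4)]
    for tsr in cluster_tss(sites):
        print(tsr)
```

With these toy inputs, the two plus-strand sites 10 bp apart merge into one TSR (positions 100 to 110, 7 tags, peak at 100), the isolated single-tag site at 300 is filtered out by `min_tags`, and the minus-strand site forms its own TSR. A real analysis would choose the gap and support thresholds empirically for the dataset at hand.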
We further developed our TSS clustering method into a bioinformatics tool called TSRchitect, which identifies and annotates TSRs from large-scale TSS profiling data. TSRchitect handles both tag- and sequence-based TSS data and efficiently computes TSRs from global TSS datasets on a desktop computer. Analysis of human CAGE data from the ENCODE (Encyclopedia of DNA Elements) project supports TSRchitect's annotations.
Finally, we use TSRchitect to identify TSRs from the transcriptomes of diverse eukaryotes and investigate the conservation of TSRs among orthologous genes. We frequently identify multiple TSRs for a given gene, suggesting that alternative promoter usage is widespread. Using TSS profiling data derived from multiple tissues in mouse and human, we find that TSR positions are relatively stable across the tissues surveyed; however, a small fraction of genes exhibit tissue-specific differences in TSR usage.
As transcriptome profiling data continue to be generated at a rapid pace, computational approaches are increasingly important. We anticipate that the methods described in this dissertation will contribute to an improved understanding of gene regulation and promoter architecture in eukaryotes.
Bioinformatics, Comparative Genomics, Promoter Annotation, Transcription Initiation, Transcription Start Site, Transcriptome
xxx, 258 pages
Includes bibliographical references (pages 177-205).
Copyright 2012 R. Taylor Raborn