Document Type

Dissertation

Date of Degree

2012

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Jian Huang

Second Advisor

Yi Xing

Abstract

Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (Multivariate Analysis of Transcript Splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypothesis of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT-PCR validation rate of 86% for differential alternative splicing events with a MATS FDR below 10%. Additionally, over the full list of RT-PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.
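The kind of user-defined hypothesis described in the abstract can be illustrated with a deliberately simplified sketch. It is not the MATS implementation: it assumes independent uniform priors on the exon inclusion level of each sample (MATS uses a multivariate uniform prior to capture between-sample correlation), treats inclusion and skipping read counts as binomial without any isoform-length normalization, and uses plain Monte Carlo draws instead of MCMC with adaptive sampling. The function name, the parameter names, and the 0.1 cutoff are hypothetical choices made for illustration.

```python
import random


def posterior_prob_diff(inc1, skip1, inc2, skip2, cutoff=0.1, draws=20000, seed=0):
    """Estimate P(|psi1 - psi2| > cutoff | read counts) by Monte Carlo.

    inc*/skip* are exon inclusion and skipping read counts for two samples.
    Assumption: independent Uniform(0, 1) priors and binomial likelihoods,
    so each posterior is Beta(1 + inclusion, 1 + skipping).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Draw one inclusion level per sample from its Beta posterior.
        psi1 = rng.betavariate(1 + inc1, 1 + skip1)
        psi2 = rng.betavariate(1 + inc2, 1 + skip2)
        # Count draws in which the difference exceeds the user-defined cutoff.
        if abs(psi1 - psi2) > cutoff:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    # Strongly different inclusion levels versus identical counts.
    print(posterior_prob_diff(90, 10, 10, 90))
    print(posterior_prob_diff(500, 500, 500, 500))
```

Changing the condition inside the loop is how a different user-defined pattern would be expressed in this sketch, for example a one-sided difference such as psi1 - psi2 > cutoff.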

Keywords

False Discovery Rate, iFDR, MATS, Multivariate Analysis of Transcript Splicing, rMATS, RNA-Seq

Pages

ix, 88 pages

Bibliography

Includes bibliographical references (pages 84-88).

Copyright

Copyright 2012 Shihao Shen

Included in

Biostatistics Commons
