DOI

10.17077/etd.vz9ksva3

Document Type

Dissertation

Date of Degree

Fall 2016

Access Restrictions

Access restricted until 02/23/2019

Degree Name

PhD (Doctor of Philosophy)

Degree In

Molecular and Cell Biology

First Advisor

John F. Engelhardt

First Committee Member

Terry A Braun

Second Committee Member

Robert A Cornell

Third Committee Member

Adam J Dupuy

Fourth Committee Member

Paul B McCray

Fifth Committee Member

James O McNamara

Abstract

Several challenges face bioinformaticians on a regular basis. One of these is unsupervised clustering. In RNA sequencing (RNAseq), this may come in the form of blindly sequencing single cells without a priori knowledge of the cell types being sequenced. Here we create new methods to address this problem that show increased accuracy and speed compared to competing methods. We have also developed a methodology for discovering non-parametric networks that represent relationships between the variables measured across samples. In the context of RNAseq, these are the expression relationships between genes (for example, a positive or negative Spearman correlation). We have packaged these techniques into a software tool called PyMINEr. Here we demonstrate PyMINEr in the analysis of single cell RNAseq (scRNAseq) data, and integrate this dataset with others to yield novel insights into the signaling networks within and between pancreatic islet cell types. Additionally, we used this data to predict the cell type specific importance of Type 2 Diabetes (T2D) single nucleotide polymorphisms (SNPs). Lastly, we have demonstrated the use of PyMINEr’s analytic techniques in discovering the genetic circuitry underlying the transcriptional networks of two transcription factors (NeuroD1 and Pdx1) in beta cells. We used RNA interference to modulate the expression of these transcription factors in a beta cell line (MIN6), and observed the changes in the transcriptome over time. We used this data to generate graph network models of transcription and integrated them with ChIP-seq of these transcription factors; this enabled annotation of the functional binding sites of these transcription factors. Furthermore, this approach has enabled the discovery of regulators of beta and alpha cell identity.
Overall, we have developed novel informatics methods that can be applied to complex datasets to guide bench experiments toward the discovery of molecular signaling networks.

Public Abstract

Finding patterns in large datasets has been a goal of machine learning experts and data scientists since the inception of their respective fields. Biologists are now entering a phase of research in which large datasets have become commonplace; however, expertise in both of these areas is often not available. Here we aimed to bridge this divide by creating a publicly available tool called PyMINEr. One area of biology that poses a complex data analysis problem is single cell RNAseq (scRNAseq). Using this technique, we obtain a snapshot of the expression of all genes in individual cells. This technological breakthrough gives biologists an unprecedented look at the basic unit of life - the cell. To develop a better understanding of cellular function in the human pancreas, we examined cultured islets isolated from human pancreata by scRNAseq, and developed methods for analyzing this data. The human pancreas plays a central role in diabetes, a disease whose health and financial consequences have been growing to extraordinary heights. Cells of the pancreas play a large role in feeding and metabolism. These cells also communicate with each other to regulate each other’s function. The complexity of this communication is substantial, and largely enigmatic; however, scRNAseq enabled us to generate a catalog of all known cell to cell communication pathways, for the first time producing a roadmap to identifying the full network of how these cells communicate with each other. The techniques presented in PyMINEr will be applicable to nearly all areas within and beyond science.

Keywords

bioinformatics, graph theory, islets, molecular biology

Pages

xii, 140 pages

Bibliography

Includes bibliographical references (pages 136-140).

Comments

This thesis has been optimized for improved web viewing. If you require the original version, contact the University Archives at the University of Iowa: http://www.lib.uiowa.edu/sc/contact/

Copyright

Copyright © 2016 Scott Robert Tyler

Available for download on Saturday, February 23, 2019

Included in

Cell Biology Commons
