DOI

10.17077/etd.8nsf-31t1

Document Type

Dissertation

Date of Degree

Summer 2019

Degree Name

PhD (Doctor of Philosophy)

Degree In

Computer Science

First Advisor

Oliveira, Suely

Second Advisor

Malkova, Anna

First Committee Member

Ghosh, Sukumar

Second Committee Member

Stewart, David

Third Committee Member

Chipara, Octav

Abstract

All cellular forms of life contain deoxyribonucleic acid (DNA). DNA is a molecule that carries all the information necessary to perform both basic and complex cellular functions. DNA is replicated to form new tissues and organs, and to pass genetic information to future generations. Ideally, DNA replication yields an exact copy of the original DNA. While replication generally occurs without error, mistakes made during the process can introduce accidental changes into the DNA. These changes are called mutations. Mutations vary in magnitude, and a mutation of any magnitude can have consequences ranging from no effect on the organism, to disease initiation (e.g., cancer), or even death.

In this thesis, we limit our focus to mutations in human DNA, in particular MMBIR mutations. Recent literature in human genomics has found microhomology-mediated break-induced replication (MMBIR) to be a common mechanism producing complex mutations in DNA. MMBIRFinder is a tool for detecting MMBIR regions in yeast DNA. Although MMBIRFinder is successful on yeast DNA, it is not capable of detecting MMBIR mutations in human DNA. One major reason for this deficiency, among several, is the amount of computation required to process large human genomic data sets. Our contribution in this regard is twofold:

1) We use parallel computation to significantly reduce the processing time of the original MMBIRFinder, and address several performance-degrading issues inherent in its original design;

2) We introduce a new heuristic that detects MMBIR mutations missed by the original MMBIRFinder, even in small genomes such as yeast DNA.

Keywords

DNA, Large Data, MMBIR, Mutations, Parallel Programming

Pages

xi, 96 pages

Bibliography

Includes bibliographical references (pages 91-96).

Copyright

Copyright © 2019 Thamer Alsulaiman
