DOI

10.17077/etd.zs7gxqu6

Document Type

Dissertation

Date of Degree

Fall 2016

Degree Name

PhD (Doctor of Philosophy)

Degree In

Mathematics

First Advisor

Isabel K. Darcy

First Committee Member

Colleen Mitchell

Second Committee Member

Keith Stroyan

Third Committee Member

Victor Camillo

Fourth Committee Member

Cynthia Farthing

Abstract

In this thesis we apply the idea of a barcode from persistent homology to four hierarchical clustering methods: single, average, complete, and Ward's linkage. Desirable theoretical properties of dendrograms, the standard tool to visualize the output of hierarchical clustering methods, were described by Carlsson. We define analogous properties for hierarchical clustering quasi-barcodes and prove that average and complete quasi-barcodes possess a property that dendrograms do not.

We discuss how to decide where to "cut" the output of hierarchical clustering quasi-barcodes based on the distance between the heights at which clusters merge. We find the best possible matching for calculating the Wasserstein distance between quasi-barcodes built from the same number of data points all born at time 0. We also prove that single, average, and complete quasi-barcodes are stable in the sense that small perturbations in distances between points produce small changes in quasi-barcodes.
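The cut-off and matching ideas in the paragraph above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy points, the choice to cut at the midpoint of the widest gap between consecutive merge heights, and the pairing of bars in sorted order of death time are all assumptions made for illustration. Sorted pairing is the optimal matching for points on a line under a p-th power cost; whether it coincides with the matching derived in the thesis is not stated in the abstract. Ward's linkage is omitted to keep the sketch short.

```python
from itertools import combinations
from math import dist


def merge_heights(points, linkage="single"):
    """Heights at which clusters merge, in merge order (naive agglomerative clustering)."""
    clusters = [[i] for i in range(len(points))]
    heights = []

    def cluster_distance(a, b):
        pair_dists = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)
        if linkage == "complete":
            return max(pair_dists)
        if linkage == "average":
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unsupported linkage: {linkage}")

    while len(clusters) > 1:
        # Find the closest pair of current clusters (a < b) and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        heights.append(cluster_distance(clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights


def largest_gap_cut(heights):
    """Assumed criterion: cut at the midpoint of the widest gap between merge heights."""
    hs = sorted(heights)
    gaps = [hs[k + 1] - hs[k] for k in range(len(hs) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (hs[k] + hs[k + 1]) / 2
    # n points give n - 1 merges; the k + 1 merges below the threshold are kept.
    n_clusters = len(hs) - k
    return threshold, n_clusters


def sorted_matching_distance(heights_a, heights_b, p=1):
    """Distance between two sets of bars born at 0, pairing death times in sorted order."""
    if len(heights_a) != len(heights_b):
        raise ValueError("both quasi-barcodes must have the same number of bars")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(heights_a), sorted(heights_b)))
    return total ** (1 / p)


# Toy data (hypothetical): two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_heights = merge_heights(toy_points, linkage="single")
toy_threshold, toy_n_clusters = largest_gap_cut(toy_heights)
```

In this sketch each finite bar is the interval from 0 to a merge height, so a quasi-barcode is represented simply by its list of merge heights. Moving a point slightly changes the heights slightly, which is the kind of behavior the stability statement describes.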

To test the efficiency of quasi-barcodes and the cut-off criteria, we generate datasets of points arranged in blobs or concentric circles and check whether the combination of the quasi-barcode with the cut-off criteria finds the correct number of clusters in the dataset and whether it places points in the correct clusters. Finally, we apply these tools to datasets from New York University and Peking University of typically developed controls and attention deficit hyperactivity disorder subjects between the ages of 7 and 18.
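The kind of synthetic check described above can be sketched as follows. This is an illustrative assumption, not the thesis's experimental setup: the ring sizes, the radii, the fixed threshold, and every function name are hypothetical, and the points are noise-free. The sketch uses the fact that cutting single linkage at a threshold yields the connected components of the graph that joins points whose distance is at most that threshold.

```python
from math import cos, dist, pi, sin


def ring(radius, n_points):
    """Evenly spaced points on a circle centered at the origin."""
    return [
        (radius * cos(2 * pi * k / n_points), radius * sin(2 * pi * k / n_points))
        for k in range(n_points)
    ]


def single_linkage_labels(points, threshold):
    """Cluster labels from cutting single linkage at `threshold` (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def same_partition(labels_a, labels_b):
    """True when two labelings group the points identically, up to renaming labels."""
    pairs = set(zip(labels_a, labels_b))
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))


# Two concentric circles (hypothetical sizes) with known ground-truth labels.
circle_points = ring(1.0, 12) + ring(3.0, 36)
ground_truth = [0] * 12 + [1] * 36
found_labels = single_linkage_labels(circle_points, threshold=1.0)
found_n_clusters = len(set(found_labels))
labels_match = same_partition(found_labels, ground_truth)
```

Neighboring points on each ring are closer than the threshold, while the rings themselves are at least 2 apart, so the cut separates them. Concentric circles are a case where single linkage is expected to do well; other linkages may behave differently on the same data.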

Public Abstract

Data analysis is the process of systematically applying mathematical, statistical and/or logical techniques to describe and evaluate data, detect patterns, develop explanations and test hypotheses. In this thesis we explore theoretical as well as practical properties of hierarchical clustering quasi-barcodes, a visualization tool that shows a simplified output of hierarchical clustering methods, for comparing datasets with an equal number of data points. In particular, we are interested in finding the most efficient methods to compare brain networks. We apply single, average, complete, and Ward's linkage hierarchical clustering quasi-barcodes to compare datasets of typically developed controls (TDC) and Attention Deficit Hyperactivity Disorder (ADHD) subjects between the ages of 7 and 18. Due to the small sample size of the datasets, we present our results as a proof of concept.

Keywords

barcode, brain networks, persistence diagram, quasi-barcode

Pages

xvi, 176 pages

Bibliography

Includes bibliographical references (pages 172-176).

Copyright

Copyright © 2016 Leyda Michelle Almodóvar Velázquez

Additional Files

READMEscripts.txt (4 kB)

Included in

Mathematics Commons
