Date of Degree
PhD (Doctor of Philosophy)
Isabel K. Darcy
In this thesis we apply the idea of a barcode from persistent homology to four hierarchical clustering methods: single, average, complete, and Ward's linkage. Desirable theoretical properties of dendrograms, the standard tool to visualize the output of hierarchical clustering methods, were described by Carlsson. We define analogous properties for hierarchical clustering quasi-barcodes and prove that average and complete quasi-barcodes possess a property that dendrograms do not.
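As a rough illustration of the four linkage methods named above, the following sketch uses SciPy's `linkage` to record the heights at which clusters merge. It assumes that a quasi-barcode can be read as one bar per data point, born at 0 and ending at a merge height, with one bar that never ends. That reading, the helper names `merge_heights` and `quasi_barcode`, and the toy points are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

METHODS = ("single", "average", "complete", "ward")


def merge_heights(points, method):
    """Return the n - 1 heights at which clusters merge, in merge order."""
    # Column 2 of SciPy's linkage matrix holds the merge distances.
    return linkage(points, method=method)[:, 2]


def quasi_barcode(points, method):
    """Assumed representation: bars (0, height) plus one bar that never dies."""
    heights = merge_heights(points, method)
    return [(0.0, float(h)) for h in heights] + [(0.0, float("inf"))]


# Four hypothetical points on a line, at positions 0, 1, 3 and 7.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
barcodes = {method: quasi_barcode(points, method) for method in METHODS}

for method in METHODS:
    print(method, barcodes[method])
```

Each method sees the same points but measures the distance between clusters differently, so the bars end at different heights.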
We discuss how to decide where to "cut" the output of hierarchical clustering quasi-barcodes based on the distance between the heights at which clusters merge. We find the best possible matching for calculating the Wasserstein distance between quasi-barcodes built from the same number of data points all born at time 0. We also prove that single, average, and complete quasi-barcodes are stable in the sense that small perturbations in distances between points produce small changes in quasi-barcodes.
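The two ideas in this paragraph can be sketched in a few lines of plain Python. The sketch makes two assumptions that are not confirmed by the abstract: that the cut is placed in the largest gap between consecutive merge heights, and that bars are matched in sorted order of their endpoints, with no bar matched to the diagonal. Sorted matching is optimal among one-to-one matchings of points on a line for costs of the form |x - y|^p with p >= 1; whether this coincides with the matching found in the thesis is not stated here. The function names are hypothetical.

```python
def largest_gap_cut(heights):
    """Place a cut in the largest gap between consecutive merge heights.

    Returns (threshold, number_of_clusters). Ties go to the lowest gap.
    """
    hs = sorted(heights)
    gaps = [hs[i + 1] - hs[i] for i in range(len(hs) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = 0.5 * (hs[i] + hs[i + 1])
    # Each merge above the threshold is undone, adding one cluster.
    merges_above = sum(1 for h in hs if h > threshold)
    return threshold, merges_above + 1


def sorted_matching_distance(deaths_a, deaths_b, p=1):
    """p-Wasserstein-style cost when bars born at 0 are matched in sorted order."""
    if len(deaths_a) != len(deaths_b):
        raise ValueError("both quasi-barcodes need the same number of bars")
    pairs = zip(sorted(deaths_a), sorted(deaths_b))
    return sum(abs(x - y) ** p for x, y in pairs) ** (1.0 / p)


# Hypothetical merge heights from two clusterings of four points each.
heights_a = [1.0, 2.0, 4.0]
heights_b = [1.0, 3.0, 7.0]

threshold, n_clusters = largest_gap_cut(heights_a)
distance = sorted_matching_distance(heights_a, heights_b, p=1)
```

Because every bar is born at 0, only the endpoints enter the distance, which is why the comparison reduces to a problem about points on a line.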
To test the efficiency of quasi-barcodes and the cut-off criteria, we generate datasets of points arranged in blobs or concentric circles and check whether the combination of the quasi-barcode with the cut-off criteria finds the correct number of clusters in the dataset and whether it places points in the correct clusters. Finally, we apply these tools to datasets from New York University and Peking University of typically developed controls and attention deficit hyperactivity disorder subjects between the ages of 7 and 18.
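A test of this kind might look like the following sketch, which builds two noise-free concentric circles, clusters them with single linkage, and cuts in the largest gap between consecutive merge heights. The largest-gap rule, the radii, the number of points, and the function names are assumptions chosen for illustration; the datasets and cut-off criteria used in the thesis may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def concentric_circles(radii, points_per_circle):
    """Evenly spaced points on circles centred at the origin, with true labels."""
    angles = np.linspace(0.0, 2.0 * np.pi, points_per_circle, endpoint=False)
    unit_circle = np.column_stack([np.cos(angles), np.sin(angles)])
    points = np.vstack([r * unit_circle for r in radii])
    truth = np.repeat(np.arange(len(radii)), points_per_circle)
    return points, truth


def cut_at_largest_gap(points, method="single"):
    """Cluster, then cut midway through the largest gap in merge heights."""
    Z = linkage(points, method=method)
    heights = Z[:, 2]  # non-decreasing for single linkage
    i = int(np.argmax(np.diff(heights)))
    threshold = 0.5 * (heights[i] + heights[i + 1])
    return fcluster(Z, t=threshold, criterion="distance")


points, truth = concentric_circles(radii=[1.0, 3.0], points_per_circle=40)
labels = cut_at_largest_gap(points)

n_found = len(np.unique(labels))
# A cluster is "pure" if all of its points come from the same circle.
pure = all(len(np.unique(truth[labels == c])) == 1 for c in np.unique(labels))
```

Single linkage suits this example because neighbouring points on one circle are much closer to each other than to any point on the other circle.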
Data analysis is the process of systematically applying mathematical, statistical and/or logical techniques to describe and evaluate data, detect patterns, develop explanations and test hypotheses. In this thesis we explore theoretical as well as practical properties of hierarchical clustering quasi-barcodes, a visualization tool that shows a simplified output of hierarchical clustering methods, for comparing datasets with an equal number of data points. In particular, we are interested in finding the most efficient methods to compare brain networks. We apply single, average, complete, and Ward's linkage hierarchical clustering quasi-barcodes to compare datasets of typically developed controls (TDC) and Attention Deficit Hyperactivity Disorder (ADHD) subjects between the ages of 7 and 18. Due to the small sample size of the datasets, we present our results as a proof of concept.
barcode, brain networks, persistence diagram, quasi-barcode
xvi, 176 pages
Includes bibliographical references (pages 172-176).
Copyright © 2016 Leyda Michelle Almodóvar Velázquez
Almodóvar Velázquez, Leyda Michelle. "Studying brain networks via topological data analysis and hierarchical clustering." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.