Date of Degree
PhD (Doctor of Philosophy)
Isabel K. Darcy
In this thesis we apply the idea of a barcode from persistent homology to four hierarchical clustering methods: single, average, complete, and Ward's linkage. Desirable theoretical properties of dendrograms, the standard tool to visualize the output of hierarchical clustering methods, were described by Carlsson. We define analogous properties for hierarchical clustering quasi-barcodes and prove that average and complete quasi-barcodes possess a property that dendrograms do not.
We discuss how to decide where to "cut" the output of hierarchical clustering quasi-barcodes based on the distance between the heights at which clusters merge. We find the best possible matching for calculating the Wasserstein distance between quasi-barcodes built from the same number of data points all born at time 0. We also prove that single, average, and complete quasi-barcodes are stable in the sense that small perturbations in distances between points produce small changes in quasi-barcodes.
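As a rough illustration of cutting on the distance between merge heights, the following sketch uses SciPy's standard hierarchical clustering routines and cuts at the midpoint of the largest gap between consecutive merge heights. The function name `cut_by_largest_gap` and the largest-gap rule itself are assumptions made for this example; the cut-off criteria developed in the thesis may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cut_by_largest_gap(points, method="average"):
    """Cut a hierarchical clustering at the largest gap between merge heights.

    Hypothetical helper for illustration only; needs at least 3 points.
    """
    # Each row of Z records one merge; column 2 holds the merge height.
    Z = linkage(points, method=method)
    heights = Z[:, 2]

    # Gaps between consecutive merge heights (heights are non-decreasing
    # for single, average, complete, and Ward's linkage).
    gaps = np.diff(heights)
    i = int(np.argmax(gaps))

    # Place the cut halfway across the largest gap.
    threshold = (heights[i] + heights[i + 1]) / 2.0
    labels = fcluster(Z, t=threshold, criterion="distance")
    return labels, threshold


# Two well-separated groups of three points each.
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
labels, threshold = cut_by_largest_gap(points)
```

Merges that happen at nearby heights stay together under this rule, while a long stretch with no merges is read as the boundary between genuine clusters.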
To test the effectiveness of quasi-barcodes and the cut-off criteria, we generate datasets of points arranged in blobs or concentric circles and check whether the combination of the quasi-barcode with the cut-off criteria finds the correct number of clusters in the dataset and places points in the correct clusters. Finally, we apply these tools to datasets from New York University and Peking University of typically developing controls and attention deficit hyperactivity disorder subjects between the ages of 7 and 18.
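A test of this kind can be sketched as follows: generate two noisy concentric circles with known labels, cluster them, and check whether the recovered partition matches the true one. The radii, sample sizes, noise level, and the helper names `make_concentric_circles` and `same_partition` are all assumptions chosen for illustration, and the sketch fixes the number of clusters at two rather than applying a cut-off criterion; it is not the experimental setup used in the thesis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def make_concentric_circles(radii=(1.0, 3.0), n_per_circle=40, noise=0.02, seed=0):
    """Return points on noisy concentric circles and their true ring labels."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for k, r in enumerate(radii):
        angles = np.linspace(0.0, 2.0 * np.pi, n_per_circle, endpoint=False)
        ring = np.column_stack((r * np.cos(angles), r * np.sin(angles)))
        ring += rng.normal(scale=noise, size=ring.shape)  # small Gaussian jitter
        points.append(ring)
        labels.extend([k] * n_per_circle)
    return np.vstack(points), np.array(labels)


def same_partition(a, b):
    """True if labelings a and b induce the same partition, up to renaming."""
    pairs = set(zip(a.tolist(), b.tolist()))
    return len(pairs) == len(set(a.tolist())) == len(set(b.tolist()))


X, y_true = make_concentric_circles()

# Single linkage follows chains of nearby points, so it can separate rings.
Z = linkage(X, method="single")
y_pred = fcluster(Z, t=2, criterion="maxclust")

recovered = same_partition(y_true, y_pred)
```

Comparing partitions rather than raw labels matters because clustering routines assign cluster numbers arbitrarily.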
barcode, brain networks, persistence diagram, quasi-barcode
Copyright © 2016 Leyda Michelle Almodóvar Velázquez