Document Type

Dissertation

Date of Degree

Fall 2016

Degree Name

PhD (Doctor of Philosophy)

Degree In

Mathematics

First Advisor

Isabel K. Darcy

Abstract

In this thesis we apply the idea of a barcode from persistent homology to four hierarchical clustering methods: single, average, complete, and Ward's linkage. Desirable theoretical properties of dendrograms, the standard tool to visualize the output of hierarchical clustering methods, were described by Carlsson. We define analogous properties for hierarchical clustering quasi-barcodes and prove that average and complete quasi-barcodes possess a property that dendrograms do not.

We discuss how to decide where to "cut" the output of hierarchical clustering quasi-barcodes based on the distance between the heights at which clusters merge. We find the best possible matching for calculating the Wasserstein distance between quasi-barcodes built from the same number of data points all born at time 0. We also prove that single, average, and complete quasi-barcodes are stable in the sense that small perturbations in distances between points produce small changes in quasi-barcodes.
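As a rough illustration of cutting based on the distance between merge heights, the following minimal sketch runs a naive single-linkage clustering and cuts at the largest gap between consecutive merge heights. The function names, the largest-gap rule, and the toy points are assumptions made for this example and are not taken from the thesis; the merge heights are treated as the death times of bars born at time 0.

```python
from itertools import combinations
from math import dist


def single_linkage_heights(points):
    """Return the merge heights of naive single-linkage clustering.

    Each point starts as its own cluster (a bar born at time 0); each
    merge height is recorded in the order the merges happen.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return heights


def largest_gap_cut(heights):
    """Hypothetical cut rule: cut halfway through the largest gap
    between consecutive merge heights (assumes at least two merges)."""
    gaps = [(heights[k + 1] - heights[k], k) for k in range(len(heights) - 1)]
    _, k = max(gaps)
    cut = (heights[k] + heights[k + 1]) / 2
    # Merges 0..k happen below the cut, leaving len(heights) - k clusters.
    n_clusters = len(heights) - k
    return cut, n_clusters


# Toy data (an assumption for illustration): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
heights = single_linkage_heights(points)
cut, n_clusters = largest_gap_cut(heights)
print(heights, cut, n_clusters)
```

For single linkage the merge heights are non-decreasing, so consecutive gaps are well defined; other linkages or cut-off criteria would need their own treatment.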

To test the effectiveness of quasi-barcodes and the cut-off criteria, we generate datasets of points arranged in blobs or concentric circles and check whether the combination of the quasi-barcode with the cut-off criteria finds the correct number of clusters in the dataset and places points in the correct clusters. Finally, we apply these tools to datasets from New York University and Peking University of typically developing controls and attention deficit hyperactivity disorder subjects between the ages of 7 and 18.

Keywords

barcode, brain networks, persistence diagram, quasi-barcode

Pages

xvi, 176 pages

Bibliography

Includes bibliographical references (pages 172-176).

Copyright

Copyright © 2016 Leyda Michelle Almodóvar Velázquez

Additional Files

READMEscripts.txt (4 kB)

Included in

Mathematics Commons
