Document Type


Date of Degree

Spring 2019

Access Restrictions

Access restricted until 07/29/2021

Degree Name

MS (Master of Science)

Degree In

Electrical and Computer Engineering

First Advisor

Canahuate, Guadalupe

First Committee Member

Johnson, Hans

Second Committee Member

Casavant, Thomas


Oropharyngeal cancer diagnoses make up three percent of all cancer diagnoses in the United States each year. The recent increase in the incidence of HPV-associated oropharyngeal cancer necessitates updates to prior survival estimation techniques so that they properly account for this demographic shift. Clinicians depend on accurate survival prognosis estimates to create treatment plans that maximize patient survival while minimizing adverse treatment side effects. Additionally, recent advances in data analysis have resulted in richer and more complex data, motivating the use of more advanced analysis techniques. Sophisticated survival analysis techniques can leverage complex data from a variety of sources, resulting in improved personalized prediction. Current survival prognosis prediction methods, by contrast, often rely on summary statistics and underlying assumptions regarding distribution or overall risk.

We propose a k-nearest-neighbor-influenced approach for predicting oropharyngeal cancer survival outcomes. We evaluate our approach for overall survival (OS), recurrence-free survival (RFS), and recurrence-free overall survival (RF+OS). We define two distance functions that are not subject to the curse of dimensionality and that reconcile heterogeneous features with patient-to-patient similarity scores to produce a meaningful overall measure of distance. Using these distance functions, we obtain the k nearest neighbors of each patient, forming neighborhoods of similar patients. We leverage these neighborhoods for prediction in two novel ensemble methods. The first ensemble method uses each patient's nearest neighbors to combine globally trained predictions, weighted by their accuracies within a selected neighborhood. The second ensemble method combines Kaplan-Meier predictions from a variety of neighborhoods. Both proposed methods outperform an ensemble of standard global survival predictive models, with statistically significant calibration.
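As a rough illustration of the neighborhood idea described above, the following minimal sketch finds a patient's k nearest neighbors over mixed numeric and categorical features and then computes a Kaplan-Meier curve from a neighborhood's follow-up data. It is not the thesis's implementation: the abstract does not specify its two distance functions, so a Gower-style distance is swapped in here as an assumption, and the names `mixed_distance`, `k_nearest`, and `kaplan_meier` are hypothetical.

```python
import numpy as np


def mixed_distance(a_num, a_cat, b_num, b_cat, num_range):
    """Gower-style distance (an assumed stand-in, not the thesis's function).

    Numeric features contribute a range-scaled absolute difference,
    categorical features contribute 0 on a match and 1 on a mismatch,
    and the result is the mean over all features.
    """
    num_part = np.abs(a_num - b_num) / num_range
    cat_part = (a_cat != b_cat).astype(float)
    return np.concatenate([num_part, cat_part]).mean()


def k_nearest(index, X_num, X_cat, k):
    """Return the indices of the k patients closest to patient `index`."""
    num_range = X_num.max(axis=0) - X_num.min(axis=0)
    num_range[num_range == 0] = 1.0  # avoid division by zero on constant columns
    dists = np.array([
        mixed_distance(X_num[index], X_cat[index], X_num[j], X_cat[j], num_range)
        for j in range(len(X_num))
    ])
    dists[index] = np.inf  # a patient is not their own neighbor
    return np.argsort(dists, kind="stable")[:k]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate from follow-up times and event indicators.

    `events` is True where the event was observed and False where the
    patient was censored. Returns the distinct event times and the
    estimated survival probability just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        observed = np.sum((times == t) & events)
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

A local prediction for one patient would then call `kaplan_meier` on the follow-up times and event indicators of the patients returned by `k_nearest`. How the thesis combines curves from several neighborhoods, or weights global models by local accuracy, is not described in enough detail here to sketch.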


k nearest neighbors, local prediction, machine learning, oropharyngeal, QED


ix, 47 pages


Includes bibliographical references (pages 42-47).


Copyright © 2019 Keegan P. Shay

Available for download on Thursday, July 29, 2021