DOI

10.17077/etd.37hw-o12m

Document Type

Thesis

Date of Degree

Spring 2019

Degree Name

MS (Master of Science)

Degree In

Electrical and Computer Engineering

First Advisor

Casavant, Thomas L.

First Committee Member

Casavant, Thomas L.

Second Committee Member

Braun, Terry

Third Committee Member

Canahuate, Guadalupe

Abstract

In recent years, more data have become available for historical oncology case analysis. A large dataset describing over 500 patient cases of head and neck squamous cell carcinoma is a potential goldmine for improving oncological decision support. Unfortunately, the best approaches for drawing useful inferences from such data are unknown. With so much information, ranging from DNA and RNA sequencing to clinical records, we must use computational learning methods to find associations and biomarkers.

The available data are sparse and inconsistent, and some data types are very large. Working with an expert oncologist, we processed the clinical records and used complex modeling methods to substitute (impute) values for cases missing treatment information. We then used machine learning algorithms to test whether the imputed data are useful for predicting patient survival. Adding the imputed data did not change our ability to predict patient survival, although the imputed treatment variables were more important to the survival models.
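The abstract does not say which imputation models were used. The sketch below is minimal and hypothetical, and it illustrates only the general idea. It fills a missing binary treatment indicator by majority vote of the k most similar patients whose treatment is known. This is k-nearest-neighbor imputation, a deliberately simpler stand-in for the complex modeling methods described above. The function name `knn_impute`, the example covariates, and the tie rule are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence


def knn_impute(
    covariates: Sequence[Sequence[float]],
    treatment: Sequence[Optional[int]],
    k: int = 3,
) -> List[int]:
    """Hypothetical sketch: fill missing binary treatment values (None) by
    majority vote of the k nearest cases with known treatment, using
    Euclidean distance on the clinical covariates."""
    complete = [i for i, t in enumerate(treatment) if t is not None]
    if not complete:
        raise ValueError("need at least one case with known treatment")
    imputed: List[int] = []
    for i, t in enumerate(treatment):
        if t is not None:
            imputed.append(t)  # observed values are kept unchanged
            continue
        # Rank the complete cases by distance to case i and keep the k closest.
        nearest = sorted(
            complete, key=lambda j: math.dist(covariates[i], covariates[j])
        )[:k]
        votes = sum(treatment[j] for j in nearest)
        # Majority vote; in this sketch a tie resolves to 1 (treated).
        imputed.append(1 if 2 * votes >= len(nearest) else 0)
    return imputed


if __name__ == "__main__":
    # Hypothetical covariates: [age, tumor stage]; treatment 1 = received it.
    covariates = [[60, 2], [62, 2], [45, 4], [47, 4], [61, 2]]
    treatment = [1, 1, 0, 0, None]
    print(knn_impute(covariates, treatment, k=3))  # → [1, 1, 0, 0, 1]
```

A model-based imputer would replace the majority vote with a fitted predictor of the missing variable. The shape of the task stays the same: observed values are kept, and only the missing entries are filled.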

To handle the large number of features in RNA expression data, we used two approaches: using all the data on high-performance computers, and transforming the data into a smaller set of features (sparse principal components, or SPCs). Survival models trained on the two datasets showed no difference in performance. However, the SPC models trained more quickly. They also allowed us to pinpoint the biological processes in which each SPC is involved, which can inform future biomarker discovery.
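The abstract does not specify the sparse PCA algorithm. One simple way to obtain a single sparse principal component is power iteration with soft-thresholding of the gene loadings, and the sketch below takes that approach as an assumption. The threshold `lam` is a hypothetical tuning parameter that drives small loadings exactly to zero, so each component depends on only a handful of genes. The names `sparse_pc` and `soft_threshold` are illustrative, and NumPy is the only dependency.

```python
import numpy as np


def soft_threshold(v: np.ndarray, lam: float) -> np.ndarray:
    """Shrink each entry toward zero by lam; entries with |v| <= lam become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pc(X: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Hypothetical sketch: return one unit-norm sparse loading vector for
    X (samples x genes) by alternating power iteration with soft-thresholding."""
    Xc = X - X.mean(axis=0)  # center each gene
    # Start from the ordinary (dense) leading right singular vector.
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = Xc @ v
        u /= np.linalg.norm(u)  # sample-space direction
        v_new = soft_threshold(Xc.T @ u, lam)  # sparsify the gene loadings
        norm = np.linalg.norm(v_new)
        if norm == 0:
            raise ValueError("lam is too large: all loadings were zeroed")
        v_new /= norm
        converged = np.allclose(v_new, v)
        v = v_new
        if converged:
            break
    return v
```

Each patient's SPC score is then `(X - X.mean(axis=0)) @ v`, which gives one compact feature for a survival model. The genes with nonzero entries in `v` are the ones whose annotated biological processes can be examined. Further components could be extracted by deflating `X` and repeating the procedure, which this sketch omits.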

We also examined ten processed molecular features for their ability to predict survival and found some predictive power, though not enough to be clinically useful.
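The abstract does not name the metric used to judge predictive power. Harrell's concordance index (C-index) is one common choice for survival models, and the sketch below assumes it. The C-index is the fraction of comparable patient pairs that a risk score orders correctly. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The function name `concordance_index` is hypothetical, and this simplified version skips pairs with tied survival times.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk: Sequence[float],
) -> float:
    """Hypothetical sketch of Harrell's C. A pair (i, j) is comparable when
    times[i] < times[j] and patient i had the event (was not censored); it is
    concordant when patient i also has the higher predicted risk. Risk ties
    count as half-concordant; tied survival times are skipped."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

To score a single molecular feature, its values (or a model's predictions from it) can be passed as `risk`. A useful prognostic feature should then score clearly above 0.5.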

Keywords

cancer, dimensionality reduction, head and neck cancer, imputation, machine learning, mutation significance

Pages

viii, 30 pages

Bibliography

Includes bibliographical references (pages 29-30).

Copyright

Copyright © 2019 Michael Rendleman
