Document Type

Thesis

Date of Degree

Fall 2016

Degree Name

MS (Master of Science)

Degree In

Electrical and Computer Engineering

First Advisor

Guadalupe Canahuate

First Committee Member

Jon Kuhl

Second Committee Member

David Andersen


Vehicular crashes are the leading cause of death for young adult drivers; however, very little life course research focuses on drivers in their 20s. Moreover, most analyses of crash data are limited to simple correlation and regression. This thesis proposes a data-driven approach that applies machine-learning techniques to improve the quality of such analysis.

We examine more than 10 years of crash data from the Iowa Department of Transportation, first transforming all of it into a format suitable for analysis. We then discretize driver age and group each crash according to the ages of the drivers involved. In doing so, we aim to better uncover the relationship between driver age and the factors present in a given crash.
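As an illustration of the age discretization step, the sketch below bins each driver's age into a group and then labels a crash by the combination of groups among its drivers. The bin boundaries (`BOUNDARIES`), the group labels, and the function names `age_group` and `crash_age_profile` are hypothetical assumptions; the abstract does not state the actual age groups used in the thesis.

```python
from bisect import bisect_right

# Hypothetical age-group boundaries (NOT the thesis's actual bins):
# ages < 20 -> "under_20", 20..29 -> "20s", 30 and above -> "30_plus".
BOUNDARIES = [20, 30]
LABELS = ["under_20", "20s", "30_plus"]


def age_group(age: int) -> str:
    """Map a single driver age to a discrete group label."""
    # bisect_right counts how many boundaries are <= age, which is the bin index.
    return LABELS[bisect_right(BOUNDARIES, age)]


def crash_age_profile(driver_ages: list[int]) -> str:
    """Label a crash by the distinct age groups of all drivers involved."""
    groups = {age_group(a) for a in driver_ages}
    # Order groups youngest-first so the same mix always yields the same label.
    return "+".join(sorted(groups, key=LABELS.index))


if __name__ == "__main__":
    print(crash_age_profile([22, 45]))  # a two-vehicle crash with drivers aged 22 and 45
```

With these assumed bins, a crash involving drivers aged 22 and 45 is labeled `20s+30_plus`, while one involving two drivers in their 20s is labeled `20s`, so crashes can be analyzed by the mix of driver ages present rather than by a single driver's age.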

We use machine-learning algorithms to identify the most important attributes for each age group, with the goal of improving the predictive performance of the individual methods. The thesis follows a Knowledge Discovery workflow: we preprocess and transform the data into a usable state, then perform data mining on it to discover results and produce knowledge.
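One common way to rank attributes within each age group is information gain, the entropy-reduction measure used to choose splits in decision trees. The abstract does not name the algorithms actually used, so the sketch below is a stand-in under that assumption: it partitions crash records by a hypothetical `age_group` field and ranks categorical attributes by how much they reduce the entropy of a hypothetical target label, `severe`. All field names and the toy records are invented for illustration and are not Iowa Department of Transportation data.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in target entropy obtained by splitting rows on one attribute."""
    base = entropy([r[target] for r in rows])
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[attr]].append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def rank_attributes_by_group(rows, group_key, attrs, target):
    """For each age group, rank attributes by information gain (highest first)."""
    by_group = defaultdict(list)
    for r in rows:
        by_group[r[group_key]].append(r)
    return {
        g: sorted(attrs, key=lambda a: information_gain(g_rows, a, target), reverse=True)
        for g, g_rows in by_group.items()
    }


# Hypothetical toy records: road surface, night-time flag, and a severity label.
TOY_ROWS = [
    {"age_group": "20s", "road": "wet", "night": "yes", "severe": "yes"},
    {"age_group": "20s", "road": "wet", "night": "no", "severe": "yes"},
    {"age_group": "20s", "road": "dry", "night": "yes", "severe": "no"},
    {"age_group": "20s", "road": "dry", "night": "no", "severe": "no"},
    {"age_group": "30_plus", "road": "wet", "night": "yes", "severe": "yes"},
    {"age_group": "30_plus", "road": "dry", "night": "yes", "severe": "yes"},
    {"age_group": "30_plus", "road": "wet", "night": "no", "severe": "no"},
    {"age_group": "30_plus", "road": "dry", "night": "no", "severe": "no"},
]

if __name__ == "__main__":
    print(rank_attributes_by_group(TOY_ROWS, "age_group", ["road", "night"], "severe"))
```

On these toy records, `road` ranks first for the `20s` group and `night` ranks first for the `30_plus` group, because each attribute perfectly separates the target in one group and carries no information in the other. The point of ranking per group rather than over the pooled data is that the attributes that matter can differ by driver age.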

We aim to use this knowledge to improve prediction for the different driver age groups, using around 60 variables for most data sets and 10 variables for some. We also explore future directions in which this data could be analyzed.


Car Crashes, Data Analysis, Knowledge Discovery, Vehicles


vii, 39 pages


Includes bibliographical references (page 39).


Copyright © 2016 John Dietrich Tollefson