Document Type

Thesis

Date of Degree

Fall 2016

Degree Name

MS (Master of Science)

Degree In

Electrical and Computer Engineering

First Advisor

Canahuate, Guadalupe

First Committee Member

Kuhl, Jon

Second Committee Member

Andersen, David

Abstract

Vehicular crashes are the leading cause of death for young adult drivers; however, very little life course research focuses on drivers in their 20s. Moreover, most analyses of crash data are limited to simple correlation and regression. This thesis proposes a data-driven approach that uses machine-learning techniques to improve the quality of that analysis.

We examine over 10 years of crash data from the Iowa Department of Transportation, first transforming it into a format suitable for analysis. We then discretize driver age, grouping each crash according to the ages of the drivers involved. In doing so, we aim to better uncover the relationship between driver age and the factors present in a given crash.
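A minimal sketch of what such an age discretization could look like. The cut points, the labels, and the names `AGE_EDGES`, `age_group`, and `crash_age_groups` are illustrative assumptions and are not taken from the thesis, which does not state its group boundaries here; only the count of six groups comes from the text.

```python
from bisect import bisect_right

# Hypothetical cut points (assumption): each edge is the first age of the next group.
AGE_EDGES = [20, 30, 40, 50, 65]
AGE_LABELS = ["under 20", "20-29", "30-39", "40-49", "50-64", "65+"]


def age_group(age):
    """Map a driver's age in years to one of six illustrative group labels."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many edges are <= age, which is the group index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def crash_age_groups(driver_ages):
    """Return the distinct age groups among a crash's drivers, youngest first."""
    groups = {age_group(a) for a in driver_ages}
    return sorted(groups, key=AGE_LABELS.index)
```

Under these assumed boundaries, a crash can then be assigned to a group, or to a combination of groups, based on the labels returned for its drivers.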

We use machine learning algorithms to determine the important attributes for each age group, with the goal of improving the predictive performance of individual methods. The thesis follows a Knowledge Discovery workflow: we preprocess and transform the data into a usable state, then perform data mining on it to discover results and produce knowledge.
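One common way to rank attributes by importance is information gain over categorical attributes. The sketch below illustrates that idea only; this abstract does not say which algorithms the thesis uses, so the choice of information gain and the names `entropy`, `information_gain`, and `rank_attributes` are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting rows on one attribute.

    Each row is a dict mapping attribute names to categorical values.
    """
    base = entropy([row[target] for row in rows])
    # Group the target values by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Sort attribute names from most to least informative about the target."""
    return sorted(
        attributes,
        key=lambda name: information_gain(rows, name, target),
        reverse=True,
    )
```

Running `rank_attributes` separately on the crashes of each age group would give one ranked attribute list per group, which is the kind of per-group comparison described above.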

We hope to use this knowledge to improve prediction for the different driver age groups, using around 60 variables for most data sets and 10 variables for some. We also explore directions for future analysis of this data.

Public Abstract

This thesis proposes a data-driven approach that uses machine-learning techniques to improve the quality of car crash data analysis.

This thesis examines car crash data by looking at the different aspects of each crash. We divide the crashes into 6 groups according to the ages of the drivers involved and attempt to determine the important features of each group. In doing so, we hope to make clear which factors lead to crashes in different age groups and to work toward avoiding them.

This information could then potentially be used to benefit automakers, insurance companies, the trucking industry, and individual consumers. With more insight, travel might become safer for everyone.


Keywords

Car Crashes, Data Analysis, Knowledge Discovery, Vehicles


Pages

vii, 39 pages


Includes bibliographical references (page 39).


Copyright © 2016 John Dietrich Tollefson