DOI

10.17077/etd.n7cwkl17

Document Type

Dissertation

Date of Degree

Summer 2018

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Patrick Breheny

First Committee Member

Joe Cavanaugh

Second Committee Member

Mike Jones

Third Committee Member

Yuan Huang

Fourth Committee Member

Aixin Tan

Abstract

Data containing large numbers of variables are becoming increasingly common, and sparsity-inducing penalized regression methods, such as the lasso, have become a popular analysis tool for these datasets because they naturally perform variable selection. However, quantifying the importance of the variables selected by these models is a difficult task. This difficulty is compounded by the tendency of the most predictive models, for example those chosen using procedures like cross-validation, to include substantial numbers of noise variables with no real relationship with the outcome. To enable inference on penalized regression models, this thesis proposes false discovery rate approaches for a broad class of such models. This work includes the development of an upper bound for the number of noise variables in a model, as well as local false discovery rate approaches that quantify the likelihood of each individual selection being a false discovery. These methods apply to a wide range of penalties, such as the lasso, elastic net, SCAD, and MCP; to a wide range of models, including linear regression, generalized linear models, and Cox proportional hazards models; and they are also extended to the group regression setting under the group lasso penalty. The methods are studied in numerous simulation studies, and their practical utility is demonstrated using real data from several high-dimensional genome-wide association studies.

Keywords

elastic net, false discovery rate, high dimensional data, inference, lasso, penalized regression

Pages

xiii, 122 pages

Bibliography

Includes bibliographical references (pages 116-122).

Copyright

Copyright © 2018 Ryan Miller

Included in

Biostatistics Commons
