Date of Degree

2009

Document Type

PhD diss.

Degree Name

PhD (Doctor of Philosophy)

Department

Biostatistics

First Advisor

Jian Huang

Abstract

Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.

The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions such as grouped penalization methods. These algorithms combine the idea of local approximation of penalty functions with recent research into coordinate descent algorithms to produce highly efficient numerical methods for fitting models with complicated penalties. Importantly, I show these algorithms to be both stable and linear in the dimension of the feature space, allowing them to be efficiently scaled up to very large problems.

In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate elds of multiple comparisons and penalized regression, develop estimators for the false discovery rates of methods such as the lasso and elastic net, and establish theoretical results.

Finally, the methods from all three sections are studied in a number of simulations and applied to real data from gene expression and genetic association studies.

Pages

viii, 64

Bibliography

63-64

Copyright

Copyright 2009 Patrick John Breheny

Included in

Biostatistics Commons

Share

COinS