Document Type


Date of Degree

Summer 2009

Degree Name

PhD (Doctor of Philosophy)

Degree In


First Advisor

Jian Huang

First Committee Member

Joseph Cavanaugh

Second Committee Member

Michael Jones

Third Committee Member

Kai Wang

Fourth Committee Member

Dale Zimmerman


Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.
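As a rough illustration of the penalties named above, the following Python sketch evaluates two of them. The first is the group lasso penalty λ Σ_j √K_j ‖β_j‖₂, where K_j is the size of group j. The second is the scalar minimax concave penalty (MCP) with concavity parameter γ > 1. The function names are hypothetical. The `group_mcp_on_norm` composition (an outer MCP applied to each group's L2 norm) is only one illustration of the outer/inner structure of a grouped penalty. It is not claimed to be the exact group MCP defined in the thesis.

```python
import math

def mcp(theta, lam, gamma):
    """Minimax concave penalty at scalar theta (assumes gamma > 1, lam >= 0)."""
    t = abs(theta)
    if t <= gamma * lam:
        # Concave region: lasso-like slope lam that tapers off linearly.
        return lam * t - t * t / (2.0 * gamma)
    # Flat region: large coefficients receive a constant penalty.
    return gamma * lam * lam / 2.0

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def group_lasso_penalty(groups, lam):
    """Group lasso: lam * sum_j sqrt(K_j) * ||beta_j||_2, with K_j = len(group j)."""
    return sum(lam * math.sqrt(len(g)) * l2_norm(g) for g in groups)

def group_mcp_on_norm(groups, lam, gamma):
    """Hypothetical illustration: an outer MCP applied to each group's L2 norm."""
    return sum(mcp(l2_norm(g), lam, gamma) for g in groups)

# Two groups: the first has norm 5, the second is entirely zero.
groups = [[3.0, 4.0], [0.0, 0.0]]
print(group_lasso_penalty(groups, 1.0))     # grows without bound as the norm grows
print(group_mcp_on_norm(groups, 1.0, 3.0))  # capped at gamma * lam**2 / 2 per group
```

The contrast between the two printed values shows the design choice. The group lasso keeps penalizing a large group in proportion to its norm. The MCP-style outer penalty levels off once a group's norm exceeds γλ.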

The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions, such as grouped penalties. These algorithms combine local approximation of penalty functions with recent research on coordinate descent to produce highly efficient numerical methods. Importantly, I show these algorithms to be stable and to have computational cost linear in the dimension of the feature space, allowing them to scale efficiently to very large problems.
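The following sketch shows one way to combine local approximation with coordinate descent. It fits MCP-penalized least squares and is not presented as the thesis's algorithm. At each coordinate, the MCP is replaced by its local linear approximation at the current value, which turns the update into a weighted lasso step solved by soft thresholding. Several assumptions apply: each column is standardized so that x_jᵀx_j / n = 1, there is no intercept, and `cd_mcp_lla` and its parameters are hypothetical names. Each coordinate update costs O(n) and a full sweep costs O(np), which is the sense in which such algorithms are linear in the number of features.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def mcp_derivative(t, lam, gamma):
    """Derivative of the MCP at |beta| = t >= 0."""
    return max(lam - t / gamma, 0.0)

def cd_mcp_lla(X, y, lam, gamma=3.0, max_iter=1000, tol=1e-8):
    """Coordinate descent for MCP-penalized least squares via local linear approximation.

    Assumes each column of X (a list of rows) satisfies sum_i X[i][j]**2 / n == 1.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residual y - X beta, kept up to date incrementally
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # z_j = x_j^T r / n + beta_j is the partial-residual regression coefficient.
            z = sum(X[i][j] * r[i] for i in range(n)) / n + beta[j]
            # Local linear approximation: penalty slope evaluated at the current |beta_j|.
            w = mcp_derivative(abs(beta[j]), lam, gamma)
            new = soft_threshold(z, w)
            delta = new - beta[j]
            if delta != 0.0:
                # O(n) residual update keeps each coordinate step cheap.
                for i in range(n):
                    r[i] -= delta * X[i][j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

# Orthogonal, standardized toy design: y = 2 * x1 + 0.1 * x2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
print(cd_mcp_lla(X, y, lam=0.5, gamma=3.0))
```

In this toy example, the first sweep starts from zero and is therefore an ordinary lasso step, which sets the first coefficient to 1.5. Later sweeps lower the penalty slope on that large coefficient until it reaches its unpenalized value of 2.0. The small second coefficient stays at exactly zero.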

In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions characterizing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate fields of multiple comparisons and penalized regression, to develop estimators for the false discovery rates of methods such as the lasso and elastic net, and to establish theoretical results.
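To make the link between the Karush-Kuhn-Tucker conditions and partial residuals concrete, the sketch below covers the lasso only and assumes standardized columns (x_jᵀx_j / n = 1). Under these assumptions, the KKT conditions reduce to β̂_j = S(z_j, λ) for every j, where z_j = x_jᵀ(y − X₋ⱼβ̂₋ⱼ) / n is computed from the partial residual. A feature is therefore selected exactly when |z_j| > λ. The function `estimated_fdr` is a hypothetical, deliberately crude calculation, not the thesis's estimator. It supposes that every one of the p features behaves like a null feature with z_j ≈ N(0, σ²/n) and σ known. That gives an expected number of false selections of at most p · 2Φ(−λ√n/σ).

```python
import math

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def partial_residual_scores(X, y, beta):
    """z_j = x_j^T (y - X_{-j} beta_{-j}) / n for each feature j (X is a list of rows)."""
    n, p = len(y), len(beta)
    r = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
    # Adding x_j * beta_j back to the full residual gives the partial residual for j.
    return [sum(X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            for j in range(p)]

def lasso_kkt_holds(X, y, beta, lam, tol=1e-8):
    """With standardized columns, the lasso KKT conditions are beta_j = S(z_j, lam) for all j."""
    z = partial_residual_scores(X, y, beta)
    return all(abs(b - soft_threshold(zj, lam)) <= tol for b, zj in zip(beta, z))

def estimated_fdr(num_selected, p, lam, n, sigma):
    """Hypothetical illustration: treats all p features as null with z_j ~ N(0, sigma^2 / n)."""
    if num_selected == 0:
        return 0.0
    # P(|z_j| > lam) = 2 * Phi(-lam * sqrt(n) / sigma) = erfc(lam * sqrt(n) / (sigma * sqrt(2))).
    expected_false = p * math.erfc(lam * math.sqrt(n) / (sigma * math.sqrt(2.0)))
    return min(1.0, expected_false / num_selected)

X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
print(lasso_kkt_holds(X, y, [1.5, 0.0], lam=0.5))  # the lasso solution at lam = 0.5
print(lasso_kkt_holds(X, y, [2.0, 0.0], lam=0.5))  # not a lasso solution at lam = 0.5
```

Because it treats every feature as a potential null, the crude estimate errs on the conservative side when many features are truly nonzero.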

Finally, the methods from all three parts are evaluated in a number of simulation studies and applied to real data from gene expression and genetic association studies.


coordinate descent, false discovery rate, group lasso, lasso, penalized regression


viii, 64 pages


Includes bibliographical references (pages 63-64).


Copyright 2009 Patrick John Breheny

Included in

Biostatistics Commons