Document Type

Dissertation

Date of Degree

Spring 2012

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Jian Huang

Abstract

A family of concave penalties, including the smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP), has been shown to have attractive properties in variable selection. Computing concave penalized solutions, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm to compute the solutions of concave penalized generalized linear models (GLMs). In contrast to existing algorithms that use a local quadratic or local linear approximation of the penalty, MMCD majorizes the negative log-likelihood by a quadratic loss but does not approximate the penalty. This strategy avoids computing scaling factors in the iterative steps and hence improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the convergence property of the MMCD algorithm. We implement this algorithm in a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that MMCD is sufficiently fast for penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size.

Grouping structure among predictors exists in many regression applications. We first propose an ℓ2 grouped concave penalty to incorporate such group information in a regression model. The ℓ2 grouped concave penalty performs group selection and includes the group Lasso as a special case. An efficient algorithm is developed, and its convergence property is established under certain regularity conditions. Group selection alone is desirable in some applications, while others require selection at both the group and individual levels. Hence, we propose an ℓ1 grouped concave penalty for variable selection at both the individual and group levels.
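The MMCD idea described above can be illustrated with a minimal sketch for MCP-penalized logistic regression. The details are assumptions made for this illustration, not the dissertation's code: each column of X is standardized so that mean(x_j²) = 1, which bounds every diagonal entry of the logistic Hessian by the constant v = 1/4. Each coordinate step then minimizes a quadratic majorizer of the negative log-likelihood with that fixed curvature, while the MCP penalty is handled exactly through its univariate solution. With this fixed v, the coordinate problems are convex only when γ > 4. All function names are hypothetical.

```python
import numpy as np


def mcp_threshold(z, lam, gamma, v):
    """Exact minimizer of (v/2)(b - z)^2 + MCP(|b|; lam, gamma); needs v * gamma > 1."""
    if abs(z) > gamma * lam:
        return z  # MCP is flat beyond gamma * lam, so no shrinkage is applied
    soft = np.sign(z) * max(v * abs(z) - lam, 0.0)  # soft-threshold v * z at lam
    return soft / (v - 1.0 / gamma)


def mcp_penalty(b, lam, gamma):
    a = np.abs(b)
    return np.where(a <= gamma * lam, lam * a - a ** 2 / (2 * gamma), gamma * lam ** 2 / 2)


def sigmoid(eta):
    return 0.5 * (1.0 + np.tanh(eta / 2.0))  # overflow-free logistic function


def objective(X, y, b0, beta, lam, gamma):
    eta = b0 + X @ beta
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)  # mean negative log-likelihood
    return nll + mcp_penalty(beta, lam, gamma).sum()


def mmcd_logistic_mcp(X, y, lam, gamma=8.0, max_iter=500, tol=1e-8):
    """Hypothetical MMCD sketch for MCP-penalized logistic regression.

    Assumes every column of X satisfies mean(x_j**2) == 1, so the logistic
    Hessian diagonal is bounded by the fixed curvature v = 1/4.
    """
    n, p = X.shape
    v = 0.25
    if v * gamma <= 1:
        raise ValueError("this sketch needs gamma > 4 for convex coordinate problems")
    b0, beta, eta = 0.0, np.zeros(p), np.zeros(n)
    history = [objective(X, y, b0, beta, lam, gamma)]
    for _ in range(max_iter):
        # Unpenalized intercept: minimize its quadratic majorizer directly.
        shift = np.mean(y - sigmoid(eta)) / v
        b0 += shift
        eta += shift
        max_change = abs(shift)
        for j in range(p):
            grad = -X[:, j] @ (y - sigmoid(eta)) / n
            new = mcp_threshold(beta[j] - grad / v, lam, gamma, v)
            change = new - beta[j]
            if change != 0.0:
                eta += change * X[:, j]  # keep the linear predictor in sync
                beta[j] = new
                max_change = max(max_change, abs(change))
        history.append(objective(X, y, b0, beta, lam, gamma))
        if max_change < tol:
            break
    return b0, beta, history
```

Because each coordinate step exactly minimizes a majorizer of the penalized objective, the values in `history` should never increase from sweep to sweep. The curvature v is fixed for the whole run rather than recomputed from the current fit, which is the sense in which this sketch avoids per-iteration scaling factors.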
An efficient algorithm is also developed for the ℓ1 grouped concave penalty. Simulation studies are performed to evaluate the finite-sample performance of the two grouped concave selection methods. The new grouped penalties are also used to analyze two motivating datasets. The results from both the simulation and real data analyses demonstrate certain benefits of using grouped penalties. Therefore, the proposed grouped concave penalties are valuable alternatives to the standard concave penalties.
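One natural reading of the ℓ2 grouped concave penalty, and the one assumed here, is a concave penalty applied to the Euclidean norm of each group's coefficient vector; with MCP, letting γ grow recovers the group Lasso. The sketch below is illustrative only and swaps in simplifications of our own: a linear model with squared-error loss instead of a GLM, group columns orthonormalized within each group, and a hypothetical group level λ√d_g for a group of size d_g. Under these assumptions each group update has a closed form that shrinks the length of the group's partial-residual vector with the MCP (firm) thresholding rule and keeps its direction. Function names are hypothetical, and coefficients are on the orthonormalized scale.

```python
import numpy as np


def firm_threshold(z, lam, gamma):
    """Minimizer of 0.5 * (b - z)**2 + MCP(|b|; lam, gamma) for gamma > 1."""
    if abs(z) > gamma * lam:
        return z
    return np.sign(z) * max(abs(z) - lam, 0.0) / (1.0 - 1.0 / gamma)


def group_mcp_update(z, lam, gamma):
    """Shrink the length of z with the MCP rule while keeping its direction."""
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros_like(z)
    return firm_threshold(norm, lam, gamma) / norm * z


def orthonormalize(Xg):
    """Center a group's columns and rescale so that Q.T @ Q / n is the identity."""
    n = Xg.shape[0]
    Q, _ = np.linalg.qr(Xg - Xg.mean(axis=0))
    return Q * np.sqrt(n)


def group_mcp_linear(X, y, groups, lam, gamma=3.0, max_iter=200, tol=1e-8):
    """Group coordinate descent for least squares with an l2 group MCP penalty."""
    n = X.shape[0]
    Xs = [orthonormalize(X[:, g]) for g in groups]
    betas = [np.zeros(Xg.shape[1]) for Xg in Xs]
    r = y - y.mean()  # a centered response stands in for an intercept
    for _ in range(max_iter):
        max_change = 0.0
        for k, Xg in enumerate(Xs):
            z = Xg.T @ r / n + betas[k]  # group-wise least-squares target
            lam_k = lam * np.sqrt(Xg.shape[1])  # size-scaled level (assumption)
            new = group_mcp_update(z, lam_k, gamma)
            r -= Xg @ (new - betas[k])  # update the residual in place
            max_change = max(max_change, np.max(np.abs(new - betas[k])))
            betas[k] = new
        if max_change < tol:
            break
    return betas
```

Because a group is either set entirely to zero or kept, this update selects whole groups; selecting individual variables inside a group, as the ℓ1 grouped penalty does, would need a different update.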

Keywords

concave penalty, generalized linear model, high-dimensional data, MCP, SCAD, variable selection

Pages

ix, 89 pages

Bibliography

Includes bibliographical references (pages 87-89).

Copyright

Copyright 2012 Dingfeng Jiang

Included in

Biostatistics Commons