Document Type

Dissertation

Date of Degree

Summer 2011

Degree Name

PhD (Doctor of Philosophy)

Degree In

Statistics

First Advisor

Jian Huang

Second Advisor

Kai Wang

Abstract

Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as the LASSO do not take into account the possible linkage disequilibrium (LD) between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates LD information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient and stable in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP (SMCP) and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with a LASSO approach are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using data from a GWAS on rheumatoid arthritis.
Based on the idea of SMCP, we propose a new penalized method for group variable selection in GWAS that accounts for the correlation between adjacent groups. The proposed method uses the group LASSO to encourage group sparsity and a quadratic difference penalty to smooth effects across adjacent groups. We call it the smoothed group LASSO, or SGL for short. Canonical correlations between two adjacent groups of SNPs are used as the weights in the quadratic difference penalty. Principal components are used to reduce dimensionality locally within groups. We derive a group coordinate descent algorithm for computing the solution path of the SGL. Simulation studies are used to evaluate the finite-sample performance of the SGL and the group LASSO. We also demonstrate the applicability of the SGL on rheumatoid arthritis data.

Keywords

GWAS, Linkage disequilibrium, Penalized regression

Pages

ix, 150 pages

Bibliography

Includes bibliographical references (pages 92-95).

Copyright

Copyright 2011 Jin Liu
