Document Type

Dissertation

Date of Degree

Summer 2017

Degree Name

PhD (Doctor of Philosophy)

Degree In

Applied Mathematical and Computational Sciences

First Advisor

Kai Wang

Abstract

Genome-wide association studies (GWAS) have played an important role in identifying genetic variants underlying complex human traits. However, their success is hindered by weak effects at causal variants and noise at non-causal variants. Penalized regression can be applied to GWAS problems. GWAS data, however, have a distinctive feature: consecutive genetic markers are usually highly correlated due to linkage disequilibrium.

This thesis introduces a moving-window penalized method for GWAS that smooths the effects of consecutive SNPs. Simulation studies indicate that this penalized moving-window method improves the detection of true positives. The practical utility of the proposed method is demonstrated by applying it to the Genetic Analysis Workshop 16 Rheumatoid Arthritis data.

Next, the moving-window penalty is applied to generalized linear models. We call this approach the smoothed lasso (SLasso). Coordinate descent algorithms are described in detail for both quadratic and logistic loss, and asymptotic properties are discussed. Building on SLasso, we then discuss a two-stage method called MW-Ridge. Simulation results show that while SLasso can provide more true positive findings than Lasso, it has the side effect of including more unrelated noise variables. MW-Ridge can eliminate this side effect, resulting in high true positive rates and low false detection rates. The applicability to real data is illustrated using the GAW 16 Rheumatoid Arthritis data.

The SLasso and MW-Ridge approaches are then generalized to multivariate response data. Multivariate response data can be transformed into univariate response data. The causal variants are not required to be the same for different response variables. We found that whether the causal variants are fully matched or 60% matched, MW-Ridge consistently outperforms Lasso, detecting all true positives with lower false detection rates.

Keywords

feature selection, genome-wide association studies, regularized regression

Pages

xii, 89 pages

Bibliography

Includes bibliographical references (pages 87-89).

Copyright

Copyright © 2017 Minli Bao
