DOI
10.17077/etd.inch3pbi
Document Type
Dissertation
Date of Degree
Summer 2017
Degree Name
PhD (Doctor of Philosophy)
Degree In
Applied Mathematical and Computational Sciences
First Advisor
Wang, Kai
First Committee Member
Breheny, Patrick
Second Committee Member
Han, Weimin
Third Committee Member
Jorgensen, Palle
Fourth Committee Member
Khurana, Surjit
Abstract
Genome-wide association studies (GWAS) have played an important role in identifying genetic variants underlying complex human traits. However, their success is hindered by weak effects at causal variants and noise at non-causal variants. Penalized regression can be applied to GWAS problems, but GWAS data have a distinctive structure: consecutive genetic markers are usually highly correlated due to linkage disequilibrium.
This thesis introduces a moving-window penalized method for GWAS that smooths the effects of consecutive single-nucleotide polymorphisms (SNPs). Simulation studies indicate that this moving-window penalized method yields more true positive findings. The practical utility of the proposed method is demonstrated by applying it to the Genetic Analysis Workshop 16 (GAW 16) rheumatoid arthritis data.
Next, the moving-window penalty is applied to generalized linear models. We call this approach the smoothed lasso (SLasso). Coordinate descent algorithms are described in detail for both quadratic and logistic loss, and asymptotic properties are discussed. Building on SLasso, we then discuss a two-stage method called MW-Ridge. Simulation results show that while SLasso provides more true positive findings than Lasso, it has the side effect of also selecting more unrelated noise variables. MW-Ridge eliminates this side effect, resulting in high true positive rates and low false detection rates. The applicability to real data is illustrated using the GAW 16 rheumatoid arthritis data.
The SLasso and MW-Ridge approaches are then generalized to multivariate response data, which can be transformed into univariate response data. The causal variants are not required to be the same for different response variables. We found that whether the causal variants are fully matched or 60% matched across response variables, MW-Ridge always outperforms Lasso, detecting all true positives with lower false detection rates.
Keywords
feature selection, genome-wide association studies, regularized regression
Pages
xii, 89 pages
Bibliography
Includes bibliographical references (pages 87-89).
Copyright
Copyright © 2017 Minli Bao
Recommended Citation
Bao, Minli. "A Moving-window penalization method and its applications." PhD (Doctor of Philosophy) thesis, University of Iowa, 2017.
https://doi.org/10.17077/etd.inch3pbi