Date of Degree
PhD (Doctor of Philosophy)
High-dimensional data offer researchers increased ability to find useful factors in predicting a response. However, determination of the most important factors requires careful selection of the explanatory variables. To tackle this challenge, much work has been done on single or grouped variable selection under the penalized regression framework. Although the topic of variable selection has been extensively studied under the parametric framework, its applications to more flexible nonparametric models are yet to be explored.
To implement variable selection in nonparametric additive models, I introduce and study two nonconvex selection methods under the penalized regression framework, namely the group MCP and the adaptive group LASSO, with the aim of improving on the selection performance of the more widely known group LASSO method in such models. One major part of the dissertation focuses on the theoretical properties of the group MCP and the adaptive group LASSO, for which I derive selection and estimation properties. The application of the proposed methods to nonparametric additive models is further examined using simulation. Their applications to areas such as economics and genomics are presented as well. In both the simulation studies and the data applications, the group MCP and the adaptive group LASSO show advantages over the more traditionally used group LASSO method.
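As a rough illustration of how the three penalties named above differ, the following minimal sketch evaluates each on a grouped coefficient vector. It uses the commonly cited textbook forms, which is an assumption: the dissertation's exact scaling (for example, group-size factors) and its newly proposed adaptive weights are not given in this abstract, so the inverse-norm weights below are only a stand-in. All function names are hypothetical.

```python
import math


def group_norms(beta, groups):
    """Euclidean norm of each coefficient group; `groups` is a list of index lists."""
    return [math.sqrt(sum(beta[i] ** 2 for i in idx)) for idx in groups]


def mcp(t, lam, gamma):
    """Scalar MCP for t >= 0: lam*t - t^2/(2*gamma) up to t = gamma*lam, flat afterwards."""
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def group_lasso_penalty(beta, groups, lam):
    """Group LASSO: lam times the sum of group norms (group-size factors omitted)."""
    return lam * sum(group_norms(beta, groups))


def group_mcp_penalty(beta, groups, lam, gamma):
    """Group MCP: the scalar MCP applied to each group norm, then summed."""
    return sum(mcp(n, lam, gamma) for n in group_norms(beta, groups))


def adaptive_group_lasso_penalty(beta, groups, lam, beta_init):
    """Adaptive group LASSO with assumed weights 1 / ||initial group estimate||.

    A group whose initial estimate is zero has infinite weight, so it must stay at zero.
    """
    total = 0.0
    for n, n0 in zip(group_norms(beta, groups), group_norms(beta_init, groups)):
        if n0 == 0.0:
            if n > 0.0:
                return math.inf
            continue
        total += lam * n / n0
    return total


# Two groups of two coefficients each; the second group is entirely zero.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(group_lasso_penalty(beta, groups, lam=1.0))
print(group_mcp_penalty(beta, groups, lam=1.0, gamma=3.0))
print(adaptive_group_lasso_penalty(beta, groups, lam=1.0, beta_init=beta))
```

In this sketch the group LASSO penalty keeps growing with the group norm, whereas the group MCP levels off once the norm exceeds gamma*lam, and the adaptive weights shrink the penalty on groups with large initial estimates. This difference in how large groups are penalized is the usual motivation for considering the latter two methods.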
The proposed adaptive group LASSO uses newly proposed weights, so its recursive application has not been studied before; I therefore also derive its theoretical properties under a very general framework. Simulation studies under linear regression are included.
In addition to the theoretical and empirical investigations, several other important issues are briefly discussed throughout the dissertation, including the computing algorithms and different ways of selecting tuning parameters.
Nowadays, in areas such as genetics, the behavioral sciences, and banking and finance, high-dimensional data (i.e., data with many more variables than observations) are increasingly available. On the one hand, high-dimensional data offer researchers increased ability to find useful factors in building statistical models and predicting a response variable. On the other hand, determination of the most important factors requires careful selection of variables. To tackle this challenge, much work has been done on variable selection using statistical methods. Although the topic of variable selection has been studied under some modeling frameworks, its extension to many other models is yet to be explored.
To implement variable selection in so-called nonparametric additive models, which are very useful in many scientific areas, I introduce and study two variable selection methods in my dissertation, with the aim of improving on the variable selection performance of the more traditional method used in such models. The dissertation studies the theoretical properties of the proposed methods and examines these methods using extensive simulation studies. The dissertation also presents their applications to economics and genomics. In both the simulation studies and the data applications, the proposed methods show advantages over the more traditional method.
x, 92 pages
Includes bibliographical references (pages 88-92).
Copyright 2014 Xiangmin Zhang