Date of Degree
PhD (Doctor of Philosophy)
First Committee Member
Second Committee Member
Third Committee Member
Joseph B Lang
Fourth Committee Member
In fields such as statistics, economics and biology, heterogeneity is an important topic that bears on the validity of data inference and the discovery of hidden patterns. This thesis focuses on penalized methods for regression analysis in the presence of heterogeneity in a potentially high-dimensional setting. Two possible strategies for dealing with heterogeneity are robust regression methods that provide heterogeneity-resistant coefficient estimation, and direct detection of heterogeneity while simultaneously estimating coefficients accurately.
We consider the first strategy for two robust regression methods, Huber loss regression and quantile regression with Lasso or Elastic-Net penalties, which have been studied theoretically but lack efficient algorithms. We propose a new algorithm, Semismooth Newton Coordinate Descent, to solve them. The algorithm is a novel combination of the Semismooth Newton Algorithm and Coordinate Descent that applies to penalized optimization problems with both a nonsmooth loss and a nonsmooth penalty. We prove its convergence properties and show its computational efficiency through numerical studies.
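To make the two penalized problems concrete, the following is a minimal sketch of the objective functions in their common textbook form. It is not the Semismooth Newton Coordinate Descent algorithm itself, only the quantity such a solver would minimize. The function names, the parameters `delta`, `tau`, `lam` and `alpha`, the omission of an intercept, and the particular scaling of the loss and penalty are assumptions made for illustration; the thesis may use a different parametrization.

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def check_loss(r, tau=0.5):
    """Quantile (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))

def elastic_net_penalty(beta, lam, alpha=1.0):
    """Elastic-Net penalty; alpha = 1 reduces to the Lasso penalty."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta**2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

def penalized_objective(beta, X, y, loss, lam, alpha=1.0):
    """Average loss of the residuals plus the Elastic-Net penalty.

    Assumption: no intercept term, and the loss is averaged over observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    residuals = y - X @ beta
    return np.mean(loss(residuals)) + elastic_net_penalty(beta, lam, alpha)
```

Both loss functions have points where derivatives fail to exist in the usual sense (the Huber loss has a discontinuous second derivative at `±delta`, and the check loss has a kink at zero), and the L1 part of the penalty is nondifferentiable at zero, which is why standard smooth optimizers do not apply directly.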
We also propose a nonconvex penalized regression method, Heterogeneity Discovery Regression (HDR), as a realization of the second strategy. We establish theoretical results that guarantee statistical precision for any local optimum of the objective function with high probability. We also compare the numerical performance of HDR with that of competitors, including Huber loss regression, quantile regression and least squares, through simulation studies and a real data example. In these experiments, HDR methods detect heterogeneity accurately and largely outperform the competitors in coefficient estimation and variable selection.
In fields such as statistics, economics and biology, heterogeneity is an important topic that bears on the validity of data inference and the discovery of hidden patterns. Our insights into and interpretation of the data can be dramatically influenced by the presence of heterogeneity. This is especially challenging for high-dimensional data, which are increasingly common in areas such as genetics, behavioral sciences, image processing and natural language processing. This thesis focuses on penalized methods for regression analysis in the presence of heterogeneity in a potentially high-dimensional setting.
One strategy for dealing with heterogeneity is to use robust regression methods that provide heterogeneity-resistant coefficient estimation. We develop a novel algorithm, Semismooth Newton Coordinate Descent, that computes two important classes of penalized robust regression methods efficiently and scales very well to ultra-high dimensions (e.g., 100,000). Another strategy is direct detection of heterogeneity while simultaneously estimating coefficients accurately. We propose a nonconvex penalized regression method, Heterogeneity Discovery Regression (HDR), as a realization of this idea. We establish good theoretical properties for the approach and demonstrate significant advantages of HDR over alternatives such as robust regression methods through simulation studies. Finally, we illustrate the application of HDR to a building energy data set.
heterogeneity detection, high-dimensional, nonconvex regularization, optimization, robust regression, variable selection
ix, 98 pages
Includes bibliographical references (pages 96-98).
Copyright © 2016 Congrui Yi
Yi, Congrui. "Penalized methods and algorithms for high-dimensional regression in the presence of heterogeneity." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.