Document Type

Dissertation

Date of Degree

Fall 2016

Access Restrictions

Access restricted until 02/23/2019

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Jian Huang

Abstract

Owing to the rapid development of, and growing demand for, information technologies, more and more researchers have turned their attention to high-dimensional data. Much work has been done on problems such as point estimation with oracle inequalities, coefficient estimation, and variable selection in high-dimensional regression models. Statistical inference for the regression coefficients, however, has received comparatively little attention. We therefore propose a regularized efficient score estimation and testing (RESET) approach for treatment effects in generalized linear models (GLMs) in the presence of nuisance parameters, which may be either low-dimensional or high-dimensional. Building on the RESET method, we also develop a two-step approach to the same problem.

The RESET approach is based on estimating the efficient score function of the treatment parameters. The idea is to remove the influence of the nuisance parameters on the treatment parameters and to construct an efficient score function that can be used both to estimate and to test the treatment effect. The RESET approach applies in both low-dimensional and high-dimensional settings. In simulations, it performs comparably to the commonly used maximum likelihood estimator in most low-dimensional cases. We prove that the RESET estimator is consistent under certain regularity conditions in both low-dimensional and high-dimensional linear models. We also show that the test statistic built from the efficient score function of the treatment parameters asymptotically follows a chi-square distribution; based on this result, we construct regularized efficient score tests for the treatment effect in both low-dimensional and high-dimensional GLMs.
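The abstract does not spell out the RESET estimating equations. As a minimal sketch of the efficient score idea only (not the dissertation's RESET algorithm), assume a Gaussian linear model y = θd + βx + ε with a scalar treatment d, a single nuisance covariate x, and no intercept. The efficient score for θ is the θ-score minus its projection onto the nuisance score, which here amounts to replacing d with the residual z = d − w·x, where w = Σdx / Σx². Because Σzx = 0, the nuisance parameter drops out of the estimating equation, and the score test divides the squared efficient score by the efficient information and refers it to a χ²₁ distribution. All function names are hypothetical, and in a high-dimensional setting w would have to be estimated by a regularized regression, which is not shown.

```python
import math
import random


def decorrelate(d, x):
    """Residual z = d - w*x of the treatment on the nuisance covariate.

    w = sum(d*x) / sum(x*x) corresponds to I_{beta beta}^{-1} I_{beta theta}
    in this Gaussian linear model (the sigma^2 factors cancel).
    """
    w = sum(di * xi for di, xi in zip(d, x)) / sum(xi * xi for xi in x)
    return [di - w * xi for di, xi in zip(d, x)]


def efficient_score_estimate(y, d, x):
    """Solve sum_i z_i * (y_i - theta*d_i - beta*x_i) = 0 for theta.

    Since sum(z*x) = 0, beta drops out and theta_hat = sum(z*y) / sum(z*d).
    """
    z = decorrelate(d, x)
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * di for zi, di in zip(z, d))


def efficient_score_test(y, d, x):
    """Efficient score test of H0: theta = 0; returns (statistic, p_value)."""
    n = len(y)
    # Fit the nuisance parameter under the null (theta = 0).
    beta0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    r = [yi - beta0 * xi for yi, xi in zip(y, x)]
    sigma2 = sum(ri * ri for ri in r) / (n - 1)
    z = decorrelate(d, x)
    score = sum(zi * ri for zi, ri in zip(z, r)) / sigma2  # efficient score at theta = 0
    info = sum(zi * zi for zi in z) / sigma2  # efficient information
    stat = score * score / info
    # P(chi2_1 > stat) = P(|N(0,1)| > sqrt(stat)) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    random.seed(1)
    n = 200
    x = [random.gauss(0, 1) for _ in range(n)]
    d = [0.5 * xi + random.gauss(0, 1) for xi in x]  # treatment correlated with nuisance
    y = [1.0 * di + 2.0 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]
    print("theta_hat:", efficient_score_estimate(y, d, x))
    print("score test (stat, p):", efficient_score_test(y, d, x))
```

In this simple linear case the efficient-score estimate coincides with the least-squares coefficient of d (the Frisch–Waugh–Lovell result), which fits the abstract's observation that RESET performs comparably to maximum likelihood in low-dimensional settings.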

The two-step approach is intended mainly for high-dimensional inference. It combines the RESET approach with a first step that selects "promising" variables in order to reduce the dimension of the regression model. The minimax concave penalty (MCP) is adopted in this first step for its oracle property, which means that, asymptotically, it tends to select the "correct" variables. The simulation results show that this approach still requires some improvement, which we leave for future research.
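As a minimal sketch of what an MCP screening step could look like, assume a Gaussian linear model (the dissertation itself works with GLMs) whose columns are standardized so that (1/n)Σx²ᵢⱼ = 1, and fit the MCP-penalized least-squares problem by coordinate descent with the univariate MCP ("firm thresholding") update. The penalty is λ|t| − t²/(2γ) for |t| ≤ γλ and γλ²/2 beyond that; the values of λ and γ used below are illustrative assumptions (in practice λ would be tuned), and all function names are hypothetical.

```python
def mcp_penalty(t, lam, gamma):
    """MCP value: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, else gamma*lam^2/2."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def soft_threshold(z, lam):
    """Lasso-style soft thresholding S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma):
    """Univariate MCP solution for a standardized covariate; requires gamma > 1."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large signals are left unshrunk


def mcp_coordinate_descent(X, y, lam, gamma=3.0, max_iter=1000, tol=1e-8):
    """Minimize (1/(2n))||y - X b||^2 + sum_j MCP(b_j) by cyclic coordinate descent.

    X is a list of rows; each column is assumed standardized so that
    (1/n) * sum_i X[i][j]^2 = 1.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # current residual y - X beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            z = sum(X[i][j] * r[i] for i in range(n)) / n + beta[j]
            new = mcp_threshold(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def selected_variables(beta):
    """Indices of the variables kept by the screening step (nonzero coefficients)."""
    return [j for j, b in enumerate(beta) if b != 0.0]


if __name__ == "__main__":
    # Three orthogonal, standardized columns; only the first carries a strong signal.
    cols = [[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]
    X = [[float(cols[j][i]) for j in range(3)] for i in range(4)]
    y = [3.0 * X[i][0] + 0.2 * X[i][2] for i in range(4)]
    beta = mcp_coordinate_descent(X, y, lam=0.5, gamma=3.0)
    print(beta, selected_variables(beta))
```

Under this sketch, the variables with nonzero coefficients would be the ones passed on to the efficient score step; note that, unlike the lasso, MCP leaves large coefficients unshrunk once |z| exceeds γλ.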

Finally, both the RESET and the two-step approaches are applied to a real data example to demonstrate their use. The dissertation concludes with a summary of the problems investigated and a discussion of directions for future research.

Pages

xvi, 156

Bibliography

154-156

Copyright

Copyright © 2016 Lixi Yu

Available for download on Saturday, February 23, 2019

