DOI

10.17077/etd.jyl5xew4

Document Type

Dissertation

Date of Degree

Fall 2016

Access Restrictions

Access restricted until 02/23/2019

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Jian Huang

First Committee Member

Joseph Cavanaugh

Second Committee Member

Kai Wang

Third Committee Member

Michael Jones

Fourth Committee Member

Patrick Breheny

Fifth Committee Member

Jian Huang

Abstract

Due to the rapid development of, and growing need for, information technologies, researchers are increasingly focusing on high-dimensional data. Much work has been done on problems such as point estimation with oracle inequalities, coefficient estimation, and variable selection in high-dimensional regression models. However, relatively few studies have addressed statistical inference for the regression coefficients. We therefore propose a regularized efficient score estimation and testing (RESET) approach for treatment effects in generalized linear models (GLMs) in the presence of nuisance parameters, which may be either low-dimensional or high-dimensional. Building on the RESET method, we also develop a two-step approach to the same problem.

The RESET approach is based on estimating the efficient score function of the treatment parameters. That is, it removes the influence of the nuisance parameters on the treatment parameters and constructs an efficient score function that can be used both to estimate and to test the treatment effect. The RESET approach applies in both low-dimensional and high-dimensional settings, and the simulation results show that it is comparable to the commonly used maximum likelihood estimator in most low-dimensional cases. We prove that the RESET estimator is consistent under some regularity conditions in both low-dimensional and high-dimensional linear models. We also show that a statistic based on the efficient score function of the treatment parameters asymptotically follows a chi-square distribution; on this basis, regularized efficient score tests for the treatment effect are constructed in both low-dimensional and high-dimensional GLMs.
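To illustrate what "removing the influence of the nuisance parameters" means, the following is a minimal sketch for the simplest GLM, the Gaussian linear model y = theta * x + Z beta + noise, with a low-dimensional nuisance design Z. The efficient score for theta is the ordinary score evaluated at the null fit, with the treatment column x replaced by its residual after projecting onto the columns of Z; dividing its square by the efficient information gives a statistic compared with a chi-square(1) distribution. This is not the dissertation's RESET estimator: the function names, the optional `ridge` parameter (a hypothetical stand-in for penalized nuisance fits, which here also penalizes any intercept column), and the Gaussian setting are all assumptions made for illustration.

```python
import math


def _solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, m))
        w[r] = (M[r][m] - s) / M[r][r]
    return w


def _residual_on(Z, v, ridge):
    """Residual of v after (optionally ridge-penalized) least squares on Z."""
    n, q = len(Z), len(Z[0])
    ZtZ = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (ridge if a == b else 0.0)
            for b in range(q)] for a in range(q)]
    Ztv = [sum(Z[i][a] * v[i] for i in range(n)) for a in range(q)]
    coef = _solve(ZtZ, Ztv)
    return [v[i] - sum(Z[i][a] * coef[a] for a in range(q)) for i in range(n)]


def efficient_score_test(y, x, Z, ridge=0.0):
    """Score test of H0: theta = 0 in y = theta * x + Z beta + Gaussian noise.

    Z is a list of rows (nuisance design). Returns (statistic, p_value),
    where the statistic is referred to a chi-square(1) distribution.
    Raises ZeroDivisionError if y lies exactly in the span of Z.
    """
    n = len(y)
    resid = _residual_on(Z, y, ridge)    # nuisance-only fit under H0
    x_perp = _residual_on(Z, x, ridge)   # treatment column with nuisance projected out
    sigma2 = sum(r * r for r in resid) / n
    score = sum(a * r for a, r in zip(x_perp, resid)) / sigma2   # efficient score
    info = sum(a * a for a in x_perp) / sigma2                   # efficient information
    stat = score * score / info
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value
```

For example, with an intercept-only nuisance design, `efficient_score_test([2, 4, 6, 8], [1, 2, 3, 4], [[1], [1], [1], [1]])` gives a statistic of 4.0 (y is exactly linear in x) and a p-value of about 0.046. In a high-dimensional setting the two nuisance regressions would need a sparsity-inducing penalty rather than the plain least squares or ridge fits used here.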

The two-step approach is mainly intended for high-dimensional inference. It combines the RESET approach with a first step that selects "promising" variables in order to reduce the dimension of the regression model. The minimax concave penalty (MCP) is adopted in this step for its oracle property, which means that it selects the correct variables with probability tending to one. The simulation results show that this approach still requires some improvement, which we leave for future research.
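The variable-selection step can be sketched with the standard form of the minimax concave penalty, p(t; lambda, gamma) = lambda * |t| - t^2 / (2 * gamma) for |t| <= gamma * lambda and gamma * lambda^2 / 2 otherwise (gamma > 1), fitted to a linear model by cyclic coordinate descent with the univariate MCP ("firm thresholding") update. This is a minimal sketch, not the dissertation's implementation: the function names are hypothetical, the model is Gaussian rather than a general GLM, and the columns of X are assumed to be centered with (1/n) * sum(x_ij^2) = 1 and y centered, which is what makes the simple coordinate update valid.

```python
def mcp_penalty(t, lam, gamma):
    """MCP value p(t; lam, gamma) for a single coefficient (gamma > 1)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_threshold(z, lam, gamma):
    """Univariate MCP solution (firm thresholding) for a standardized predictor."""
    a = abs(z)
    if a <= lam:
        return 0.0                      # small signals are set exactly to zero
    if a <= gamma * lam:
        sign = 1.0 if z > 0 else -1.0
        return sign * (a - lam) / (1.0 - 1.0 / gamma)
    return z                            # large signals are left unshrunk


def mcp_coordinate_descent(X, y, lam, gamma=3.0, max_iter=100, tol=1e-8):
    """MCP-penalized least squares by cyclic coordinate descent.

    X is a list of rows; each column is assumed centered with
    (1/n) * sum(x_ij**2) == 1, and y is assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = mcp_threshold(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

For example, with the orthogonal standardized design `X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]`, `y = [3.2, -2.8, 2.8, -3.2]`, and `lam = 0.5`, the fit keeps the first coefficient at about 3.0 and sets the second to exactly 0.0, so only the first variable would be passed to the second (testing) step. Because the MCP stops penalizing once |t| exceeds gamma * lambda, large coefficients are not shrunk, which is the property behind the oracle behavior described above.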

Finally, both the RESET and the two-step approaches are applied to a real data example to demonstrate their use, followed by conclusions on all the problems investigated here and a discussion of directions for future research.

Public Abstract

Due to the rapid development of, and growing need for, information technologies, researchers often encounter high-dimensional data (data with more parameters to estimate than observations). One of the main problems they face is estimating the effect of a specific factor on a response variable while controlling for the influence of other factors, a problem that has received relatively little study. To tackle this challenge, we propose a new approach that combines likelihood theory with parameter penalization, called the regularized efficient score estimation and testing (RESET) approach.

In this dissertation, the methodology and asymptotic properties of RESET are presented in detail. To study its finite-sample performance, simulation studies are conducted to compare RESET with other estimation and testing methods in both low-dimensional and high-dimensional generalized linear models. In these simulation studies, RESET shows both advantages and disadvantages relative to the other approaches.

Finally, the utility of RESET for high-dimensional data is demonstrated by applying the method to a breast cancer dataset, followed by a conclusion and a discussion of directions for future research.

Pages

xvi, 156 pages

Bibliography

Includes bibliographical references (pages 154-156).

Copyright

Copyright © 2016 Lixi Yu

Available for download on Saturday, February 23, 2019

Included in

Biostatistics Commons
