DOI

10.17077/etd.pgqrx2xq

Document Type

Dissertation

Date of Degree

Fall 2013

Degree Name

PhD (Doctor of Philosophy)

Degree In

Biostatistics

First Advisor

Cavanaugh, Joseph E.

First Committee Member

Chaloner, Kathryn

Second Committee Member

Clarke, William R.

Third Committee Member

Pendergast, Jane F.

Fourth Committee Member

Lang, Joseph B.

Fifth Committee Member

Foster, Eric D.

Abstract

Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure that has been used in this setting is to select the model that results in the smallest value of the Akaike information criterion (AIC). One drawback in using the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected.
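The all-subsets search described above can be sketched in a few lines. This is a minimal illustration, not code from the thesis. It assumes a Gaussian linear model with an intercept, and uses the form AIC = n log(RSS/n) + 2k, which holds up to an additive constant. The helper names (`solve`, `rss`, `gaussian_aic`, `best_subset_by_aic`) and the simulated data are hypothetical.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2
               for i in range(n))


def gaussian_aic(rss_value, n, n_coef):
    """Gaussian linear-model AIC, up to an additive constant."""
    # k counts the regression coefficients plus the error variance.
    return n * math.log(rss_value / n) + 2 * (n_coef + 1)


def best_subset_by_aic(x_columns, y):
    """Return (AIC, subset) for the subset of column indices with minimum AIC."""
    n = len(y)
    best = None
    for size in range(len(x_columns) + 1):
        for subset in itertools.combinations(range(len(x_columns)), size):
            fit = rss([x_columns[j] for j in subset], y)
            value = gaussian_aic(fit, n, size + 1)  # +1 for the intercept
            if best is None or value < best[0]:
                best = (value, subset)
    return best


# Simulated data (an assumption of this sketch): only column 0 affects y.
random.seed(0)
n_obs = 100
x_columns = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(3)]
y = [1.0 + 2.0 * x_columns[0][i] + random.gauss(0, 1) for i in range(n_obs)]
value, subset = best_subset_by_aic(x_columns, y)
print(subset)  # column 0 is selected; columns 1 and 2 may or may not be
```

Whenever the printed subset contains column 1 or column 2, the selected model is overspecified in the sense used above.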

This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine whether the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. To determine whether there is sufficient evidence to retain a candidate variable, a set of threshold values is computed. Two methods are proposed: a naive method based on a set of restrictive assumptions, and an empirical permutation-based method.
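To make the idea of a threshold on the improvement in fit concrete, consider one familiar special case. It is offered only as an illustration and is not the thesis's naive or empirical method, whose details the abstract does not give. The sketch assumes two nested models that differ by a single spurious variable, and a likelihood ratio statistic that is asymptotically chi-square with one degree of freedom. Under those assumptions, the AIC prefers the larger model whenever the statistic exceeds 2, and a different cutoff corresponds to a different nominal probability. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def overfit_probability(threshold):
    """Upper-tail probability P(X > threshold) for X ~ chi-square(1)."""
    # For one degree of freedom, X = Z**2 with Z standard normal, so
    # P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(threshold / 2.0))


def threshold_for(alpha):
    """Chi-square(1) cutoff whose upper-tail probability equals alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


# Implicit rate of the AIC's cutoff of 2 in this one-variable comparison.
print(round(overfit_probability(2.0), 4))  # 0.1573

# Cutoff needed to bring that rate down to a nominated 5%.
print(round(threshold_for(0.05), 3))  # 3.841
```

In this simplified pairwise setting, the AIC admits a spurious variable roughly 16% of the time, which illustrates why a researcher might want to nominate the rate directly.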

Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing.

The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC, minimum Bayesian information criterion (BIC), and backward elimination based on p-values. The SIFT procedure is found to behave as designed: it asymptotically selects those variables that characterize the underlying data-generating mechanism, while limiting the selection of false or spurious variables to the desired level.

The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.

Keywords

AIC, BIC, Information Criterion, Likelihood Ratio, Model Selection, SIFT

Pages

xii, 81 pages

Bibliography

Includes bibliographical references (pages 79-81).

Copyright

Copyright © 2013 Knute Derek Carter

Included in

Biostatistics Commons
