PhD (Doctor of Philosophy)
Jeffrey D. Dawson
Semicontinuous data consist of a combination of a point mass at zero and a skewed distribution of positive values. This type of non-negative data is found in many fields but presents unique challenges for analysis. Specifically, these data cannot be modeled with strictly positive distributions because of the zeros, while distributions with unbounded support are also likely to fit poorly. Two-part models incorporate both the zero values and the positive continuous values of semicontinuous data. In this dissertation, we compare zero-inflated gamma (ZIG) and zero-inflated log-normal (ZILN) two-part models. For both models, the probability that an outcome is non-zero is modeled via logistic regression. The distribution of the non-zero outcomes is then modeled via gamma regression with a log link for ZIG and via log-normal regression for ZILN.
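The two-part structure can be illustrated with a small simulation. The sketch below is a minimal example under stated assumptions, not the dissertation's code: the helper `simulate_two_part` is hypothetical, it uses a single intercept-only linear predictor for each part, and it parameterizes both the gamma and the log-normal positive parts by their mean `exp(eta_log)` so the ZIG and ZILN draws are directly comparable. (A ZILN regression usually models the location of log(Y), so its coefficients are on a different scale.) The point it shows is that the overall mean of Y is P(Y > 0) times E[Y | Y > 0].

```python
import math
import random


def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_two_part(n, eta_logit, eta_log, shape=None, sigma=None, rng=None):
    """Draw n semicontinuous values from a two-part model (hypothetical helper).

    Part 1: P(Y > 0) = expit(eta_logit), a logistic model.
    Part 2: given Y > 0, E[Y | Y > 0] = exp(eta_log), a log link, with Y drawn from
      - a gamma distribution with the given shape (ZIG), or
      - a log-normal distribution with log-scale SD sigma (ZILN).
    Supply exactly one of `shape` or `sigma`.
    """
    if (shape is None) == (sigma is None):
        raise ValueError("supply exactly one of shape (ZIG) or sigma (ZILN)")
    rng = rng or random.Random()
    p = expit(eta_logit)
    mu = math.exp(eta_log)
    draws = []
    for _ in range(n):
        if rng.random() >= p:
            draws.append(0.0)  # point mass at zero, probability 1 - p
        elif shape is not None:
            # Gamma with mean mu: shape * scale = mu.
            draws.append(rng.gammavariate(shape, mu / shape))
        else:
            # Log-normal with mean mu: exp(location + sigma^2 / 2) = mu.
            draws.append(rng.lognormvariate(math.log(mu) - sigma**2 / 2, sigma))
    return draws


if __name__ == "__main__":
    rng = random.Random(2013)
    zig = simulate_two_part(100_000, 0.5, 1.5, shape=2.0, rng=rng)
    ziln = simulate_two_part(100_000, 0.5, 1.5, sigma=1.0, rng=rng)
    # Theoretical overall mean P(Y > 0) * E[Y | Y > 0]; about 2.79 here.
    print(expit(0.5) * math.exp(1.5))
    # Sample means of the simulated ZIG and ZILN data should be close to it.
    print(sum(zig) / len(zig), sum(ziln) / len(ziln))
```

Matching the positive-part means makes the two samples differ only in the shape of their positive values, which is the setting in which ZIG and ZILN analyses are compared.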
We propose tests that combine the two parts of the ZIG and ZILN models in meaningful ways for two-group comparisons. We then compare these tests in terms of observed Type 1 error rates and power under both correctly specified and misspecified ZIG and ZILN models. We examine tests of two main kinds of hypotheses. First, we consider two-part tests, which come from a two-part hypothesis of no difference between the two groups in either the probability of non-zero values or the mean of the non-zero values. The second type consists of mean-based tests, which combine the two parts of the model in ways related to the overall group means of the semicontinuous variable. Without covariate adjustment, we develop two tests, one based on a difference of means (DM) and one based on a ratio of means (RM). With covariate adjustment, we develop mean-based tests that marginalize over the values of the adjusting covariates. In this adjusted setting, we propose and examine two ratio-of-means statistics: an average of the subject-specific ratios of means (RMSS) and a ratio of the marginal group means (RMMAR). Simulations are used to compare the Type 1 error and power of these tests with those of standard two-group comparison tests.
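The mean-based quantities can be sketched as plug-in point estimates. This is a minimal illustration under assumptions, not the dissertation's estimators or test statistics (no standard errors or p-values are computed): all function names are hypothetical, the covariate-adjusted model is assumed to have a single covariate `x` with no group-by-covariate interaction, and "marginalizing" is assumed to mean averaging over the observed covariate values `xs`. In every case a group's overall mean is P(Y > 0) times E[Y | Y > 0].

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def overall_mean(y, dist="zig"):
    """Plug-in estimate of E[Y] = P(Y > 0) * E[Y | Y > 0] for one group.

    dist="zig":  E[Y | Y > 0] estimated by the mean of the positive values.
    dist="ziln": E[Y | Y > 0] estimated by exp(m + s2 / 2), where m and s2 are the
                 mean and (divide-by-n) variance of log(Y) among the positive values.
    """
    pos = [v for v in y if v > 0]
    p_hat = len(pos) / len(y)
    if dist == "zig":
        mu_hat = sum(pos) / len(pos)
    elif dist == "ziln":
        logs = [math.log(v) for v in pos]
        m = sum(logs) / len(logs)
        s2 = sum((l - m) ** 2 for l in logs) / len(logs)
        mu_hat = math.exp(m + s2 / 2)
    else:
        raise ValueError("dist must be 'zig' or 'ziln'")
    return p_hat * mu_hat


def dm_rm(y0, y1, dist="zig"):
    """Difference (DM) and ratio (RM) of overall means, group 1 vs group 0."""
    m0, m1 = overall_mean(y0, dist), overall_mean(y1, dist)
    return m1 - m0, m1 / m0


def subject_mean(x, g, alpha, beta):
    """E[Y | x, g] under a hypothetical two-part model with group g in {0, 1}:
    logit P(Y > 0) = alpha[0] + alpha[1]*g + alpha[2]*x,
    log E[Y | Y > 0] = beta[0] + beta[1]*g + beta[2]*x."""
    p = expit(alpha[0] + alpha[1] * g + alpha[2] * x)
    mu = math.exp(beta[0] + beta[1] * g + beta[2] * x)
    return p * mu


def rmss_rmmar(xs, alpha, beta):
    """RMSS: average over subjects of E[Y | x_i, 1] / E[Y | x_i, 0].
    RMMAR: average of E[Y | x_i, 1] divided by average of E[Y | x_i, 0]."""
    m1 = [subject_mean(x, 1, alpha, beta) for x in xs]
    m0 = [subject_mean(x, 0, alpha, beta) for x in xs]
    rmss = sum(a / b for a, b in zip(m1, m0)) / len(xs)
    rmmar = sum(m1) / sum(m0)
    return rmss, rmmar


if __name__ == "__main__":
    print(dm_rm([0, 0, 2, 4], [0, 3, 3, 6]))  # (1.5, 2.0)
    print(rmss_rmmar([0.0, 1.0, 2.0, 3.0], (-0.5, 0.7, 0.8), (1.0, 0.3, 0.2)))
```

In the ZIG branch, p̂ times the mean of the positive values is simply the group's sample mean, so DM and RM reduce to the difference and ratio of sample means; the ZILN branch differs because it estimates E[Y | Y > 0] through the log scale. Under this log-link model, RMSS and RMMAR both equal exp(beta[1]) when alpha[1] = 0, and they separate only when the group effect on P(Y > 0) varies with x.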
Simulation results show that when ZIG and ZILN models are misspecified and the coefficient of variation (CoV) and/or the sample size is large, Type 1 error and power differ between the misspecified and correctly specified models. Specifically, when ZILN data with a high CoV or a large sample size are analyzed as ZIG, Type 1 error rates are prohibitively high. Conversely, when ZIG data are analyzed as ZILN in these scenarios, power is much lower for the ZILN analyses than for the ZIG analyses. Examination of Q-Q plots shows, however, that in these settings distinguishing between ZIG and ZILN data can be relatively straightforward. When the CoV is small, it is harder to distinguish between ZIG and ZILN models, but the differences in Type 1 error rates and power between misspecified and correctly specified models are also slight.
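To compare a gamma and a log-normal positive part at the same CoV, their parameters can be matched in closed form: a gamma with shape k has CoV 1/√k, and a log-normal with log-scale SD σ has CoV √(exp(σ²) − 1). The sketch below (hypothetical helper names, not taken from the dissertation) does this matching and also reports the skewness of each distribution, 2/√k for the gamma and (exp(σ²) + 2)√(exp(σ²) − 1) for the log-normal. At matched CoV these are 2·CoV and (3 + CoV²)·CoV, so the two shapes drift apart as the CoV grows; this is one plausible reading of why the models are easier to tell apart at high CoV, offered as an illustration rather than as the dissertation's argument.

```python
import math


def gamma_shape_for_cov(cov):
    """Gamma CoV is 1/sqrt(shape), so shape = 1 / CoV^2."""
    return 1.0 / cov**2


def lognormal_sigma_for_cov(cov):
    """Log-normal CoV is sqrt(exp(sigma^2) - 1), so sigma = sqrt(log(1 + CoV^2))."""
    return math.sqrt(math.log1p(cov**2))


def gamma_cov(shape):
    return 1.0 / math.sqrt(shape)


def lognormal_cov(sigma):
    return math.sqrt(math.expm1(sigma**2))


def gamma_skewness(shape):
    return 2.0 / math.sqrt(shape)


def lognormal_skewness(sigma):
    e = math.exp(sigma**2)
    return (e + 2.0) * math.sqrt(math.expm1(sigma**2))


if __name__ == "__main__":
    # Skewness of CoV-matched gamma and log-normal positive parts.
    for cov in (0.25, 0.5, 1.0, 2.0):
        k = gamma_shape_for_cov(cov)
        s = lognormal_sigma_for_cov(cov)
        print(cov, gamma_skewness(k), lognormal_skewness(s))
```

For example, at CoV = 2 the gamma skewness is 4 while the log-normal skewness is 14, whereas at CoV = 0.25 they are 0.5 and about 0.77.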
Finally, we use the proposed methods to analyze a data set involving Parkinson's disease (PD) and driving. Several of these methods show that PD subjects exhibit poorer lane-keeping ability than control subjects.
consonant effects, dissonant effects, semi-continuous data, two-part tests, zero-inflated gamma, zero-inflated log-normal
xvii, 280 pages
Includes bibliographical references (pages 278-280).
Copyright 2013 Elizabeth Dastrup Mills