Document Type


Date of Degree

Spring 2015

Degree Name

PhD (Doctor of Philosophy)

Degree In


First Advisor

Zhang, Ying

Second Advisor

Long, Jeffrey D

First Committee Member

Pendergast, Jane

Second Committee Member

Cavanaugh, Joseph E

Third Committee Member

Jones, Michael P


Estimating parameters in a mixture of normal distributions dates back to the 19th century, when Pearson first considered data on crabs from the Bay of Naples. Since then, many real-world applications of mixtures have led to various proposed methods for studying similar problems. Among them, maximum likelihood estimation (MLE) and the continuous empirical characteristic function (CECF) methods have drawn the most attention. However, the performance of these competing estimation methods has not been thoroughly studied in the literature, and conclusions have not been consistent in published research. In this article, we review this classical problem with a focus on estimation bias. An extensive simulation study is conducted to compare the estimation bias of the MLE and CECF methods over a wide range of disparity values. We use the overlapping coefficient (OVL) to measure the amount of disparity, and provide a practical guideline for estimation quality in mixtures of normal distributions. An application to an ongoing multi-site Huntington disease study illustrates the use of these methods for ascertaining cognitive biomarkers of disease progression.
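As a rough illustration of the disparity measure mentioned above, the sketch below computes an overlapping coefficient between two normal densities as the area under their pointwise minimum. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the unweighted form of the OVL (mixing proportions ignored), and the trapezoidal grid are all choices made here for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def overlap_coefficient(mu1, sigma1, mu2, sigma2, n_grid=20001):
    """Approximate OVL = integral of min(f1, f2) with the trapezoidal rule.

    Assumption: the two component densities are compared unweighted;
    a definition that weights by mixing proportions would differ.
    """
    # Integrate over a range wide enough that both tails are negligible.
    lo = min(mu1 - 10.0 * sigma1, mu2 - 10.0 * sigma2)
    hi = max(mu1 + 10.0 * sigma1, mu2 + 10.0 * sigma2)
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * step
        height = min(normal_pdf(x, mu1, sigma1), normal_pdf(x, mu2, sigma2))
        # Endpoints get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * height
    return total * step


if __name__ == "__main__":
    # Identical components overlap completely; well-separated ones barely overlap.
    for means in [(0.0, 0.0), (0.0, 2.0), (0.0, 6.0)]:
        print(means, round(overlap_coefficient(means[0], 1.0, means[1], 1.0), 4))
```

Under this definition the OVL ranges from 0 (no overlap, high disparity) to 1 (identical components), so small values correspond to mixtures whose components are easy to separate.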

We also study joint modeling of longitudinal and time-to-event data. We discuss pattern-mixture and selection models, but focus on shared parameter models, which use unobserved random effects to "join" a marginal longitudinal data model and a marginal survival model in order to assess the effect of an internal time-dependent covariate on time-to-event. The marginal models used in the analysis are the Cox Proportional Hazards (CPH) model and the Linear Mixed model; both are covered in some detail before joint models are defined and the estimation process is described. Joint modeling provides a framework that accounts for correlation between the longitudinal data and the time-to-event data while also accounting for measurement error in the longitudinal process, which previous methods failed to do. Because such methods have been shown to incur bias that is proportional to the amount of measurement error, a joint modeling approach is preferred. Our setting is further complicated by monotone degeneration of the internal covariate considered, so we propose a joint model that uses monotone B-splines to recover the longitudinal trajectory and a CPH model for the time-to-event data. The monotonicity constraints are satisfied via the projected Newton-Raphson algorithm described by Cheng et al. (2012), with the baseline hazard profiled out of the $Q$ function in each M-step of the Expectation-Maximization (EM) algorithm used to optimize the observed likelihood. This method is applied to assess the ability of Total Motor Score (TMS) to predict Huntington Disease motor diagnosis in the Biological Predictors of Huntington's Disease study (PREDICT-HD) data.
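For orientation, a generic shared parameter joint model can be written as below. This is the standard textbook form, given here only as a sketch: the symbols ($m_i$, $b_i$, $\alpha$, $\gamma$, $w_i$, $D$, $\sigma^2$) are notation assumed for illustration, and the dissertation's own specification replaces the linear trajectory with monotone B-splines.

```latex
% Longitudinal submodel (linear mixed model): observed value = true trajectory + error
y_i(t) = m_i(t) + \varepsilon_i(t), \qquad
m_i(t) = x_i(t)^\top \beta + z_i(t)^\top b_i, \qquad
b_i \sim N(0, D), \quad \varepsilon_i(t) \sim N(0, \sigma^2)

% Survival submodel (Cox proportional hazards) sharing the random effects b_i through m_i(t)
h_i(t) = h_0(t) \exp\{ \gamma^\top w_i + \alpha \, m_i(t) \}
```

Here $\alpha$ measures the association between the error-free trajectory $m_i(t)$ and the hazard, which is how the model separates the covariate's effect from measurement error in the observed $y_i(t)$.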

Public Abstract

Huntington Disease (HD) is a disease of the brain that is caused entirely by a genetic mutation. An individual must have this mutation in order to be at risk for being diagnosed with HD; individuals without it will never get HD. As the disease worsens, at-risk individuals lose their ability to reason properly, control their movements, and complete day-to-day tasks. The first project of this dissertation attempts to discover whether motor function or cognitive ability can determine whether an individual is at risk for HD, meaning they have the genetic mutation mentioned above. Discovering variables that can determine at-risk status is important because it can reduce the high cost associated with genetic testing, thus substantially reducing the cost of studying this disease. This may lead to more research and, hopefully, a treatment that delays onset or cures the disease. The second project of this dissertation attempts to determine whether motor functioning, cognitive ability, or other measurements can predict an HD diagnosis. In other words, individuals with higher values of a measurement may be at significantly greater risk for being diagnosed, similar to a rise in blood pressure triggering a diagnosis of hypertension. A measurement with strong predictive ability can be used for testing new therapies in clinical trials that aim to delay onset and, finally, cure the disease.


Constrained Optimization, EM Algorithm, Joint Modeling, Mixtures, Profile methods, Survival Analysis


xi, 248 pages


Includes bibliographical references (pages 244-248).


Copyright 2015 Spencer G. Lourens

Included in

Biostatistics Commons