Document Type


Date of Degree

Summer 2013

Degree Name

PhD (Doctor of Philosophy)

Degree In

Psychological and Quantitative Foundations

First Advisor

Kolen, Michael J

First Committee Member

Ansley, Timothy N

Second Committee Member

Cowles, Mary Kathryn

Third Committee Member

Harris, Deborah

Fourth Committee Member

Lee, Won-Chan


Developmental score scales represent student performance along a continuum: as students learn more, they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method for creating developmental score scales. Research has shown that UIRT vertical scaling methods can be inconsistent in estimating three properties of developmental score scales: grade-to-grade growth, within-grade variability, and separation of grade distributions (effect size). In particular, the finding of scale shrinkage (decreasing within-grade score variability as grade level increases) has led to concerns about and criticism of IRT vertical scales. The causes of scale shrinkage are not yet fully understood, and neither real test data nor simulation studies have fully explained why IRT vertical scaling inconsistencies occur. Violations of assumptions are a commonly cited potential cause of the inconsistent results. For this reason, this dissertation is an extensive investigation into how violations of the three assumptions of UIRT vertical scaling (local item independence, unidimensionality, and similar reliability of grade-level tests) affect estimated developmental score scales.

Simulated tests were developed that each purposefully violated one UIRT vertical scaling assumption. Three sets of simulated tests were created, each testing the effect of violating a single assumption. First, simulated tests were created with increasing, decreasing, low, medium, and high local item dependence. Second, multidimensional simulated tests were created by varying the correlation between dimensions. Third, simulated tests with dissimilar reliability were created by varying the item parameter characteristics of the grade-level tests. Multiple versions of twelve simulated tests were used to investigate the assumption violations. The simulated tests were calibrated under the UIRT model so that the calibration itself violated the targeted assumption. Each simulated test version was replicated for 1000 random examinee samples to assess the bias and standard error of the estimated grade-to-grade growth, within-grade variability, and separation of grade distributions (effect size) of the estimated developmental score scales.
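The replication design described above can be illustrated with a minimal sketch of how bias and standard error might be summarized across replications. Everything in it is an assumption for illustration: the function names, the sample sizes, the pooled-standard-deviation effect size formula, and the shortcut of drawing scale scores directly from normal distributions instead of calibrating item responses under a UIRT model. It is not the dissertation's procedure, and because no assumption is violated here, the bias it reports should be close to zero.

```python
import random
import statistics


def scale_properties(lower, upper):
    """Return (growth, upper-grade SD, effect size) for two adjacent grades."""
    growth = statistics.fmean(upper) - statistics.fmean(lower)
    sd_lower = statistics.stdev(lower)
    sd_upper = statistics.stdev(upper)
    # Assumed effect size: mean difference over the pooled SD.
    # This formula is an illustration and is not taken from the source.
    pooled_sd = ((sd_lower ** 2 + sd_upper ** 2) / 2) ** 0.5
    return growth, sd_upper, growth / pooled_sd


def bias_and_se(estimates, true_value):
    """Bias is the mean estimate minus the true value;
    standard error is the SD of the estimates across replications."""
    bias = statistics.fmean(estimates) - true_value
    return bias, statistics.stdev(estimates)


def run_replications(n_reps=1000, n_examinees=500, true_growth=0.5, seed=0):
    """Hypothetical driver: draw two grades' scores per replication
    and summarize the estimated grade-to-grade growth."""
    rng = random.Random(seed)
    growth_estimates = []
    for _ in range(n_reps):
        # Scores are drawn directly (an assumption); a real study would
        # simulate item responses and calibrate them under the UIRT model.
        lower = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
        upper = [rng.gauss(true_growth, 1.0) for _ in range(n_examinees)]
        growth, _, _ = scale_properties(lower, upper)
        growth_estimates.append(growth)
    return bias_and_se(growth_estimates, true_growth)


if __name__ == "__main__":
    bias, se = run_replications()
    print(f"growth bias = {bias:.4f}, standard error = {se:.4f}")
```

The same `bias_and_se` summary would apply unchanged to within-grade variability or effect size, given a true value for each.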

The results suggest that when UIRT vertical scaling assumptions are violated, the resulting estimated developmental score scales are subject to both standard error and bias. In this study, the magnitude of standard error was similar across all simulated tests regardless of the assumption violated. Bias, however, varied with the type and magnitude of the violation. More local item dependence resulted in more bias in grade-to-grade growth and separation of grade distributions, and produced developmental score scales that displayed scale expansion. Multidimensionality resulted in more bias in grade-to-grade growth and separation of grade distributions when the correlation between dimensions was smaller, and it also produced developmental score scales that displayed scale expansion. Dissimilar reliability of grade-level tests resulted in more grade-to-grade growth bias but minimal bias in separation of grade distributions, and produced either scale expansion or scale shrinkage depending on the item characteristics of the test. Limitations of this study and directions for future research are discussed.


Assumption Violations, Educational Measurement, Item Response Theory, Vertical Scaling


xiii, 196 pages


Includes bibliographical references (pages 172-184).


Copyright 2013 Anna Marie Topczewski