Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
Robert L. Brennan
The purpose of this research was to compare the performance of various equating procedures for multidimensional tests. To examine these procedures, simulated data sets were generated under a multidimensional item response theory (MIRT) framework. The procedures examined included both unidimensional and multidimensional IRT-based equating procedures, as well as traditional equipercentile equating. Specifically, the performance of the following six equating procedures under the random groups design was compared: (1) unidimensional IRT observed score equating, (2) unidimensional IRT true score equating, (3) full MIRT observed score equating, (4) unidimensionalized MIRT observed score equating, (5) unidimensionalized MIRT true score equating, and (6) equipercentile equating. Four factors (test length, sample size, form difficulty differences, and correlation between dimensions) were expected to affect equating performance. Their effects were investigated by creating two conditions for each factor: long vs. short test, large vs. small sample size, some vs. no form differences, and high vs. low correlation between dimensions.
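As a rough illustration of the simulation setup and of procedure (6), the sketch below generates random groups data for two forms under a compensatory two-dimensional 2PL model and then equates Form X to Form Y with equipercentile equating (percentile ranks on continuized score distributions, inverted by linear interpolation). The compensatory model, the item parameters, the correlation of 0.5, the sample sizes, and the small smoothing constant `eps` are all hypothetical assumptions chosen for illustration; this is not the study's actual generating model or software.

```python
import numpy as np

def simulate_m2pl(n_examinees, a, d, rho, rng):
    """Simulate 0/1 responses under an assumed compensatory two-dimensional 2PL model.

    a: (n_items, 2) discriminations; d: (n_items,) intercepts;
    rho: correlation between the two latent dimensions (bivariate normal abilities).
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal(np.zeros(2), cov, size=n_examinees)
    logits = theta @ a.T + d                      # shape (n_examinees, n_items)
    p = 1.0 / (1.0 + np.exp(-logits))             # probability of a correct response
    return (rng.random(p.shape) < p).astype(int)

def equipercentile(x_scores, y_scores, max_x, max_y, eps=1e-6):
    """Equate integer Form X scores 0..max_x to the Form Y scale.

    A tiny constant eps is added to every score frequency (an assumption) so the
    continuized Y distribution is strictly increasing and can be inverted.
    """
    fx = np.bincount(x_scores, minlength=max_x + 1) + eps
    fy = np.bincount(y_scores, minlength=max_y + 1) + eps
    fx, fy = fx / fx.sum(), fy / fy.sum()
    Fx, Fy = np.cumsum(fx), np.cumsum(fy)
    # Percentile rank (as a proportion) of each integer X score: F(x-1) + f(x)/2.
    p = Fx - fx / 2.0
    # Continuized Y cdf: value G(y-1) at knot y - 0.5, for y = 0..max_y+1.
    knots = np.arange(max_y + 2) - 0.5
    cdf = np.concatenate(([0.0], Fy))
    # Invert the Y cdf at each X percentile rank by linear interpolation.
    return np.interp(p, cdf, knots)

# Hypothetical usage: random groups, each group takes one 40-item form.
rng = np.random.default_rng(0)
n_items = 40
a = rng.uniform(0.5, 1.5, size=(n_items, 2))      # hypothetical discriminations
d_x = rng.normal(0.0, 1.0, size=n_items)          # hypothetical Form X intercepts
d_y = d_x + 0.3                                   # Form Y made easier (hypothetical)
x = simulate_m2pl(3000, a, d_x, 0.5, rng).sum(axis=1)
y = simulate_m2pl(3000, a, d_y, 0.5, rng).sum(axis=1)
eq = equipercentile(x, y, n_items, n_items)       # Form Y equivalents of X = 0..40
```

When both forms have identical score distributions, this construction returns each score unchanged, which is a quick sanity check on the interpolation.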
This simulation study, conducted over 50 replications, revealed several patterns in the equating performance of the six procedures across the simulation conditions. The following five findings are notable: (1) the full MIRT procedure provided more accurate equating results (i.e., smaller errors) than the other procedures, especially when the correlation between dimensions was low; (2) the equipercentile procedure was more likely than the IRT procedures to yield larger random error and overall error across all conditions; (3) equating of multidimensional tests was more accurate when form differences were small, sample size was large, and the test was long; (4) even when multidimensional tests were used (i.e., the unidimensionality assumption was violated), the unidimensional IRT procedures still yielded quite accurate equating results; and (5) whether a procedure was based on observed scores or true scores did not appear to make any difference in the equating results. Building on these findings, theoretical and practical implications are discussed, and directions for future research are suggested to strengthen the generalizability of the current findings. Because only a handful of studies of this kind have been conducted in the MIRT literature, such research is expected to identify the specific conditions under which these findings are likely to hold, thereby leading to practical guidelines for a variety of operational testing situations.
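Findings (1) and (2) refer to random error and overall error. One common way to summarize these across replications, sketched below as an assumption rather than the study's exact criteria, is to compare each replication's equated scores with a criterion equating at every raw-score point and then average squared bias, variance, and mean squared error over score points. With the population (ddof=0) variance, overall error decomposes exactly into squared bias plus random error. The function name `equating_error`, the uniform default weights, and the simulated estimates in the usage lines are hypothetical.

```python
import numpy as np

def equating_error(estimates, criterion, weights=None):
    """Summarize equating error over replications at each raw-score point.

    estimates: (n_reps, n_scores) equated scores, one row per replication
    criterion: (n_scores,) criterion ("true") equated scores
    weights:   optional nonnegative weights per score point (uniform if omitted)
    """
    est = np.asarray(estimates, dtype=float)
    crit = np.asarray(criterion, dtype=float)
    bias = est.mean(axis=0) - crit                     # systematic error per score
    se = est.std(axis=0)                               # random error (ddof=0)
    rmse = np.sqrt(((est - crit) ** 2).mean(axis=0))   # overall error per score
    w = np.ones_like(crit) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return {
        "bias": bias, "se": se, "rmse": rmse,
        "wasb": float(np.sum(w * bias ** 2)),   # weighted average squared bias
        "wav": float(np.sum(w * se ** 2)),      # weighted average variance
        "wamse": float(np.sum(w * rmse ** 2)),  # weighted average MSE = wasb + wav
    }

# Hypothetical usage: 50 replications of noisy, slightly biased equated scores.
rng = np.random.default_rng(1)
criterion = np.arange(41, dtype=float)
estimates = criterion + 0.2 + rng.normal(0.0, 0.5, size=(50, 41))
res = equating_error(estimates, criterion)
print(round(res["wasb"], 3), round(res["wav"], 3), round(res["wamse"], 3))
```

Passing score-frequency weights instead of uniform weights emphasizes error where most examinees score.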
Copyright 2013 Eunjung Lee