Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
First Committee Member
Michael J Kolen
Second Committee Member
Robert L Brennan
Third Committee Member
Timothy N Ansley
The purpose of this study was to build upon the existing multidimensional item response theory (MIRT) equating literature by introducing a full MIRT observed score equating method for mixed-format exams, for which no such method currently exists. At present, the MIRT equating literature is limited to full MIRT observed score equating methods for multiple-choice-only exams and Bifactor observed score equating methods for mixed-format exams. Given the high frequency with which mixed-format exams are used and the accumulating evidence that some tests are not purely unidimensional, it was important to present a full MIRT equating method for mixed-format tests.
The performance of the full MIRT observed score method was compared with that of the traditional equipercentile method, the unidimensional IRT (UIRT) observed score method, and the Bifactor observed score method. For the Bifactor methods, group-specific factors were defined according to item format or content subdomain. For the full MIRT methods, two- and four-dimensional models were included, and correlations between latent abilities were either freely estimated or set to zero. All equating procedures were carried out using three end-of-course exams: Chemistry, Spanish Language, and English Language and Composition. For each subject, two separate datasets were created using pseudo-groups in order to provide two separate equating criteria. The specific equating criteria that served as baselines for comparisons with all other methods were the theoretical Identity and the traditional equipercentile procedures.
Several important conclusions were drawn. In general, the multidimensional methods performed better for datasets that evidenced more multidimensionality, whereas the unidimensional methods worked better for unidimensional datasets. In addition, the scale on which scores are reported influenced the comparative conclusions drawn among the studied methods. For performance classifications, which are most important to examinees, there typically were not large discrepancies among the UIRT, Bifactor, and full MIRT methods. However, this study was limited by its sole reliance on real data, which were not very multidimensional and for which the true equating relationship was not known. Therefore, plans for improvements, including the addition of a simulation study to introduce a variety of dimensional data structures, are also discussed.
Bifactor, Dimensionality, Equating, Item Response Theory, Multidimensional Item Response Theory
xx, 207 pages
Includes bibliographical references (pages 127-135).
Copyright 2014 Jaime Peterson