Date of Degree
PhD (Doctor of Philosophy)
First Committee Member
Brennan, Robert L
Second Committee Member
Kolen, Michael J
Third Committee Member
Ansley, Timothy N
Fourth Committee Member
There is an increasing demand for subscore reporting in the testing industry. Many testing programs already include subscores in their score reports or are considering plans to report them. However, relatively few studies have been conducted on subscore equating. The purpose of this dissertation is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subscores.
Assuming the random groups design and number-correct scoring, this dissertation analyzed two sets of real data as well as simulated data generated under four study factors: test dimensionality, subtest length, form difference in difficulty, and sample size. The equating methods considered were linear equating, equipercentile equating, equipercentile equating with log-linear presmoothing, equipercentile equating with cubic-spline postsmoothing, IRT true score equating using a three-parameter logistic model (3PL) with separate calibration (3PsepT), IRT observed score equating using 3PL with separate calibration (3PsepO), IRT true score equating using 3PL with simultaneous calibration (3PsimT), IRT observed score equating using 3PL with simultaneous calibration (3PsimO), IRT true score equating using a bifactor model (BF) with simultaneous calibration (BFT), and IRT observed score equating using BF with simultaneous calibration (BFO). These methods were compared to identity equating and evaluated with respect to systematic, random, and total errors of equating.
The main findings of this dissertation were as follows: (1) reporting subscores without equating would provide misleading information in terms of score profiles; (2) reporting subscores without a pre-specified test specification would raise practical issues such as constructing alternate subtest forms with comparable difficulty, conducting equating between forms of different lengths, and deciding on an appropriate score scale to report; (3) the best performing subscore equating method, overall, was 3PsepO, followed by equipercentile equating with presmoothing, and the worst performing method was BFT; (4) simultaneous calibration, which involves other subtest items in the calibration process, yielded larger bias but smaller random error than did separate calibration, indicating that borrowing information from other subtests increased bias but decreased random error in subscore equating; (5) BFO performed the best when the test was multidimensional, the form difference was small, the subtest was short, or the sample size was small; (6) equating results for BFT and BFO were affected by the magnitude of the factor loadings and the variability of the estimated general and specific factors; and (7) smoothing improved equating results, in general.
There is an increasing demand for subscore reporting in the testing industry. Many test users believe that subscores provide more insight into students’ strengths and weaknesses. Because of this demand, many testing programs already include subscores in their score reports or are considering plans to report them. However, reported subscores might reflect differences in form difficulty rather than a student’s relative performance across subareas. The purpose of this dissertation is to address why equating – a statistical process that adjusts for possible differences in form difficulty – is required for subscores and to examine which equating method is preferred under various subtest conditions.
The results showed that reporting subscores without equating misrepresents a student’s strengths and weaknesses because of differences in form difficulty. It was also noted that reporting subscores that were not intended to be reported raises numerous practical issues, which make it difficult to conduct equating and to assign meaning to the subscores. The findings of this dissertation may help test developers make decisions about subscore reporting and equating and may inform test users about how to interpret a score report that includes subscores.
equating, profile, reporting, subscore
xv, 150 pages
Includes bibliographical references (pages 145-150).
Copyright 2016 Euijin Lim
Lim, Euijin. "Subscore equating with the random groups design." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.