Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
Welch, Catherine J
Dunbar, Stephen B
First Committee Member
Harris, Deborah J
Second Committee Member
LeBeau, Brandon C
Third Committee Member
The purposes of the study were to compare the relative performance of three fixed item parameter calibration (FIPC) methods in item and ability parameter estimation, and to examine how the ability estimates obtained from these methods affect interpretations based on reported scales of different lengths.
The study used a simulation design divided into two stages. The first stage was the calibration stage, where the parameters of pretest items were estimated. This stage investigated the accuracy of item parameter estimates and the recovery of the underlying ability distributions for different sample sizes, different numbers of pretest items, and different types of ability distributions under the three-parameter logistic (3PL) model. The second stage was the operational stage, where the estimated parameters of the pretest items were put on operational forms and were used to score examinees. The second stage investigated the effect that item parameter estimation had on ability estimation and reported scores for the new test forms.
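As an illustration of the model named above, the following is a minimal sketch of the 3PL response function and of how item responses might be simulated from it. It is not the study's own code. The function names, the scaling constant D = 1.7, and the example item parameters are assumptions made for illustration only.

```python
import math
import random


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). D = 1.7 is a common scaling
    constant, assumed here rather than taken from the study.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_responses(thetas, items, seed=0):
    """Simulate a 0/1 response matrix (examinees x items).

    items is a list of hypothetical (a, b, c) tuples.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_correct_3pl(t, a, b, c) else 0
         for (a, b, c) in items]
        for t in thetas
    ]


# Hypothetical example: abilities drawn from a standard normal distribution.
ability_rng = random.Random(1)
thetas = [ability_rng.gauss(0.0, 1.0) for _ in range(5)]
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.25)]
responses = simulate_responses(thetas, items)
```

When ability equals item difficulty, the probability is halfway between the lower asymptote c and 1, for example 0.6 when c = 0.2.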
The item parameter estimates from the three FIPC methods showed subtle differences. The results of the DeMars method were closer to those of the separate calibration with linking method than to those of the FIPC with simple-prior update and FIPC with iterative prior update methods, and the latter two methods performed similarly to each other. Regarding the experimental factors that were manipulated in the simulation, the study found that the sample size influenced the estimation of item parameters. The effect of the number of pretest items on the estimation of item parameters was strong but ambiguous, likely because both the number and the characteristics of the pretest items changed across the item sets, confounding the effect. The effect of ability distributions on the estimation of item parameters was not as evident as the effect of the other two factors.
After the pretest items were calibrated, the parameter estimates of these items were put into operational use. The abilities of the examinees were then estimated based on the examinees’ responses to the existing operational items and the new items (previously called pretest items), whose item parameters were estimated under different conditions. This study found high correlations between the ability estimates and the true abilities of the examinees when forms contained pretest items calibrated using any of the three FIPC methods. The results suggested that all three FIPC methods were similarly competent in estimating the parameters of the items, leading to satisfactory determination of the examinees’ abilities. Because the estimated abilities were very similar, the differences among the scale scores on the same scale were small; the relative frequencies of examinees classified into performance categories and the classification consistency index also showed that the interpretation of reported scores was similar across scales.
The study provided a comprehensive comparison of FIPC methods in parameter estimation. It is hoped that this study will help practitioners choose among the methods according to the needs of their testing programs. When ability estimates were linearly transformed into scale scores, the lengths of the scales did not affect the statistical properties of the scores; however, scale length may affect how the scores are subjectively perceived by stakeholders and should therefore be selected carefully.
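The linear transformation from ability estimates to scale scores can be sketched as follows. This is an illustrative sketch only: the function name, the assumed ability range of -4 to 4, and the two example scales (1 to 41 and 100 to 500) are hypothetical and not taken from the study. Rounding and truncation, which operational reporting often applies, are omitted here.

```python
def to_scale_score(theta, lo, hi, theta_min=-4.0, theta_max=4.0):
    """Linearly map an ability estimate onto a reporting scale [lo, hi].

    theta_min and theta_max are assumed bounds of the ability metric;
    lo and hi are the hypothetical endpoints of the reporting scale.
    """
    slope = (hi - lo) / (theta_max - theta_min)
    return lo + slope * (theta - theta_min)


# The same ability estimates placed on a short and a long hypothetical scale.
theta_hats = [-1.5, 0.0, 0.75, 2.0]
short_scale = [to_scale_score(t, 1, 41) for t in theta_hats]
long_scale = [to_scale_score(t, 100, 500) for t in theta_hats]
```

Because both mappings are linear with a positive slope, the ordering of examinees and their relative spacing are the same on either scale; only the units differ.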
Fixed item parameter calibration, Item calibration, Item response theory, Scale score, Simulation
xv, 225 pages
Includes bibliographical references (pages 128-136).
Copyright © 2019 Keyu Chen
Chen, Keyu. "A comparison of fixed item parameter calibration methods and reporting score scales in the development of an item pool." PhD (Doctor of Philosophy) thesis, University of Iowa, 2019.