Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
Customizing off-the-shelf tests so that they provide various kinds of information is of growing interest because of the high information demands of the current testing environment. Using customized tests to compare examinee achievement on a national basis may offer a cost-effective solution to some practical problems. However, normative estimates based on customized tests may differ considerably from those based on intact tests, so the validity of customized norms may be seriously compromised. The primary purpose of this study was to investigate how various factors affect the validity of customized norms. These factors were the customizing strategy, the items used for estimation, test length, the correlation between the latent abilities assessed by items from the intact test and by new items, and the dimensional structure of the test.
Monte Carlo simulation techniques were used to examine the accuracy of the customized norms. Both unidimensional and multidimensional data sets were generated and calibrated using unidimensional item response theory (IRT) models. The five factors listed above were manipulated in a partially crossed design with a total of 44 combinations of conditions. The outcomes of interest included the estimated ability distributions, as well as the correlations, mean differences, and mean absolute differences between the ability and percentile estimates derived from intact tests and those derived from customized tests.
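The agreement measures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: the function names `agreement_stats` and `percentile_ranks` are hypothetical, the simulated abilities and noise levels are arbitrary values chosen only for demonstration, and percentile ranks are computed with a mid-rank convention against the intact-test distribution, which is only one possible convention.

```python
import numpy as np


def agreement_stats(intact, customized):
    """Compare estimates for the same examinees from an intact and a customized test."""
    intact = np.asarray(intact, dtype=float)
    customized = np.asarray(customized, dtype=float)
    diff = customized - intact  # positive => customized estimate is higher
    return {
        "correlation": float(np.corrcoef(intact, customized)[0, 1]),
        "mean_difference": float(diff.mean()),              # signed (systematic) difference
        "mean_abs_difference": float(np.abs(diff).mean()),  # average size of the discrepancy
    }


def percentile_ranks(scores, norm_scores):
    """Percentile rank of each score within a norm group, counting half of ties."""
    norm = np.sort(np.asarray(norm_scores, dtype=float))
    scores = np.asarray(scores, dtype=float)
    below = np.searchsorted(norm, scores, side="left")  # norm scores strictly lower
    upto = np.searchsorted(norm, scores, side="right")  # norm scores lower or equal
    return 100.0 * (below + 0.5 * (upto - below)) / norm.size


# Hypothetical illustration: true abilities plus two noisy estimates.
# The noise levels are arbitrary and are NOT taken from the study.
rng = np.random.default_rng(2008)
theta = rng.standard_normal(1000)
theta_intact = theta + rng.normal(0.0, 0.3, size=theta.size)
theta_custom = theta + rng.normal(0.1, 0.5, size=theta.size)

print(agreement_stats(theta_intact, theta_custom))

# Assumption: both sets of estimates are referred to the intact-test distribution.
pr_intact = percentile_ranks(theta_intact, theta_intact)
pr_custom = percentile_ranks(theta_custom, theta_intact)
print(agreement_stats(pr_intact, pr_custom))
```

Referring both sets of estimates to the same intact-test distribution lets a systematic shift in the customized estimates show up as a nonzero mean difference in percentiles; ranking each set within its own distribution would force that mean difference toward zero.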
Based on the results of this study, it was concluded that: (1) customized instruments that retained all items from the intact tests provided more accurate normative estimates than instruments from which some intact-test items had been removed; (2) deriving norms from only the intact-test items yielded more accurate estimates than using all items in the customized tests; (3) lengthened customized tests yielded more accurate estimates than shortened ones; (4) the higher the correlation between the latent abilities measured by intact-test items and by new items, the more accurate the normative estimates; and (5) the impact of the various factors was small when the unidimensionality assumption was satisfied, but the differences increased as the data structure became more complex.
validity, IRT, augmented tests
Copyright 2008 Xiaohui Zhao