Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
Michael J. Kolen
This dissertation investigates the interaction of population invariance, equating assumptions, and equating accuracy with group differences. In addition, matched samples equating methods are considered as a possible way to improve equating accuracy with large group differences.
Data from one administration of four mixed-format Advanced Placement (AP) Exams were used to create pseudo old and new forms sharing common items. Population invariance analyses were conducted based on levels of examinee parental education using a single group equating design. Old and new form groups with common item effect sizes (ESs) ranging from 0 to 0.75 were created by sampling examinees based on their level of parental education. Equating was conducted for four common item nonequivalent group design equating methods: frequency estimation, chained equipercentile, IRT true score, and IRT observed score. Additionally, groups with ESs greater than zero were matched using three different matching techniques including exact matching on parental education level and propensity score matching with several other background variables. The accuracy of equating results was evaluated by comparing each equating relationship with an ES greater than zero to the equating relationship where the ES equaled zero. Differences between comparison and criterion equating relationships were quantified using the root expected mean squared difference (REMSD) statistic, classification consistency, and standard errors of equating (SEs).
The accuracy of equating results and the adequacy of equating assumptions were compared for unmatched and matched samples.
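As an illustration of the two quantities central to this design, the following minimal Python sketch computes a common item effect size and a weighted root mean squared difference between a comparison and a criterion equating relationship. It is not the code used in the dissertation. It assumes that the ES is a standardized mean difference on common item scores using a pooled standard deviation, and that the difference statistic weights each old form score point by its frequency and optionally divides by a score standard deviation. The function names, arguments, and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def common_item_effect_size(new_scores, old_scores):
    """Standardized mean difference on common item scores.

    Assumption: the ES is (new mean - old mean) / pooled SD.
    """
    n_new, n_old = len(new_scores), len(old_scores)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = (
        (n_new - 1) * variance(new_scores)
        + (n_old - 1) * variance(old_scores)
    ) / (n_new + n_old - 2)
    return (mean(new_scores) - mean(old_scores)) / math.sqrt(pooled_var)


def weighted_rmsd(comparison, criterion, weights, sd=1.0):
    """Weighted root mean squared difference between two equating functions.

    comparison, criterion: equated scores at each raw score point.
    weights: relative frequency (or count) of examinees at each score point.
    sd: standard deviation used to standardize the result (assumed optional).
    """
    total = sum(weights)
    msd = sum(
        w * (c - k) ** 2
        for w, c, k in zip(weights, comparison, criterion)
    ) / total
    return math.sqrt(msd) / sd


# Toy data, purely illustrative.
es = common_item_effect_size([3, 4, 5, 6, 7], [1, 2, 3, 4, 5])
diff = weighted_rmsd([10, 20, 30], [10, 21, 32], [1, 1, 2])
print(round(es, 4), diff)
```

In this sketch the criterion would be the equating relationship obtained when the ES equals zero, and each comparison relationship would come from a sample with an ES greater than zero.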
As ES increased, equating results tended to become less accurate and less consistent across equating methods. However, there was relatively little population dependence of equating results, despite large subgroup performance differences. Large differences between criterion and comparison equating relationships appeared to be caused instead by violations of equating assumptions. As group differences increased, the degree to which frequency estimation and chained equipercentile assumptions held decreased. In addition, all four AP Exams showed some evidence of multidimensionality. Because old and new form groups were selected to differ in terms of their respective levels of parental education, the matching methods that included parental education appeared to improve equating accuracy and the degree to which equating assumptions held, at least for very large ESs.
Copyright 2010 Sonya Powers