Document Type


Date of Degree

Spring 2013

Degree Name

PhD (Doctor of Philosophy)

Degree In

Education (Educational Measurement and Statistics)

First Advisor

Lee, Won-Chan

Second Advisor

Harris, Deborah J.

First Committee Member

Kolen, Michael J.

Second Committee Member

Welch, Catherine

Third Committee Member

Tan, Aixin


Under the Common Core State Standards (CCSS) initiative, states that voluntarily adopt the common core standards work together to develop a common assessment to supplement and replace existing state assessments. However, because the common assessment may not cover all state standards, states within a consortium can augment it with locally developed items aligned with state-specific standards so that all necessary standards are measured. The purpose of this dissertation was to evaluate the linking accuracy of such augmented tests under the common-item nonequivalent groups design.

Pseudo-test analyses were conducted by splitting a large-scale math assessment in half to create two parallel common assessments and then augmenting them with two sets of state-specific items drawn from a large-scale science assessment. A simulation study based on modifications of the pseudo-test data was also conducted.

For the pseudo-test analyses, three factors were investigated: (1) the difference in ability between the new- and old-test groups, (2) differential effect sizes for the common assessment and the state-specific item set, and (3) the number of common items. The simulation analyses added two factors to these three: the latent-trait correlation between the common assessment and the state-specific item set, and differential latent-trait correlations between the two for the old and new tests. Four equating methods were used in each analysis: frequency estimation, chained equipercentile, item response theory (IRT) true score, and IRT observed score equating.
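To make the two observed-score methods more concrete, here is a minimal Python sketch of frequency estimation and chained equipercentile equating for the common-item nonequivalent groups design. It is an illustration under stated assumptions, not the dissertation's implementation. It assumes raw joint count tables `joint_xv` (new form X by common-item score V, new group) and `joint_yv` (old form Y by V, old group) with the same number of V columns. It also uses equal synthetic-population weights, applies no presmoothing, and adds a tiny hypothetical constant `EPS` to every cell so that no score has zero probability. All function names are hypothetical. The IRT methods are omitted because they require calibrated item parameters.

```python
import numpy as np

EPS = 1e-6  # assumption: tiny count added to each cell to avoid zero frequencies


def joint_probs(counts):
    """Convert a joint count table to probabilities after adding EPS to every cell."""
    p = np.asarray(counts, dtype=float) + EPS
    return p / p.sum()


def _continuized_cdf(probs):
    # Score k is spread uniformly over [k - 0.5, k + 0.5], so the CDF is
    # piecewise linear with knots at -0.5, 0.5, ..., K + 0.5.
    knots = np.arange(len(probs) + 1) - 0.5
    cdf = np.concatenate(([0.0], np.cumsum(probs)))
    return knots, cdf


def percentile_rank(probs, x):
    """Continuized CDF evaluated at (possibly non-integer) scores x, on a 0-1 scale."""
    knots, cdf = _continuized_cdf(probs)
    return np.interp(x, knots, cdf)


def inverse_percentile_rank(probs, p):
    """Score whose continuized CDF equals p; well defined because every prob > 0."""
    knots, cdf = _continuized_cdf(probs)
    return np.interp(p, cdf, knots)


def equipercentile(probs_from, probs_to, scores):
    """Map scores on one scale to the scores with the same percentile rank on another."""
    return inverse_percentile_rank(probs_to, percentile_rank(probs_from, scores))


def chained_equipercentile(joint_xv, joint_yv):
    """Equate X to Y by chaining X -> V (new group) and V -> Y (old group)."""
    p1, p2 = joint_probs(joint_xv), joint_probs(joint_yv)
    f1, h1 = p1.sum(axis=1), p1.sum(axis=0)  # X and V marginals, new group
    g2, h2 = p2.sum(axis=1), p2.sum(axis=0)  # Y and V marginals, old group
    x = np.arange(len(f1))
    v_equiv = equipercentile(f1, h1, x)      # X -> V in the new group
    return equipercentile(h2, g2, v_equiv)   # V -> Y in the old group


def frequency_estimation(joint_xv, joint_yv, w1=0.5):
    """Equipercentile equating on a synthetic population weighted w1 and 1 - w1."""
    p1, p2 = joint_probs(joint_xv), joint_probs(joint_yv)
    f1, h1 = p1.sum(axis=1), p1.sum(axis=0)
    g2, h2 = p2.sum(axis=1), p2.sum(axis=0)
    w2 = 1.0 - w1
    # Assume X given V and Y given V have the same conditional distributions
    # in both groups, then reweight by the other group's V distribution.
    f2_est = (p1 / h1) @ h2  # sum over v of f1(x | v) * h2(v)
    g1_est = (p2 / h2) @ h1  # sum over v of g2(y | v) * h1(v)
    f_synth = w1 * f1 + w2 * f2_est
    g_synth = w1 * g1_est + w2 * g2
    x = np.arange(len(f1))
    return equipercentile(f_synth, g_synth, x)
```

Both functions return one Y-scale equivalent per integer X score. When the two tables are identical, both reduce to the identity mapping up to floating-point error. This gives a quick sanity check before applying them to real data.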

The main findings of this dissertation were as follows: (1) as the group ability difference increased, bias also increased; (2) larger bias was observed when the effect sizes differed for the common assessment and the state-specific item set; (3) increasing the number of common items reduced bias, especially for the frequency estimation method when group ability differed; (4) the frequency estimation method was more sensitive to the group ability difference than to the differential effect size, whereas the IRT equating methods were more sensitive to the differential effect size than to the group ability difference; (5) a higher latent-trait correlation between the common assessment and the state-specific item set was associated with smaller bias, and when this correlation exceeded 0.8, all four equating methods provided adequate linking unless the group ability difference was large; (6) differential latent-trait correlations for the old and new tests resulted in larger bias than equal latent-trait correlations; and (7) when the old and new test groups were equivalent, the frequency estimation method produced the least bias, but when group ability differed, IRT true score and observed score equating produced smaller bias than the frequency estimation and chained equipercentile methods.
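The abstract reports bias but does not say how it was aggregated over score points. The hedged Python sketch below shows one common summary used in equating research, not necessarily the index used in this dissertation. At each raw score, conditional bias is taken as the mean equated score across replications minus the criterion equated score. These values are then combined into a weighted average root squared bias. The names `equating_bias`, `estimates`, `criterion`, and `weights` are hypothetical.

```python
import numpy as np


def equating_bias(estimates, criterion, weights):
    """Return conditional bias per score point and a weighted average root squared bias.

    estimates: (R, K + 1) equated scores from R replications (hypothetical input)
    criterion: (K + 1,) criterion equating relationship
    weights:   (K + 1,) score frequencies used to weight the score points
    """
    estimates = np.asarray(estimates, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to relative frequencies
    conditional_bias = estimates.mean(axis=0) - criterion
    weighted_bias = np.sqrt(np.sum(w * conditional_bias ** 2))
    return conditional_bias, weighted_bias


est = np.array([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]])
cond, wab = equating_bias(est, criterion=[0.0, 2.0, 3.0], weights=[1, 1, 2])
print(cond, wab)  # [1. 0. 0.] 0.5
```

Squaring before averaging keeps positive and negative conditional bias from cancelling across score points. Weighting by score frequencies emphasizes the scores where most examinees fall.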


Keywords

Augmented Test, Equating and Linking


xviii, 217 pages


Includes bibliographical references (pages 105-109).


Copyright 2013 Ja Young Kim
