Document Type


Date of Degree

Fall 2013

Degree Name

PhD (Doctor of Philosophy)

Degree In

Psychological and Quantitative Foundations

First Advisor

Kolen, Michael J

First Committee Member

Dorans, Neil J

Second Committee Member

Lee, Won-Chan

Third Committee Member

Brennan, Robert L

Fourth Committee Member

Ansley, Timothy N

Fifth Committee Member

Tan, Aixin

Abstract

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in testing programs. Mixed-format tests are often considered superior to tests containing only MC items, although the use of multiple item formats creates measurement challenges when equating is conducted under the common-item nonequivalent groups (CINEG) design. The purpose of this dissertation was to investigate how various test characteristics and examinee characteristics influence CINEG mixed-format test score equating results.

Simulated data were used in this dissertation. Simulees' item responses were generated using items selected from one MC item pool and one CR item pool, both of which were constructed based on College Board Advanced Placement examinations from various subject areas. Five main factors were investigated: item-type dimensionality, group ability difference, within group ability difference, length and composition of the common-item set, and format representativeness of the common-item set. In addition, the performance of two equating methods, the presmoothed frequency estimation method (PreSm_FE) and the presmoothed chained equipercentile equating method (PreSm_CE), was compared under various conditions. To evaluate equating results, both conditional statistics and overall summary statistics were considered, namely absolute bias, standard error of equating, and root mean squared error. The difference that matters (DTM) was also used as a criterion for evaluating whether adequate equating results were obtained.
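As an illustration of the conditional evaluation statistics named above, the following is a minimal sketch and not the dissertation's actual code. It assumes equated scores from repeated simulation replications are stored in an array with one row per replication and one column per raw score point, and that a criterion equating is available. The function name `equating_error_summary`, the population (divide-by-R) form of the standard error, and the DTM value of 0.5 score points are all assumptions made for illustration; the abstract does not state them.

```python
import numpy as np


def equating_error_summary(estimates, criterion, dtm=0.5):
    """Conditional error statistics at each raw score point.

    estimates: (n_replications, n_score_points) equated scores.
    criterion: (n_score_points,) criterion equated scores.
    dtm: hypothetical "difference that matters" threshold (assumed 0.5).
    """
    estimates = np.asarray(estimates, dtype=float)
    criterion = np.asarray(criterion, dtype=float)

    # Absolute bias: |mean equated score over replications - criterion|.
    mean_estimate = estimates.mean(axis=0)
    abs_bias = np.abs(mean_estimate - criterion)

    # Standard error of equating: SD of the estimates over replications
    # (population form, ddof=0, so that RMSE^2 = bias^2 + SE^2 exactly).
    se = estimates.std(axis=0)

    # Root mean squared error around the criterion.
    rmse = np.sqrt(((estimates - criterion) ** 2).mean(axis=0))

    # Flag score points whose absolute bias is below the assumed DTM.
    within_dtm = abs_bias < dtm
    return abs_bias, se, rmse, within_dtm


# Usage with made-up data: 500 replications, raw scores 0..40,
# a constant bias of 0.3 and random error with SD 0.8.
rng = np.random.default_rng(0)
criterion = np.linspace(0.0, 40.0, 41)
estimates = criterion + 0.3 + rng.normal(0.0, 0.8, size=(500, 41))
abs_bias, se, rmse, within_dtm = equating_error_summary(estimates, criterion)

# The decomposition RMSE^2 = bias^2 + SE^2 holds at every score point.
print(np.allclose(rmse ** 2, abs_bias ** 2 + se ** 2))
```

Overall summary statistics could then be obtained by averaging these conditional values over score points, for example weighted by the score frequencies; the weighting scheme is likewise not specified in the abstract.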

The main findings based on the simulation studies are as follows:

(1) In most situations, item-type multidimensionality did not have a substantial impact on random error, regardless of the common-item set. However, its influence on bias depended on the composition of the common-item set.

(2) Neither the group ability difference factor nor the within group ability difference factor had a substantial influence on random error. When group ability differences were simulated, common-item sets with more items or more total score points produced less equating error. When a within group ability difference existed, common-item sets with a balance of item formats yielded more accurate equating results than did unbalanced common-item sets.

(3) The relative performance of common-item sets with various lengths and compositions was dependent on the levels of group ability difference, within group ability difference, and test dimensionality.

(4) The common-item set containing only MC items performed similarly to the common-item set with both item formats when the test forms were unidimensional and no within group ability difference existed or when groups of examinees did not differ in proficiency.

(5) The PreSm_FE method was more sensitive to group ability difference than the PreSm_CE method. When the within group ability difference was non-zero, the relative performance of the two methods depended on the length and composition of the common-item set. The two methods performed almost the same in terms of random error.

The studies conducted in this dissertation suggest that when equating multidimensional mixed-format test forms in practice, if groups of examinees differ substantially in overall proficiency, inclusion of both item formats should be considered for the common-item set. When within group ability differences are likely to exist, balancing different item formats in the common-item set appears to be even more important than the use of a larger number of common items for obtaining accurate equating results. Because only simulation studies were conducted in this dissertation, caution should be exercised when generalizing the conclusions to practical situations.

Keywords

group ability difference, item-type multidimensionality, length and composition of common-item set, mixed-format test score equating, type of common-item set, within group ability difference


xxiv, 287 pages


Includes bibliographical references (pages 282-287).


Copyright 2013 Wei Wang