Document Type


Date of Degree

Fall 2017

Access Restrictions

Access restricted until 01/31/2020

Degree Name

PhD (Doctor of Philosophy)

Degree In

Psychological and Quantitative Foundations

First Advisor

Welch, Catherine J

First Committee Member

Kolen, Michael J

Second Committee Member

Dunbar, Stephen B

Third Committee Member

Ansley, Timothy N

Fourth Committee Member

Park, Soonhye


The purpose of this study was to investigate the effects of various testlet characteristics in terms of an ability parameter recovery under the modality of computerized adaptive test (CAT). Given the popularity of using CATs and the high frequency of emerging testlets into exams as either mixed format or not, it was important to evaluate the various conditions in a testlet-based CAT fitted testlet response theory models. The manipulated factors of this study were testlet size, testlet effect size, testlet composition, and exam format. The performance of each condition was compared with the true thetas which were 81 equally spaced points from -3.0 to +3.0. For each condition, 1,000 times of replication process were conducted with respect to overall bias, overall standard error, overall RMSE, conditional bias, conditional standard error, conditional RMSE, as well as conditional passing rate. The conditional results were presented in the pre-specified intervals.

Several significant conclusions were made. Overall, the mean theta estimates over 1,000 replications were close to the true thetas regardless of manipulated conditions. In terms of aggregated overall RMSE, predictable relationships were found in four study factors: A larger amount of error was associated with a longer testlet, a bigger effect size, a random composition, and a testlet only exam format. However, when the aggregated overall bias was considered, only two effects were observed: a large difference among three testlet length conditions, and almost no difference between two testlet composition conditions. As expected, conditional SEMs for all conditions showed a U-shape across the theta scale. The noticeable discrepancy occurred only within the testlet length condition: more error was associated with the condition of the longest testlet length compared to the short and medium length conditions. Conditional passing rate showed little discrepancy among conditions within each facto, so no particular association was found.

In general, a short testlet length is better, a small testlet effect size is better, a homogeneous difficulty composition is better, and a mixed format is better in terms of the smaller amount of error found in this study. Other than these obvious findings, some interaction effects were also observed. When the medium or large (i.e., greater than .50) testlet effect was suspicious, it was better to have a short length testlet. It was also found that using a mixed-format exam increased the accuracy of the random difficulty composition. However, this study was limited by several other factors which were controlled to be the same across the conditions: a fixed length exam, no content balancing, and the uniform testlet effects. Consequently, plans for improvements in terms of generalization were also discussed.


Computerized adaptive test, Mixed-format exam, Rasch model, Testlet, Testlet response theory


xi, 117 pages


Includes bibliographical references (pages 103-117).


Copyright © 2017 Seohong Pak