Date of Degree
PhD (Doctor of Philosophy)
Psychological and Quantitative Foundations
Stephen B. Dunbar
Catherine J. Welch
First Committee Member
Timothy N. Ansley
Second Committee Member
Third Committee Member
John L. Hosp
As the use of technology for personal, professional, and learning purposes increases, more and more assessments are transitioning from a traditional paper-based testing format to a computer-based one. During this transition, some assessments are being offered in both paper and computer formats in order to accommodate examinees and testing center capabilities. Scores on the paper-based test are often intended to be directly comparable to the computer-based scores, but such claims of comparability are often unsupported by research specific to that assessment. Not only should the scores be examined for differences, but the thought processes used by raters while scoring those assessments should also be studied to better understand why raters might score response modes differently. Previous comparability literature can be informative, but more contemporary, test-specific research is needed in order to completely support the direct comparability of scores.
The goal of this thesis was to form a more complete understanding of why analytic scores on a writing assessment might differ, if at all, between handwritten and typed responses. A representative sample of responses to the writing composition portion of a large-scale high school equivalency assessment was used. Six trained raters analytically scored approximately six hundred examinee responses each. Half of those responses were typed, and the other half were the transcribed handwritten duplicates. Multiple methods were used to examine why differences between response modes might exist. A MANOVA framework was applied to examine score differences between response modes, and systematic analyses of think-alouds and interviews were used to explore differences in rater cognition. The results of these analyses indicated that response mode was of no practical significance, meaning that domain scores were not notably dependent on whether a response was presented as typed or handwritten. Raters, on the other hand, had a more substantial effect on scores. Comments from the think-alouds and interviews suggest that, while the scores were not affected by response mode, raters tended to consider certain aspects of typed responses differently from those of handwritten responses. For example, raters treated typographical errors differently from other conventional errors when scoring typed responses, but not while scoring the handwritten duplicates. Raters also indicated that they preferred scoring typed responses over handwritten ones, but felt they could overcome their personal preferences to score both response modes similarly.
Empirical investigations on the comparability of scores, combined with the analysis of raters’ thought processes, helped to provide a more evidence-based answer to the question of why scores might differ between response modes. Such information could be useful for test developers when making decisions regarding what mode options to offer and how to best train raters to score such assessments. The design of this study itself could be useful for testing organizations and future research endeavors, as it could be used as a guide for exploring score differences and the human-based reasons behind them.
As the use of technology increases, assessments are increasingly being offered as both paper-based and computer-based tests. The paper-based scores are often intended to be comparable to the computer-based scores, but such claims are often unsupported by research. Research on this topic exists, but more contemporary, test-specific research is needed in order to fully support these claims.
The purpose of this study was to gain a more complete understanding of why scores on a writing assessment might differ between handwritten and typed responses. Each of six trained raters scored approximately six hundred responses to the writing composition portion of a large-scale assessment. Half of the responses were typed, and the other half were the handwritten duplicates. MANOVA was used to examine score differences between response modes, and think-alouds and interviews were used to explore differences in raters’ thought processes. Findings indicated that handwritten and typed responses received approximately the same scores, and that differences among raters had a more notable effect on scores than response mode did. Raters’ comments suggested that, while the scores were not affected by response mode, raters tended to think about certain aspects of typed responses differently from those of handwritten responses. Such information could be useful for test developers when making decisions regarding what mode options to offer, how to train raters, and what score scales to use for reporting. The design of this study itself could be useful for testing organizations and future research endeavors, as it could be used as a guide for exploring score differences and the reasons behind them.
Comparability, Handwritten, Mixed Methods, Rater Cognition, Typed, Writing Assessment
xiv, 243 pages
Includes bibliographical references (pages 234-243).
Copyright 2015 Angelica Desiree Rankin
Rankin, Angelica Desiree. "A comparability study on differences between scores of handwritten and typed responses on a large-scale writing assessment." PhD (Doctor of Philosophy) thesis, University of Iowa, 2015.