Document Type

Dissertation

Date of Degree

Fall 2014

Degree Name

PhD (Doctor of Philosophy)

Degree In

Psychological and Quantitative Foundations

First Advisor

Catherine J. Welch

Abstract

This study was undertaken to evaluate the impact of modeling decisions made by those charged with implementing teacher evaluation systems that incorporate student achievement data; such choices include how growth is to be modeled, whether student characteristics are to be controlled for, how many years of data are to be used, and which test subject is to be selected. Using a three-cohort longitudinal data set from a school district in which reading and mathematics test scores from a vertically scaled assessment allowed determination of growth in grades three, four, and five, estimated teacher effects were derived from five value-added models, and the resulting rank orderings of the teachers were examined. The models compared were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the vertically scaled assessment, and student growth percentiles. Teacher rank orderings derived under the five models were highly consistent with one another using either one or three classroom years of test scores. Only when the movement of teachers between quartiles was examined did a difference in performance between some models emerge. The high degree of consistency between the two covariate adjustment models suggested that control for student-level characteristics was unnecessary. Using three years of test scores rather than one led to a small decrease in between-model correlations and a small increase in teacher movement between quartiles. Comparison of teacher value-added based on reading scores versus mathematics scores gave mixed results, with between-model correlations in mathematics being slightly higher than those for reading but with reading showing greater consistency in quartile movement between cohorts.
The year-to-year change in teacher rank orderings was very striking, as low, and even negative, correlations emerged between years. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing the modeling conditions. Using a teacher rating scheme in which groups of teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next. Such low inter-temporal stability of teacher value-added is a significant result that should be considered by all stakeholders in teacher evaluation.
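
The two stability measures mentioned above, the correlation between rank orderings and the movement of teachers between quartiles, can be illustrated with a minimal sketch. The function names, the tie-free ranking, and the equal-sized quartile split are assumptions made for illustration; the record does not state how the dissertation computed these quantities.

```python
import numpy as np

def ranks(x):
    """0-based ranks; the lowest value gets rank 0 (assumes no tied values)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=int)
    r[order] = np.arange(len(x))
    return r

def rank_correlation(x, y):
    """Spearman-style correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def quartiles(x):
    """Quartile index 0..3 for each teacher, based on rank (illustrative split)."""
    return ranks(x) * 4 // len(x)

def share_moved(x, y):
    """Fraction of teachers whose quartile differs between the two orderings."""
    return float(np.mean(quartiles(x) != quartiles(y)))

# Made-up value-added estimates for eight teachers in two consecutive years.
year1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
year2 = year1[::-1]  # a fully reversed ordering, the extreme case

print(rank_correlation(year1, year2))
print(share_moved(year1, year2))
```

The same two functions apply to either comparison described in the abstract: pass estimates from two models for the same year, or estimates from one model for two different years.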

Public Abstract

This study examined the impact of modeling decisions made in implementing value-added teacher evaluation; such choices include the growth model itself, whether to control for student characteristics, how many years of scores to use, and the subject tested.

Estimates of teacher effectiveness were derived from five models, which were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the assessment, and student growth percentiles.
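
As a rough illustration of how two of these models differ, the sketch below computes teacher effects under a gain score approach and under a covariate adjustment approach that conditions on prior achievement only. It is a simplified sketch under stated assumptions, not the dissertation's specification: the function names are hypothetical, the data are invented, and the covariate adjustment is approximated by averaging ordinary least squares residuals within each teacher.

```python
import numpy as np

def gain_score_effects(prior, current, teacher):
    """Mean gain (current - prior) per teacher, centered on the overall mean gain."""
    gain = current - prior
    overall = gain.mean()
    return {str(t): float(gain[teacher == t].mean() - overall)
            for t in np.unique(teacher)}

def covariate_adjustment_effects(prior, current, teacher):
    """Regress current score on prior score, then average residuals per teacher.

    Assumption: a two-step OLS simplification, used here only for illustration.
    """
    X = np.column_stack([np.ones_like(prior, dtype=float), prior])
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return {str(t): float(resid[teacher == t].mean())
            for t in np.unique(teacher)}

# Invented scale scores for four students taught by two teachers.
prior = np.array([200.0, 210.0, 220.0, 230.0])
current = np.array([210.0, 220.0, 225.0, 235.0])
teacher = np.array(["A", "A", "B", "B"])

gain = gain_score_effects(prior, current, teacher)
adjusted = covariate_adjustment_effects(prior, current, teacher)
print(gain)
print(adjusted)
```

In this toy data both approaches rank teacher A above teacher B, although the size of the estimated effects differs, which mirrors the kind of rank-order comparison the study describes.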

The resulting rank orderings of the teachers were examined and found to be highly consistent with one another using scores for either one or three classroom years. When the movement of teachers between quartiles of the rank orderings was examined, a difference in performance between some models did emerge. The covariate adjustment models were highly consistent, suggesting that control for student-level characteristics was unnecessary. Using three years of data rather than one did not significantly change model performance, and comparison of rank orderings based on reading scores versus mathematics scores gave mixed results.

The year-to-year inconsistency in rank orderings was striking. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing modeling conditions. Under a rating scheme in which teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next.

Keywords

teacher evaluation, value-added modeling

Pages

xiii, 148 pages

Bibliography

Includes bibliographical references (pages 145-148).

Copyright

Copyright 2014 Paula Lynn Cunningham
