Document Type

Dissertation

Date of Degree

Spring 2015

Degree Name

PhD (Doctor of Philosophy)

Degree In

Psychological and Quantitative Foundations

First Advisor

Stephen B. Dunbar

Abstract

A fundamental issue in educational measurement is what frame of reference to use when interpreting students’ performance on an assessment. One frame of reference that is often used to enhance interpretations of test scores is normative, which adds meaning to test score interpretations by indicating the rank of an individual’s score within a distribution of test scores of a well-defined reference group. One of the most commonly used frames of reference on student achievement provided by test publishers of large-scale assessments is national norms, whereby students’ test scores are referenced to a distribution of scores of a nationally representative sample. A national probability sample can fail to fully represent the population because of student and school nonparticipation. In practice, this is remedied by weighting the sample so that it better represents the intended reference population.

This study examined the extent to which weighting grade 4 and grade 8 student records that are not fully representative of the nation can recover the distributions of reading and math scores in a national probability sample. Data from a statewide testing program were used to create six grade 4 and grade 8 datasets, each varying in its degree of representativeness of the nation, as well as in the proximity of its reading and math distributions to those of a national sample. The six datasets created for each grade were separately weighted to different population totals in two different weighting conditions using four different bivariate stratification designs. The weighted distributions were then smoothed and compared to smoothed distributions of the national sample in terms of descriptive statistics, maximum absolute differences between the relative cumulative frequency distributions, and chi-square effect sizes. The impact of using percentile ranks developed from the state data was also investigated.
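The weighting step and one of the comparison statistics described above can be illustrated with a minimal sketch. It assumes simple post-stratification, in which every record in a weighting cell receives the cell's population total divided by the cell's sample count; the abstract does not state the exact weighting procedure, so this is an assumption rather than the study's method. The function names, cell labels, and toy numbers are hypothetical.

```python
from collections import Counter


def poststratification_weights(cells, population_totals):
    """Return one weight per record so weighted cell counts equal population totals.

    cells: one hashable cell label per student record, e.g. a hypothetical
           (income_band, ethnicity_band) pair describing the student's school.
    population_totals: dict mapping each cell label to its population count.
    """
    sample_counts = Counter(cells)
    return [population_totals[c] / sample_counts[c] for c in cells]


def weighted_cumulative_relative_frequency(scores, weights, max_score):
    """Weighted relative cumulative frequency at each raw score 0..max_score."""
    total = sum(weights)
    mass = [0.0] * (max_score + 1)
    for score, weight in zip(scores, weights):
        mass[score] += weight
    cumulative, running = [], 0.0
    for m in mass:
        running += m
        cumulative.append(running / total)
    return cumulative


def max_absolute_difference(cdf_a, cdf_b):
    """Largest absolute gap between two cumulative distributions."""
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Hypothetical toy data: four student records in two weighting cells.
cells = [("low", "A"), ("low", "A"), ("low", "A"), ("high", "B")]
scores = [1, 2, 2, 4]
population_totals = {("low", "A"): 50, ("high", "B"): 50}

weights = poststratification_weights(cells, population_totals)
state_cdf = weighted_cumulative_relative_frequency(scores, weights, max_score=4)

# Hypothetical national cumulative distribution on the same 0..4 score scale.
national_cdf = [0.0, 0.2, 0.45, 0.7, 1.0]
gap = max_absolute_difference(state_cdf, national_cdf)
```

In this toy case the overrepresented cell is weighted down and the underrepresented cell is weighted up, so the weighted sample matches the assumed population totals in each cell before the distributions are compared.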

By and large, the smoothed distributions of the weighted datasets were able to recover the national distribution in each content area, grade, and weighting condition. Weighting the datasets to the nation was effective in making the state test score distributions more similar to the national distributions. Moreover, the stratification design that defined weighting cells by the joint distribution of median household income and ethnic composition of the school consistently produced desirable results for the six datasets used in each grade. Log-linear smoothing using a polynomial of degree 4 was effective in making the weighted distributions even more similar to those in the national sample. Investigation of the impact of using the percentile ranks derived from the state datasets revealed that the percentile ranks of the distributions that were most similar to the national distributions resulted in a high percentage of agreement when classifying student performance based on raw scores associated with the same percentile rank in each dataset. The utility of having a national frame of reference on student achievement and the efficacy of estimating such a frame of reference from existing data are also discussed.
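As an illustration of the smoothing step, the sketch below fits a polynomial log-linear model of degree 4 to a raw-score frequency distribution, so that the log of each fitted frequency is a degree-4 polynomial in the raw score. The fitting details are assumptions not taken from the abstract: maximum likelihood via Newton-Raphson, raw scores rescaled to [-1, 1], and a QR-orthogonalized polynomial basis for numerical stability. The function name and the toy frequencies are hypothetical.

```python
import numpy as np


def loglinear_smooth(counts, degree=4, iterations=50, tol=1e-10):
    """Smooth score frequencies with a polynomial log-linear model.

    Fits log(m_j) = b0 + b1*x_j + ... + b_degree*x_j**degree by Poisson
    maximum likelihood, where x_j is the raw score rescaled to [-1, 1].
    """
    counts = np.asarray(counts, dtype=float)
    scores = np.arange(len(counts), dtype=float)
    x = 2.0 * scores / scores[-1] - 1.0
    design = np.vander(x, degree + 1, increasing=True)
    # Orthonormal basis spanning the same polynomial space as `design`.
    basis, _ = np.linalg.qr(design)
    # Starting values: least-squares fit to the log of the observed counts.
    beta = basis.T @ np.log(counts + 0.5)
    for _ in range(iterations):
        fitted = np.exp(basis @ beta)
        gradient = basis.T @ (counts - fitted)
        hessian = basis.T @ (fitted[:, None] * basis)
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return np.exp(basis @ beta)


# Hypothetical (weighted) frequencies for raw scores 0..10.
counts = [2, 5, 9, 14, 18, 20, 17, 12, 7, 3, 1]
smoothed = loglinear_smooth(counts)
```

A maximum likelihood fit of this form reproduces the observed total and the first four raw-score moments, which is why a degree-4 polynomial keeps the mean, spread, skewness, and kurtosis of the weighted distribution while removing irregularities.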

Public Abstract

One frame of reference that is often used to enhance interpretations of test scores is normative. Publishers of large-scale assessments most commonly provide national norms, whereby students’ scores are referenced to the distribution of scores in a nationally representative sample. A national probability sample can fail to fully represent the population because of student and school nonparticipation. In practice, this is remedied by weighting the sample so that it better represents the intended reference population.

The focus of this study was on weighting grade 4 and grade 8 student records not fully representative of the nation to recover distributions of reading and math scores in a national probability sample. Statewide testing program data were used to create datasets that varied in their representativeness of the nation and in the proximity of their reading and math distributions to those of a national sample. These datasets were separately weighted to different population totals in two different weighting conditions using four different bivariate stratification designs.

In general, the smoothed distributions of the weighted datasets were able to recover the national distribution in each content area, grade, and weighting condition. Weighting student records by the joint distribution of median household income and ethnic composition of the school consistently produced desirable results for the datasets used in each grade. Percentile ranks of the distributions that were most similar to the national distributions resulted in a high percentage of agreement when classifying student performance based on raw scores associated with the same percentile rank in each dataset.
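The percentile-rank comparison can be sketched as follows. The sketch assumes the conventional midpoint definition of a percentile rank (the percentage of scores below a raw score plus half the percentage at that score) and assumes that classification means placing each student at or above versus below the raw score tied to a target percentile rank; the abstract does not spell out either detail. All names and frequencies are hypothetical.

```python
def percentile_ranks(frequencies):
    """Midpoint percentile rank for each raw score 0..len(frequencies)-1."""
    total = sum(frequencies)
    ranks, below = [], 0.0
    for f in frequencies:
        ranks.append(100.0 * (below + f / 2.0) / total)
        below += f
    return ranks


def cut_score(ranks, target_rank):
    """Lowest raw score whose percentile rank is at least target_rank."""
    for score, rank in enumerate(ranks):
        if rank >= target_rank:
            return score
    return len(ranks) - 1


def classification_agreement(scores, cut_a, cut_b):
    """Percentage of students classified the same way by both cut scores."""
    same = sum((s >= cut_a) == (s >= cut_b) for s in scores)
    return 100.0 * same / len(scores)


# Hypothetical frequency distributions for raw scores 0..4.
national_freq = [10, 20, 40, 20, 10]
state_freq = [5, 15, 40, 25, 15]

national_cut = cut_score(percentile_ranks(national_freq), target_rank=50)
state_cut = cut_score(percentile_ranks(state_freq), target_rank=50)

# Hypothetical students classified against both cut scores.
students = [0, 1, 2, 2, 3, 4]
agreement = classification_agreement(students, national_cut, state_cut)
```

When the two distributions are close, the cut scores coincide and agreement approaches 100 percent; in this toy case the cut scores differ by one raw-score point, so students at that score are classified differently.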

Keywords

Achievement Tests, Large-scale assessment, Norming, Norming Samples, Test Norms, Weighting

Pages

xiv, 190 pages

Bibliography

Includes bibliographical references (pages 174-182).

Copyright

Copyright 2015 Joshua Tudor
