The new study on value-added teacher evaluations and its limitations

One of the most interesting debates surrounding the Education Reform Movement has been about teacher evaluations. Education reformers operate on the assumption that bad teachers and bad schools are to blame for the achievement gap between poor and rich children, which makes teacher evaluation an important component of their reform program. Unfortunately for reformers, the value-added method of evaluation they champion has not held up well under researchers' scrutiny.

The value-added method, which tracks improvement in student test scores rather than test score levels, has been shown in some studies to produce erratic teacher evaluation results. Contrary to what reformers predicted, teachers evaluated under this method can receive dramatically different scores from year to year, which calls into question its reliability as an indicator of teacher performance.

A massive new study weighs in on value-added methods, this time positively. It found that superior teachers can be identified through value-added evaluation, and that differences among the students themselves cannot account for the different results those students achieve under different teachers. Given the size of the study, its findings give hope to proponents of value-added evaluation, who so far have had a difficult time making their case. Still, those who doubt the method's usefulness have many other studies and data sets to draw on in making the opposing case. Although the study contributes to the teacher evaluation debate, it has some limitations worth noting.

First, the study does not, nor does it intend to, weigh in on the question of whether testing-focused teacher evaluations are a good idea to begin with. Whether the evaluation methods actually track teacher performance is a separate question from whether we should bother to use them. Opponents of testing-focused teacher evaluations object to more than their alleged lack of accuracy: high-stakes testing encourages cheating; administering tests takes time away from instruction; and standardized testing encourages teaching to a narrow test, which can constrain the scope of what is taught. Additionally, the highest-scoring country in the world on international comparisons, Finland, has no standardized testing at all and does not use student test results to evaluate teachers. Finland also takes the opposite approach to the Education Reform Movement in the United States on almost every issue, which makes it a particularly interesting case to study.

The second limitation concerns the ability of good teachers to actually close the achievement gap. Remember, the chief aim of the Education Reform Movement is to close the gap between rich and poor students. The only way to prove that this is possible by weeding out bad teachers is to show that rich and poor students taught by teachers of the same quality would have identical achievement results. Showing that students from identical socio-economic backgrounds perform differently under different teachers is not enough to show that education reform can achieve its end goal.

This might seem like a tedious point, but it is an important one. The big battle over education reform has been about the actual causes of the achievement gap. Is it problems with schools and teachers, or non-school factors associated with poverty and inequality? The answer is probably that both play some role, but the relative weight of each matters if we want to construct an actual solution. This study does not provide any insight on that front. It merely tells us that different teachers can have different effects on students, without shedding light on how much those effects actually close the gap between rich and poor students. To be clear, none of this is meant as a criticism of the study. It is very useful and interesting, but only within its intentionally narrow scope.