The Dartmouth
April 27, 2024 | Latest Issue

Professors develop metrics for teacher performance

Standardized test scores and student surveys successfully evaluate teacher effectiveness and identify the best educators, according to a study conducted by economics department chair Douglas Staiger and Harvard University education and economics professor Thomas Kane. The $50 million study looked at 1,600 teacher volunteers in six of the country's largest school districts to determine the most valuable evaluation methods.

Sponsored by the Bill and Melinda Gates Foundation, the study focused on test score gains called value-added evaluations, which calculate teacher effectiveness by determining how well students perform on standardized tests. After identifying differences among teachers, the researchers randomly assigned them to different classrooms to see whether their students performed as predicted.

Staiger said the study confirmed that students of the best teachers would perform well on tests and be happier in class. The increase in students' scores when they were matched with better teachers paralleled what researchers had predicted.

"There have been lots of questions about measures of teacher effectiveness and whether we can successfully evaluate teachers," Staiger said. "This project was about trying to broaden that conversation."

Despite past criticism of value-added evaluations, the study found that students' test scores are in fact useful in identifying the best teachers. This result has had an enormous impact because it gives credibility to this type of teacher assessment, Staiger said.

Most classrooms currently receive "fly-by evaluations," in which principals look in on classrooms and tell teachers they are doing a good job.

"Right now, the evaluation system that is in place is pretty broken," he said. "Typically a teacher would never be seriously evaluated ever in their entire career, and certainly not after getting tenure."

In many of the districts the study considered, "fly-by" evaluations are now a thing of the past, and more rigorous methods are starting to be implemented. Many states are now passing laws that mandate such standards.

The study, part of a series of research papers surrounding education reform, was timed to help schools adapt to President Barack Obama's Race to the Top initiative.

As of this January, 42 states are in the process of revamping their teacher evaluation systems. The researchers' pace, which Kane said was abnormally speedy compared to that of typical social science papers, was set by the districts and states as they moved toward change. Before the publication of the professors' work, using student surveys to assess teachers was not a mainstream idea, though universities use this method regularly.

"Now there are lots of districts using this approach with student surveys and videotaping because there is strong evidence that they're useful," Staiger said.

One of the biggest issues surrounding teacher evaluations based on test scores and student surveying is whether they measure the performance of teachers, students or schools. The study used randomization to ensure that only teacher performance was measured.

Student performance on standardized tests and student attitudes toward teachers were measured from 2009 to 2010 to establish a baseline for teacher performance. Then, from 2010 to 2011, students were randomly assigned to teachers in math and English classes, and at the end of the year the same metrics were applied to reevaluate the teachers.

"The real strength of randomization is that the kids don't differ systematically, so you can truly test whether or not it's the teacher," Staiger said.

The results for 2010-2011 fit researchers' predicted performances of each teacher. In other words, student performance was based on the abilities of the teacher.
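
The random-assignment step can be sketched as follows. This is only an illustration under simple assumptions (one pool of students, equal-sized classes, a shuffled roster dealt out in turn); the function name assign_students, the seed parameter and the roster labels are hypothetical and do not come from the study.

```python
import random


def assign_students(students, teachers, seed=None):
    """Randomly deal students out to teachers in near-equal class sizes.

    Hypothetical helper for illustration; not the study's procedure.
    """
    rng = random.Random(seed)      # local generator so a seed gives a repeatable draw
    shuffled = list(students)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    rosters = {teacher: [] for teacher in teachers}
    for i, student in enumerate(shuffled):
        # Deal the shuffled roster round-robin across teachers.
        rosters[teachers[i % len(teachers)]].append(student)
    return rosters


# Made-up roster labels, purely for demonstration.
students = [f"student_{i}" for i in range(12)]
teachers = ["teacher_A", "teacher_B", "teacher_C"]
rosters = assign_students(students, teachers, seed=0)
```

Because the roster is shuffled before it is dealt, no teacher's class depends on prior achievement or any other student trait, which is the property Staiger points to.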

The study was the first of its kind to receive support from teachers' unions because researchers collected significant administrative data and performed their own conceptually challenging tests on students, rather than just examining students' test scores, Staiger said.

Researchers also surveyed students on their perceptions of teachers and recorded over 12,000 hours of videotape of their classes, which were scored according to multiple criteria.

Researchers focused on data related more broadly to the school districts, randomizing the teachers to different classrooms in the second year of the study in order to assess teacher effectiveness in a less classroom-specific way, Staiger said.

The study featured teachers from New York, Charlotte, N.C., Tampa, Fla., Memphis, Tenn., Dallas and Denver, who taught English and math to students in grades four through nine.

Staiger helped design the study based on his experience with a smaller-scale version in Los Angeles.

Staiger analyzed students at the end of the previous year, on the basis of factors including test scores, to predict how they would do at the end of the following year. He then compared this prediction to how well students actually performed and averaged the difference across all of a teacher's students.
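
The calculation Staiger describes can be illustrated with a minimal sketch. It is not the study's actual model: it assumes a single prior-year score as the only predictor, fits a plain least-squares line, and uses made-up records. The names fit_line and value_added and the sample data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """records: (teacher, prior_score, end_score) tuples.

    Returns each teacher's average of (actual - predicted) end-of-year score.
    Assumption: prior score is the only predictor used.
    """
    priors = [prior for _, prior, _ in records]
    ends = [end for _, _, end in records]
    intercept, slope = fit_line(priors, ends)
    gaps = {}
    for teacher, prior, end in records:
        predicted = intercept + slope * prior
        gaps.setdefault(teacher, []).append(end - predicted)
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Made-up scores for two hypothetical teachers.
records = [
    ("A", 50, 58), ("A", 60, 67), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 61), ("B", 70, 72),
]
scores = value_added(records)
```

In this toy data, teacher A's students finish about three points above their predicted scores and teacher B's about three points below, so A receives the higher value-added estimate.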

"There are issues that arise with that measure and the videotape score, such as how do you put those together?" Staiger said. "Both are best at predicting how that teacher will do in a different classroom, so we use those methods to predict which teachers will do better in terms of impact on students' test scores."

After conducting the study, Staiger and Kane attended a teacher conference in Arizona and met with individual districts to discuss implementation methods. Over 80 school districts were represented at the conference, which Staiger said indicated the impact the study will have on how teachers are evaluated in schools.

Much of the negative publicity surrounding value-added evaluations focuses on skepticism and distortion of teacher behavior, Staiger said. The study added to the evidence that these measures are valid, and Staiger hopes it will also change how schools are managed.

Kane said these measures are particularly useful in enhancing the quality of instruction in schools. They can be used to help districts make decisions about granting tenure and to point to areas where tenured teachers still need to improve, he said. While these measures might also be useful in determining a merit pay structure for teachers, he cautioned that researchers are still not sure that a merit pay scheme is truly beneficial for retaining good teachers.

The data compiled by the professors will be stored in archives at the University of Michigan, where they will be available to all researchers.

"We hope this data will allow for the development of new approaches to classroom observation," Kane said.