Statistics can only tell us so much.
If, as F. Scott Fitzgerald famously said, the test of a first-rate intelligence is the ability to hold two opposing views in mind at the same time, then Megan Stevenson’s article calls for some first-rate thinking. We should react to it with two kinds of humility, each pulling against the other. We need to be humble about the power of social programs to alter behavior. But we should also be humble about the power of hard data to test social programs.
Stevenson gathers randomized controlled trials (RCTs) that test criminal justice interventions, everything from hot-spot policing and body-worn cameras to drug courts and counseling. She finds few that had lasting impacts, and even those failed to replicate elsewhere. If, as social scientists generally agree, RCTs are the best way to distinguish causality from correlation, then the best evidence seems to imply, to a first approximation, that nothing works — at least in the sphere of criminal justice. Not stopping there, Stevenson suggests that the same is true of interventions in other social fields, such as education, health policy, and labor. Finally, in case that wasn’t provocative enough, she argues that the failure to find positive results is well known to researchers, “but, like a dirty secret, it almost never gets seriously acknowledged or discussed.”
Stevenson’s claim that rigorous measurement doesn’t show that interventions work is plausible. But is that because the interventions are flawed, or because the measurement is flawed? The answer is: yes!
“You have a problem, we’ve got a program” was Washington’s motto in the 1960s and 1970s. But a lot of programs failed outright or ameliorated some problems while exacerbating others. Federal farm subsidies, for example, succeeded at maintaining farmers’ incomes but distorted prices and planting decisions, piled up surplus commodities in government warehouses, played havoc with global trade, and jacked up farmland and input prices. Meanwhile, a burgeoning industry of scholars and consultants entered the field of program evaluation, bringing welcome rigor but not concomitant clarity. Did Head Start, possibly the most studied federal program, improve children’s life outcomes? Yes, said the research. Then — oops, sorry — no. But wait — maybe yes after all!
What have we learned? “Human beings are not nearly as plastic as people thought they were,” Charles Murray told me, when I asked what a lifetime in social science had taught him. “The degree to which things are rooted in the way we are from the get-go makes it hard to move people very far off that spot.” Murray, the Hayek Emeritus Scholar at the American Enterprise Institute, who has been active in social science research since the 1960s, is controversial, but his view is supported by Stevenson’s assessment of the gold-standard evidence.
So does nothing work? Not so fast. The beauty of RCTs is that they gauge causality by narrowing their scope to exclude extraneous variables and measure a handful of specified outcomes. Precisely because they are so narrowly delimited, they can miss unexpected effects and subtle interactions. They also fuel the assumption that what isn’t measurable isn’t important. “I think we have overemphasized the quantitative side because there’s the veneer of hard data applied to it,” Brian Klaas, the author of “Fluke: Chance, Chaos, and Why Everything We Do Matters,” told the podcaster Michael Shermer recently. “And it means that certain kinds of information that are fallible are prioritized over other kinds of information which are also fallible.” In other words, relying on data too much can be just as deceptive as relying on it too little.
I’m a journalist, not a social scientist, so I would say this, but it’s true: There is no substitute for walking around and talking to people, exploring a community, and sniffing the air. Qualitative methods can detect emerging trends, trace the intricacies of human interactions, and notice the small details that add up. In 2016, while statisticians and pollsters were busy declaring Hillary Clinton the almost-sure election winner, journalists and scholars who were driving around, chatting with voters, and counting yard signs knew something was up. Today, scholars like Arlie Russell Hochschild, Katherine J. Cramer and Andrew Cherlin demonstrate the continued relevance of the richly nuanced observational methods that established works by Edward Banfield, Elliott Liebow, Jonathan Rieder and James Q. Wilson as classics, despite having nary an R-squared among them.
“Use questionnaires and focus groups,” Klaas told Shermer. “Use information that’s gleaned from people who actually exist on the ground. Do qualitative research. Try to build theories around this.” It’s good advice. When I visited a Latino charter school in Philadelphia, I didn’t know what an RCT might show about the students’ test scores or graduation rates 10 years later, but I did know that the school was well organized, students were respectful and enthusiastic, and teachers stopped to engage them by name in the hallway. This told me something real that mattered. To be sure, RCTs have their place — and we should keep them in it.