Reading, writing and revisiting our findings
At first glance, we might think of preschool programs as limited in scope. Head Start, the federally funded program for children from low-income households, for example, provides a year of preschool programming to a 4-year-old child and perhaps some services to that child’s parents. But among government programs for families, it is one of the most ambitious, and given the program’s scale — more than 800,000 children served in 2022 at a cost of over $10.5 billion — it is hardly trivial. What, then, should we make of the fact that state-of-the-art research on preschool programs, including Head Start, produces mixed findings that often vary across contexts?
Megan Stevenson argues that both this state-of-the-art research — notably the randomized controlled trial — and the policies and programs it examines are largely exercises in futility. The hierarchy of social science methods, she argues, has led to an exaggerated focus on narrow questions and incremental interventions, which constrains knowledge production and our ability to generate systemic, meaningful change from what we are learning. Criticisms of a dedicated, almost exclusive, focus on randomized controlled trials (RCTs) are not new (see, for example, Deaton and Cartwright (2018) and Ruhm (2019)), but notably, there are often ways to address many of these criticisms — through the design of RCTs, replication, careful interpretation of results and the integration of rigorous, quasi-experimental approaches into the evidence base. Unfortunately, many interpret the challenges of conducting RCTs, or their limitations, as fatal flaws rather than as a call to build on the solid foundation that RCTs provide in many areas of social science.
What we know about preschool effects
The literature on investments in early childhood care and education (ECE), which Stevenson acknowledges as stronger than most, provides an illustrative example of how this can work in practice. The process of knowledge generation may, in fact, be incremental — responsibly so, I would suggest — but that does not mean we fail to learn from that process, or that we cannot continue to learn in ways that inform broader societal change. A recent MIT Blueprint Labs review, authored by Jesse Bruhn and Emily Emick, summarizes evidence on preschool effectiveness from lottery-based studies, which capitalize on a randomized research design. Included in the meta-analysis are studies of comprehensive, “model” preschool programs in the 1960s and 1970s, Head Start and modern-day public pre-kindergarten (pre-K) programs.
We learn from this evidence base that ECE programs can have important effects on children’s later-life well-being, in the form of increased educational attainment as with Boston public pre-K, better health in the Carolina Abecedarian Project, and lower criminal engagement in the Ypsilanti, Michigan-based Perry Preschool Project. We also learn that programs do not consistently improve students’ academic performance in the elementary grades, as in the cases of Tennessee pre-K and the nationally representative Head Start Impact Study (HSIS). And one of the crucial lessons in surveying this body of research is that we can actually learn a lot from mixed findings and variation in effectiveness.
The HSIS, the randomized study of Head Start effectiveness, is instructive. The program, which has existed since 1965, is arguably our only ECE program operating “at scale” in the U.S. The HSIS findings — initial skill advantages at the end of the Head Start year and at kindergarten entry that faded by the first and third grades — generated damning responses:
Andrew Coulson: “Head Start, the most sacrosanct federal education program, doesn’t work.”
Joe Klein: “Head Start simply does not work.”
Russ Whitehurst: “Head Start does not improve the school readiness of children from low-income families … Head Start has no long term impact.”
Updating our understanding of the Head Start research
These reactions were premature and limited. As our social science methods have improved, reanalyses of HSIS data suggest a more nuanced picture, and one that is informative for policy. Authors have documented considerable variation in impact across Head Start centers. Moreover, many children who did not get Head Start participated in other forms of ECE, and effects were more pronounced among children who would otherwise have been in parental or relative care. Children with low skills at program entry experienced the greatest benefit, and effects for Spanish speakers persisted. Other important pathways to potential longer-term impact on children’s outcomes are durable effects on parenting practices and effects on mothers’ employment and earnings. Stevenson describes these as “mixed and equivocal” results — but that, in fact, makes them useful. We should expect program effectiveness to vary across programs with different features, in different contexts, and for children of different backgrounds. While advocates for particular policy goals may be uncomfortable with variation in findings, these patterns can actually be quite helpful in policymaking.
Notably, a rich, quasi-experimental evidence base on the long-term effects of Head Start augments what we have learned from the HSIS, demonstrating how causal methods for assessing program impact can complement one another. These studies use discontinuities in program eligibility, grant-writing assistance and geographic variation in program introduction — leveraging multiple survey and administrative datasets — and find effects on adolescent health and behavior, educational attainment and economic self-sufficiency. We do not (yet) have data on long-term effects for children in the HSIS, but such data would fill an important gap in the Head Start literature.
Building on the preschool evidence base
This body of work — answering the questions “Does preschool work?” or “Does Head Start work?” — has also served as the foundation for experimental studies seeking to answer the question “Why?” The MIT Blueprint Labs meta-analysis includes randomized studies of curriculum, class structure, professional development efforts and dual-language programs. Innovative research explores how to complement ECE experiences with parent engagement in children’s skill development, including a parent academy and an intervention intended to boost cognitive skills, behavioral nudges to increase preschool attendance and incentive payments to reduce turnover in the ECE workforce, all using randomized designs to understand impact.
In many areas of social policy, the RCT literature is relatively young. If the only question is whether a given intervention “works” and “replicates,” topline results may appear disappointing. But those same studies can also help us generate useful hypotheses about what works, for whom, in which contexts. Many of those hypotheses can then be tested in rigorous RCT designs so that we can answer important questions that ultimately shape people’s lives. The research on early childhood has done just that: It established, through experimental studies, that investments in high-quality child care and preschool can change children’s trajectories, then built on this “proof of concept” to further test programs at scale and examine components of programs to understand the key ingredients of success. While this work is ongoing, it has been critical to garnering policymaker and practitioner attention, and has spurred real changes to the investments that many children in the United States experience in their early years.
A research path forward
RCTs are resource-intensive, not only in terms of finances, but also in time and partnership cultivation. We should confront directly the challenges of executing randomized studies well, with attention to the details of sufficiently large samples, informative treatment contrasts, tests of mechanisms and efforts to scale. And we, as researchers, funders and policymakers, should be willing to replicate studies to learn from different contexts. This construction of an evidence base — though perhaps incremental, painstaking and at times frustrating — serves as the foundation for understanding successful diffusion and scale-up. We owe it to the people affected by social programs and policies, to those working hard to serve them and to taxpayers to understand program and policy impact and to direct resources and effort intentionally toward the investments with the most promise.