What does the push for evidence-based policy actually look like inside institutions?
In 2022, Dean Karlan became Chief Economist at USAID, tasked with steering the world’s largest bilateral aid agency towards evidence-backed approaches. He left in 2025, as the agency was being dismantled, by which point he and his team had moved roughly $1.7 billion of funding towards evidence-based approaches.
It is a number he describes as simultaneously big and small – a source of pride given his team started with no office, no staff, and a congressional approval process to navigate, but a fraction of the agency’s $50 billion annual budget at the time, most of which has now collapsed.
Dean, Professor of Economics and Finance at Northwestern University and founder of Innovations for Poverty Action, joined me on this episode of Ideas in Development to discuss what the push for evidence-based policy actually looks like inside institutions, the rise of embedded evidence labs in developing country governments, and whether there are questions in development economics on which we now have enough evidence.
How did they move $1.7 billion? And why not more?
The constraints on moving money towards evidence were partly institutional. Dean’s team had to build an office from scratch – a process that requires Congress not to veto it – and hiring within government was slow enough that the office raised philanthropic money to bring in staff quickly.
‘We spent a fair amount of time fundraising, we spent a lot of time working through the bureaucratic process.’
Without that outside support, it would have taken two years simply to hire a team. But the constraints were also about knowledge itself.
‘My first instinct in most situations is to start by fearing how little we actually know.’
There were areas where the evidence base was too thin to make strong claims, and his team was careful not to overstate what research could tell them. Equally, there were large areas – much of global health, for instance, including PEPFAR’s delivery of medicines – where the evidence already supported what USAID was doing. Those programmes were left alone, and deliberately excluded from the $1.7 billion figure, which only includes what they themselves moved.
Moving funding inside a $50 billion agency
Dean’s office used a two-pronged approach: an in-house consulting team working directly with USAID teams, and a slower effort to change institutional processes, regulations, and culture. The consulting arm vetted opportunities against three criteria:
- The size of the money at stake – with a small team, his office had to think about its own cost-effectiveness and leverage, which meant walking away from intellectually fascinating areas like carbon markets where USAID simply wasn’t spending that much money.
- The delta: was there a meaningful gap between what teams were doing and what the best evidence suggested? Where teams were already close to the evidence, or where evidence had little to say, his office stayed out.
- A willing and eager partner.
That last criterion was decisive, because in USAID’s structure the detailed decisions about how money got spent sat with foreign service officers and foreign nationals working in country offices around the world.
‘If they’re not bought in, then that was a recipe for disaster.’
An unenthusiastic partner had too many ways to quietly undo the work – taking advice at one stage of the process and discarding it at the next. This is why Dean still believes in a more collaborative approach over a more prescriptive one, even though it likely meant a smaller headline number. A directive approach was never actually on the table – but he argues it would also have been unhealthy, sacrificing the long game of building state capacity, both within USAID and in partner countries.
Taking goals as given – up to a point
Dean’s office took sector and country goals as given: those were decisions made by Congress, not by USAID, and his team’s job was to help achieve them more effectively. That stance is also why he was initially prepared to stay on into a Republican administration – the wonky policy people on both sides of the aisle supported hard-nosed measurement and walking away from things that don’t work.
But economics could still speak to whether a goal was doing what its advocates thought. He points to arguments for pressuring credit rating agencies to score African countries more favourably, to bring down their cost of borrowing. ‘As a matter of economic theory, that’s flawed,’ Dean argues. If political pressure distorts ratings, markets will simply discount them, worsening the adverse selection problem for investors and ultimately making borrowing harder for high-risk countries. The better path is the unglamorous one: helping ministries of finance manage debt, improve forecasting, and collect more revenue – work his team was directly engaged in.
The logic of embedded evidence labs
‘Rather than saying, what’s the best way we can use our money in the provision of direct services, instead ask: what’s the best way we can use our money to help influence and improve the way local government uses their money?’
For all USAID’s scale, its budget in any given country was tiny relative to that country’s own budget. The leverage question follows naturally from this fact: rather than asking how to spend aid money best on direct services, ask how aid money can improve the way local governments spend their own.
‘That knock-on effect creates leverage, creates much bigger impact if successful.’
Dean is candid, though, that it is a riskier proposition: influence one step removed is less likely to actually change decisions.
This is the logic behind embedded evidence labs. The labs are not cookie-cutter, but typically combine three components: helping governments review and adapt the global evidence base for their context; building the data infrastructure to manage programmes at scale; and deliberate, donor-funded innovation, where outside money pays for large-scale pilots so that ministries aren’t forced to cannibalise existing programmes to test new ideas.
He offers two examples, both from education. In Rwanda, a lab grew out of a randomised trial on teacher compensation, and its first task was building the national data infrastructure – attendance, test scores – that lets district heads manage schools and that underpins further testing, with the resulting policies now reaching all of Rwanda. In Peru, the focus was innovation: additional funding supported 14 randomised trials co-designed by researchers and the Ministry of Education, from which three successful ideas emerged and are now being scaled nationally.
The synthesis gap and the implementation gap
Dean’s forthcoming piece in the Journal of Economic Perspectives argues the academic-policy gap is smaller than often claimed but larger than it should be. Smaller, because in many cases he saw first-hand at USAID, research did speak fairly well to the goal at hand – it simply wasn’t being used. Larger, because nothing existed to bridge the last mile – in USAID’s case, there was no document a foreign service officer could pick up that synthesised the literature and translated it into what belongs in a request for proposals.
Dean sees the gap in two main parts. The first is synthesis. Economics, he argues, does not reward meta-analysis enough, and the discipline’s quantitative instincts desert it when it comes to aggregating evidence:
‘We are a very quantitative oriented discipline, and yet when it comes to synthesis we shudder at the idea of being too quantitative – instead we want to take our 17 papers... put them up on the wall and then stare at it until we see a pattern.’
The result is that policymakers, donors, and academics alike put too much weight on individual studies – the one they funded, the one they wrote, the one in a top five journal.
The second is implementation. Papers are written to answer theoretical questions, not to equip an implementer to act on the findings. Dean thinks that papers could carry far more useful insight, even if it’s in the appendix: the design decisions behind a programme, the pilots that failed and why, and the details of the cash transfer amounts and savings components that get only a few passing sentences in the main text. For a policymaker who has just read about impressive treatment effects, the immediate next question is how the thing was actually done.
When do we have enough evidence?
With 117 randomised trials comparing cash to no cash, Dean is comfortable saying we don’t need another study of whether cash transfers work for households. But ‘everything’s relative’ – he is himself running new cash trials, because how to deliver cash remains wide open, and the macro question also looms large: what happens to markets and multipliers when programmes hit a whole country rather than a district?
Even in medicine, where a drug’s efficacy is settled, the question of how to get pills into mouths and shots into arms is not. Asked where he would push the field if he were its general manager, Dean returns to his two gaps: more synthesis, supported by more systematic measurement across studies, and more research on the inner workings of organisations and programmes.