Here's a number worth sitting with. In a randomized controlled trial of experienced open-source developers, AI coding tools made them 19% slower. The same developers, asked afterwards, said the tools made them 20% faster.
A 39-point gap between what people felt and what the stopwatch showed.
I run engineering teams. I've shipped a lot of software. And when I saw the METR study, my first reaction wasn't "AI is bad." It was "we don't know what we're measuring."

The study you should read before your next AI procurement meeting
METR ran a proper randomized trial with 16 experienced developers working on their own open-source projects. Big projects... 22,000-plus stars, more than a million lines of code, ten years old on average. The developers worked through 246 real issues, and each issue was randomly assigned: AI allowed (Cursor Pro with Claude 3.5 / 3.7 Sonnet at the time) or AI forbidden.
Going in, the developers expected a 24% speedup. Coming out, they still believed AI had made them about 20% faster.
The data showed they took 19% longer.
The METR researchers were honest about what this means. The result is a snapshot of early-2025 AI on a specific workload: experienced engineers, mature codebases, high quality standards. It does not say AI is useless. It says the productivity story we keep telling each other is not the productivity story the clock is telling.
Why the gap exists
The follow-up analysis from DX walked through five factors. The shortlist is worth chewing on:
- Overconfidence in the tool. Industry hype primed developers to keep reaching for AI even when it was costing them time.
- Expert tax. AI helped least when the developer was a deep expert in the codebase. The tool behaved like a confident but junior contributor. Familiar territory plus junior input equals review overhead.
- Codebase complexity. Old, large repos with their own idioms eat AI suggestions for breakfast. The tool does not know your conventions.
- Low acceptance rates. Devs accepted under 44% of AI suggestions. The rest got reviewed, weighed, rejected. None of those steps come for free.
- Missing tacit knowledge. The AI suggested things that read fine but missed the unspoken rules. Like a new contributor who hasn't absorbed the house style yet.
None of those are reasons to throw the tools out. They are reasons to be careful about where you point them.
A bigger benchmark, a more complicated picture
The METR study had 16 developers. The Opsera 2026 benchmark looked at 250,000-plus developers across 60-plus enterprises. Different scale, different result, more nuance.
Their numbers:
- AI reduces time-to-PR by up to 58%
- AI-generated PRs wait 4.6x longer in review
- AI-generated PRs introduce 15-18% more security vulnerabilities
- Senior engineers capture nearly 5x the productivity gains of juniors
- 21% of paid AI coding licenses go unused
Put those together. AI gets code out of the developer's fingers faster. Then it sits in review. Then it ships with more vulnerabilities. Then the senior who already knew what they were doing pulls further ahead of the junior who needed help most.
If you measure "time to first PR," AI looks like a win. If you measure "time to merged, secure, maintainable code," the picture flips.
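You can see the flip with back-of-the-envelope arithmetic. The split between drafting time and review dwell below is invented for illustration; the 58% and 4.6x come from the Opsera figures above. Plug in your own numbers.

```python
# Toy model: same PR, two metrics, opposite conclusions.
# The baseline hours are hypothetical -- swap in your own team's data.
draft_hours = 6.0    # time to get a PR opened, without AI
review_hours = 8.0   # time that PR then spends in review, without AI

ai_draft = draft_hours * (1 - 0.58)  # "reduces time-to-PR by up to 58%"
ai_review = review_hours * 4.6       # "AI-generated PRs wait 4.6x longer in review"

print(f"Time to first PR:    {draft_hours:.1f} h -> {ai_draft:.1f} h (AI wins)")
print(f"Time to merged code: {draft_hours + review_hours:.1f} h -> "
      f"{ai_draft + ai_review:.1f} h (AI loses)")
# With these inputs: 6.0 h -> 2.5 h on the first metric,
# 14.0 h -> 39.3 h on the second. The verdict is decided by the metric,
# and the metric is decided by whoever builds the dashboard.
```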

The leadership problem hiding in the data
I've watched a lot of engineering leaders make the same mistake with AI they made with agile, then with microservices, then with cloud. They buy the tool. They count the licenses. They report adoption rates in the next board deck.
Adoption is not impact.
Twenty-one percent of those AI coding licenses sit unused. Most of the rest get pointed at tasks where the tool slows people down. The seniors who needed the help least benefit the most, because they already know when to trust the AI and when to override it. The juniors who needed the help most get suggestions they can't yet evaluate, which they either accept blindly (bad) or reject and rewrite themselves (also slow).
You are not buying productivity. You are buying a tool with wildly uneven returns depending on who holds it and where they point it.
What I'd do tomorrow morning
If I ran your engineering org, I'd do five things this week.
1. Stop measuring AI adoption. Start measuring outcomes.
License count and prompt volume tell you nothing. Track cycle time end-to-end... commit to merged, merged to deployed, deployed to incident-free for thirty days. Compare before-AI and after-AI on the same teams. There's a minimal sketch of this analysis after this list.
2. Pair AI with the right task type.
The METR data is clear on this. Experienced developers on familiar code lose time with AI. Use the tool where the dev does not already know the answer... unfamiliar language, exploratory prototyping, boilerplate, test scaffolding. Stop using it as a default for "produce code please."
3. Pay for the review tax.
If your AI-generated PRs sit 4.6x longer in review, your review process is the bottleneck, not your code generation. Invest in static analysis, security scanning, and reviewer training before you invest in more AI seats. The sketch after this list splits review dwell by AI-assisted versus not, so you can check whether that 4.6x shows up in your own repos.
4. Treat the perception gap as a signal.
If your team tells you AI made them 20% faster, ask them to show you the data. If the data is "it felt faster," push back. Feeling fast is not the same thing as shipping faster.
5. Help the juniors before the seniors.
The 5x gap in benefits between senior and junior engineers is a culture problem. Senior engineers know when to ignore AI. Juniors do not yet. Pair them. Review their AI-assisted code together. Build the judgment they need to use these tools well. This is leadership work, not tool work.
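To make points 1 and 3 concrete, here's a minimal sketch of the analysis I mean. It assumes you can export PRs with a few timestamps and some label for AI-assisted work... a PR tag, a survey field, tooling telemetry, whatever you have. The field names are placeholders, not any particular platform's API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class PullRequest:
    opened: datetime        # PR opened
    first_review: datetime  # first human review activity
    merged: datetime        # merged to main
    deployed: datetime      # reached production
    ai_assisted: bool       # however your org labels it

def hours(delta):
    return delta.total_seconds() / 3600

def stage_medians(prs):
    """Median hours per stage -- medians so one monster PR doesn't skew things."""
    return {
        "open_to_first_review": median(hours(p.first_review - p.opened) for p in prs),
        "first_review_to_merge": median(hours(p.merged - p.first_review) for p in prs),
        "merge_to_deploy": median(hours(p.deployed - p.merged) for p in prs),
        "open_to_deploy": median(hours(p.deployed - p.opened) for p in prs),
    }

def compare(prs):
    """The same stage metrics, split by whether the PR was AI-assisted."""
    ai = [p for p in prs if p.ai_assisted]
    rest = [p for p in prs if not p.ai_assisted]
    return {"ai_assisted": stage_medians(ai), "not_ai_assisted": stage_medians(rest)}
```

If the AI-assisted bucket wins on open_to_first_review and loses on open_to_deploy, you're paying the review tax on your own numbers, not Opsera's. Add an incident-free-for-thirty-days flag per deploy and you've covered the rest of point 1.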

The thing nobody wants to say out loud
The METR study is a year old now. The tools have improved. Cursor in 2026 is not Cursor in 2025. Claude in 2026 is not Claude in 2025. Some of those slowdown effects have softened.
But the perception gap has not.
Developers still feel faster with AI than they are. Engineering leaders still buy tools based on vibe and headlines instead of cycle time data. Boards still want to see "AI strategy" in next quarter's slides, and they will get one whether or not it makes the team better.
If you want a real edge in 2026, do not chase the next model release. Build the discipline to measure what you ship and how long it takes. The teams who know their own numbers are the teams who will know whether AI is helping them or not.
Everyone else is operating on feel. And feel, as the METR study showed, is off by 39 points.
Where to go from here
If you lead an engineering team, the METR paper and the Opsera benchmark are both worth your time this week. Read them with your tech leads. Then ask one question: do we have data, or do we have a story?
If you only have a story, you do not have a strategy.