The AI nobody has explained to us how to use
The missing ingredient in AI success: measurement and feedback loops.
There's a quiet contradiction at the center of AI adoption right now. Everyone agrees it's transformative, everyone is investing, and everyone expects results. And yet, very few people can clearly explain what "using AI well" actually looks like inside an engineering organization.
We've introduced a disruptive layer of capability into teams without establishing the operating model that makes it valuable, and in the absence of that model, most companies are defaulting to experimentation without direction, or worse, adoption without understanding.
A technology with no playbook
Most transformative technologies come with a learning curve, but AI is different because the expectation is immediate proficiency. Engineers are given access to tools like GitHub Copilot, Cursor, Claude, or ChatGPT and implicitly expected to "figure it out" without a shared standard for what good looks like. There are no consistent workflows, no clear benchmarks, and no common language to describe effective usage.
As a result, usage increases and output appears to accelerate, but the system behind it remains undefined and inconsistent across teams.
The benchmark changed faster than most teams
If you want to understand what AI is actually doing to engineering, you have to look at the data, not the narrative. That's exactly what we do at Pensero, where we have data from thousands of engineers and orgs around the globe. Over the last six months, engineering delivery across that population increased by 34%. That alone would be one of the most significant shifts the industry has seen in years. But the more important number is not the average. The top 5% of teams improved by more than 50%, and the gap between elite and average performance widened to nearly 6x.
This is where the story becomes interesting: AI is not lifting everyone equally, it is stretching the curve. The benchmark is rising, but it is rising faster at the top than in the middle. What was considered strong performance six months ago is now below average.
What we actually see in the data
When you observe how AI is used in real workflows, a clear pattern emerges. AI is not a uniform productivity layer; it is a multiplier of existing behaviors.
Teams that already operate with strong fundamentals (clear problem definition, tight collaboration with product, and fast feedback loops) are able to integrate AI into their workflows and turn it into sustained performance gains. These are the teams driving the top 5% curve, and their improvements are not temporary spikes.
At the same time, other teams are also adopting AI, but the outcomes are different. They generate more output, but also more rework. They move faster initially, but without the same level of control or consistency. The result is progress, but not compounding progress. This is why the average is improving while the gap continues to widen. The same tools, used across the same industry, are producing outcomes that differ by multiples, not percentages.
The illusion of productivity
One of the most common mistakes right now is confusing activity with impact. AI makes it easier to produce output, which naturally leads to more pull requests, more code, and faster initial delivery. But this does not necessarily translate into meaningful outcomes. In many cases, the increase in speed is offset by an increase in rework, inconsistencies, and downstream corrections.
The benchmark makes this visible in a way that anecdotes cannot. If AI alone were the driver, you would expect the gap between teams to narrow as tools become widely available. Instead, the opposite is happening. The gap is widening, which means that how AI is used matters more than whether it is used at all.
What "good" actually looks like
The teams that are extracting real value from AI are not the ones using it the most, but the ones using it with intent. They treat AI as part of the workflow, not as an isolated tool. It is embedded in how features are defined, how code is reviewed, and how decisions are made. They prioritize clarity before generation, understanding that the quality of the input determines the usefulness of the output.
They also measure outcomes, not activity. They track delivery per engineer, cycle time, defect rates, and rework, and they use these signals to understand where AI is actually helping and where it is introducing risk. Over time, this creates a feedback loop where each improvement compounds on the next. This is what allows them to move from incremental gains to structural advantage.
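To make that concrete, here is a minimal sketch of what tracking those signals could look like. It is an illustration, not a description of how any particular platform computes them: the Change record, its fields (including the ai_assisted flag), and both function names are hypothetical, and "delivery per engineer" is approximated here as merged changes per engineer.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    """One merged change. All fields are hypothetical, for illustration only."""
    author: str
    opened: datetime
    merged: datetime
    ai_assisted: bool    # assumed to be tagged by the team or its tooling
    caused_defect: bool  # later linked to a bug or incident
    reworked: bool       # substantially rewritten soon after merging


def summarize(changes):
    """Outcome signals for a group of changes, or None if the group is empty."""
    if not changes:
        return None
    n = len(changes)
    engineers = {c.author for c in changes}
    cycle_hours = [(c.merged - c.opened).total_seconds() / 3600 for c in changes]
    return {
        "changes_per_engineer": n / len(engineers),
        "median_cycle_hours": median(cycle_hours),
        "defect_rate": sum(c.caused_defect for c in changes) / n,
        "rework_rate": sum(c.reworked for c in changes) / n,
    }


def compare_by_ai_usage(changes):
    """Split changes by the ai_assisted flag and summarize each side."""
    return {
        "ai_assisted": summarize([c for c in changes if c.ai_assisted]),
        "not_ai_assisted": summarize([c for c in changes if not c.ai_assisted]),
    }
```

Reading the two summaries side by side is the point: a lower cycle time paired with a higher rework rate suggests activity is up while impact is not. A comparison like this shows correlation only, so it is a starting point for questions rather than proof of what AI caused.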
The missing layer: measurement
Right now, most organizations still lack the visibility required to understand the real impact of AI. They can tell you how many licenses they have purchased and sometimes how frequently tools are used, but they cannot explain what those tools are actually producing. Without connecting AI usage to delivery, quality, and cost, adoption becomes an act of faith rather than a disciplined investment.
The benchmark highlights how dangerous that is. If the industry baseline has moved 34% in six months and you cannot measure your own position relative to it, you are operating without context. And if the top performers are improving by more than 50% over the same period while you are unsure whether you are improving at all, the gap will not stay static. It will compound.
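A quick back-of-the-envelope calculation shows what compounding means here. The sketch below assumes, purely for illustration, that both six-month growth rates simply repeat in later periods; the benchmark does not claim that, and the function name is hypothetical. The 0.50 figure is also a floor, since the top 5% improved by more than 50%.

```python
def relative_gap(top_growth, baseline_growth, periods):
    """How far top teams pull ahead of the baseline after `periods` periods.

    Assumes both growth rates stay constant in every period, which is an
    illustrative assumption rather than something the benchmark reports.
    """
    return ((1 + top_growth) / (1 + baseline_growth)) ** periods


for periods in (1, 2, 4):
    gap = relative_gap(0.50, 0.34, periods)
    print(f"after {periods} six-month period(s): top teams at {gap:.2f}x the baseline")
```

Under that assumption, a lead of roughly 12% after six months grows to more than 50% after two years, on top of whatever gap already existed.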
Why this matters now
We are entering a phase where AI is no longer optional, and the expectations around it are changing quickly. Leaders are no longer being asked whether they are adopting AI, but whether it is delivering results. Boards and executives are starting to assume that productivity gains should be visible, measurable, and tied to business outcomes.
This changes the nature of the conversation. AI is no longer a bet. It is an investment that needs to be justified. And without a clear understanding of how it is being used and what it is producing, that justification becomes increasingly difficult.


