Article

AI Usage Is Easy to Measure. AI Impact Is Not.

A technical look at the data model behind delivery impact, quality tax, cost visibility, and real AI leverage.

The problem with measuring AI adoption

Engineering organizations have made this mistake before. When something is hard to measure directly, we tend to measure what is visible. For years, that meant counting pull requests, tickets, commits, story points, deployments, or review comments. Those signals were not useless. In many cases, they were the best visibility we had into systems that were otherwise opaque. The problem started when volume became a proxy for value.

More pull requests did not always mean more impact. More closed tickets did not always mean better outcomes. More activity did not always mean a healthier engineering system. Without context, complexity, and alignment, activity can easily be misread as contribution. We learned, sometimes the hard way, that engineering metrics need interpretation. Volume can be useful, but only when it is connected to the type of work being done, the complexity of that work, the quality of the outcome, and the direction of the business.

AI creates the same risk, only faster. The easiest things to measure are again the most visible ones: licenses, active users, prompts, tokens, generated code, accepted suggestions, or agents running in the workflow. These are adoption signals, and they matter. They tell us whether people are using the tools, where experimentation is happening, and where there may be friction. But they do not prove productivity, quality, or business impact.

A team can burn a lot of tokens and create very little leverage. Another team can use AI less frequently but remove a real bottleneck, improve test coverage, reduce repetitive work, or make better decisions earlier. The signal is not the usage itself. The signal is what changes in the engineering system because of that usage.

That is why measuring AI adoption is not enough. Adoption tells us that AI is present in the system. It does not tell us whether the system is getting better. The real question is not: “Are people using AI?” The real question is: “Is AI making our engineering organization more effective?”

The real question: did AI improve the engineering system?

Once we stop treating adoption as the final answer, the real question becomes more interesting: is AI making the engineering system better? Not just busier, not just more automated, and not just more AI-enabled on paper, but genuinely more effective in the way work moves from intent to customer impact.

That means looking beyond usage and asking whether AI is improving the core operating dimensions of engineering. Is work moving faster through the system? Are teams reducing waiting time, handoffs, and repetitive effort? Are engineers spending more time on complex, valuable problems and less time on low-leverage work? Are we seeing better alignment between the work being produced and the priorities that matter for the business?

Speed is only one part of the question. The more important question is whether that speed carries a hidden tax. If AI helps a team ship faster but increases review churn, defects, rework, fragmented pull requests, or post-release issues, the system may not be improving. It may simply be moving effort from one part of the workflow to another. In that case, the productivity gain is not real; it is a deferred cost.

The same applies to collaboration, decision-making, and team dynamics. AI may help engineers explore options faster, write better tests, summarize context, or remove friction from repetitive tasks. But it may also create noise, reduce shared understanding, or increase the validation burden on senior engineers. And this will not affect every cohort in the same way. Senior engineers, junior engineers, managers, product engineers, infrastructure teams, and support-oriented workflows may experience very different levels of leverage.

Finally, there is the cost question. AI is not free. Licenses, tokens, agents, infrastructure, review effort, and operational complexity all add cost to the system. The goal is not to minimize AI spend in isolation, but to understand the net contribution: what value did AI create compared with the extra cost, complexity, and risk it introduced?

So the real question is not simply whether people are using AI. The real question is whether AI is improving the operating model of engineering: delivery, quality, collaboration, alignment, efficiency, and leverage across the organization.

Why AI impact measurement is technically hard

Measuring AI impact is hard because measuring engineering impact was already hard. Engineering work is not a simple production line where every unit has the same value, complexity, or risk. A small pull request can unblock a major customer. A large one can add little value. A ticket can be closed without solving the real problem. AI does not remove that complexity; it adds another layer on top of it.

The first challenge is capturing the AI activity itself. Different providers expose different levels of information: who used the tool, how many input and output tokens were consumed, what the request cost, which model was used, or what kind of event was generated. Some providers are starting to expose richer telemetry, including OpenTelemetry-based events, which creates a better foundation for measurement. But collecting AI events is only the beginning.
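As a rough illustration of what "capturing the AI activity" could mean in practice, the sketch below normalizes a provider payload into one common event shape. The `AIEvent` fields mirror the attributes listed above (user, tokens, cost, model, event kind); the payload keys read from `raw` and the provider label are invented for this example and do not describe any real provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AIEvent:
    provider: str              # hypothetical label, e.g. "provider_a"
    user_ref: str              # provider-side identity (email, login, key owner)
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: Optional[float]  # None when the provider does not report cost
    kind: str                  # e.g. "chat", "completion", "agent_run"
    occurred_at: datetime


def normalize(provider: str, raw: dict) -> AIEvent:
    """Map a provider-specific payload onto the common shape.

    The keys read from `raw` are assumptions made for this sketch.
    """
    usage = raw.get("usage", {})
    return AIEvent(
        provider=provider,
        user_ref=raw.get("user") or raw.get("email") or "unknown",
        model=raw.get("model", "unknown"),
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
        cost_usd=raw.get("cost_usd"),
        kind=raw.get("event", "completion"),
        occurred_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


event = normalize("provider_a", {
    "email": "dana@example.com",
    "model": "model-x",
    "usage": {"input_tokens": 1200, "output_tokens": 300},
    "event": "chat",
    "ts": 1_700_000_000,
})
```

Keeping `cost_usd` optional rather than defaulting it to zero is a deliberate choice here: a missing cost is different from a free request, and the difference matters later when cost is compared with value.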

The next challenge is attribution. An AI event needs to be connected to the right person and to the right organizational context: team, role, discipline, seniority, project, or workflow. This is similar to the challenge engineering organizations already face when integrating Git, ticketing, documentation, messaging, CI/CD, and other systems. Each tool has its own identity model, and none of them were designed to make cross-system measurement easy.
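One simple way to approach member attribution is an alias index that maps every tool-specific identity to a single canonical member. The sketch below assumes such aliases are already known; the `Member` fields, identifiers, and alias values are hypothetical, and real identity resolution usually needs fuzzier matching and manual review.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Member:
    member_id: str
    team: str
    seniority: str
    aliases: Set[str] = field(default_factory=set)  # emails, Git logins, tool handles


def build_alias_index(members: Iterable[Member]) -> Dict[str, str]:
    """Return a lowercase alias -> member_id lookup, rejecting conflicts."""
    index: Dict[str, str] = {}
    for member in members:
        for alias in member.aliases:
            key = alias.strip().lower()
            if key in index and index[key] != member.member_id:
                raise ValueError(f"alias {alias!r} maps to two members")
            index[key] = member.member_id
    return index


def resolve(index: Dict[str, str], provider_ref: str) -> Optional[str]:
    """Resolve a provider-side identity, or None if it is unknown."""
    return index.get(provider_ref.strip().lower())


members = [
    Member("m1", team="payments", seniority="senior",
           aliases={"dana@example.com", "dana-gh"}),
    Member("m2", team="platform", seniority="junior",
           aliases={"lee@example.com"}),
]
index = build_alias_index(members)
```

Returning `None` for unknown identities, instead of guessing, keeps unattributed usage visible as its own bucket rather than silently assigning it to the wrong team.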

But member attribution is not enough. The harder problem is work attribution. If an engineer has a conversation with an AI coding agent, uses Copilot, works with Cursor, or sends prompts to Claude Code or Codex, what piece of engineering work did that activity contribute to? Was it related to a pull request, a ticket, a design document, a production issue, a test suite, or a piece of exploratory analysis? Without that connection, we can measure usage, but we cannot reliably connect it to delivery, quality, complexity, or strategic alignment.
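To make the work attribution problem concrete, here is a minimal heuristic sketch: an AI session is linked to a pull request only when the author and repository match and the session falls inside the pull request's working window. The record shapes, the four-hour slack, and the rule of refusing ambiguous matches are all assumptions for illustration, not a description of how any particular tool does it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Session:
    member_id: str
    repo: str
    at: datetime


@dataclass(frozen=True)
class PullRequest:
    pr_id: str
    author_id: str
    repo: str
    first_commit_at: datetime
    merged_at: datetime


def candidate_prs(session: Session, prs: List[PullRequest],
                  slack: timedelta = timedelta(hours=4)) -> List[str]:
    """PRs by the same author in the same repo whose window covers the session."""
    return [
        pr.pr_id for pr in prs
        if pr.author_id == session.member_id
        and pr.repo == session.repo
        and pr.first_commit_at - slack <= session.at <= pr.merged_at
    ]


def link(session: Session, prs: List[PullRequest]) -> Optional[str]:
    """Link only when exactly one PR matches; leave ambiguous sessions unlinked."""
    matches = candidate_prs(session, prs)
    return matches[0] if len(matches) == 1 else None


prs = [
    PullRequest("PR-1", "m1", "api", datetime(2024, 5, 1, 10), datetime(2024, 5, 2, 12)),
    PullRequest("PR-2", "m1", "web", datetime(2024, 5, 1, 8), datetime(2024, 5, 3, 9)),
    PullRequest("PR-3", "m1", "api", datetime(2024, 5, 1, 11), datetime(2024, 5, 1, 18)),
]
early = Session("m1", "api", datetime(2024, 5, 1, 6, 30))
overlapping = Session("m1", "api", datetime(2024, 5, 1, 12))
```

In this sample, `early` matches only `PR-1`, while `overlapping` falls inside both `PR-1` and `PR-3` and is therefore left unlinked. A time-window heuristic like this is weak evidence at best; explicit signals such as a branch name or commit reference attached to the session, where a tool provides them, would be more reliable.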

Even then, correlation is not contribution. A team using AI more often and shipping faster does not automatically mean AI caused the improvement. The team may have become more senior, the scope may have changed, the work may have become simpler, or the organization may have removed a bottleneck elsewhere. Good AI measurement needs context.
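A small worked example shows how a naive comparison can mislead. In the invented data below, AI-assisted pull requests look far faster overall, but only because most of them are low-complexity changes; within each complexity bucket the medians are identical. The field names and the two-level `complexity` label are assumptions for this sketch, and stratifying by one factor controls for that factor only. It does not establish causation.

```python
from collections import defaultdict
from statistics import median


def naive_medians(prs):
    """Median hours to merge, split only by AI involvement."""
    groups = defaultdict(list)
    for pr in prs:
        groups[pr["ai_assisted"]].append(pr["hours_to_merge"])
    return {flag: median(hours) for flag, hours in groups.items()}


def stratified_medians(prs):
    """Median hours to merge per (complexity, AI involvement) pair."""
    groups = defaultdict(list)
    for pr in prs:
        groups[(pr["complexity"], pr["ai_assisted"])].append(pr["hours_to_merge"])
    return {key: median(hours) for key, hours in groups.items()}


prs = [
    {"complexity": "low", "ai_assisted": True, "hours_to_merge": 4},
    {"complexity": "low", "ai_assisted": True, "hours_to_merge": 6},
    {"complexity": "low", "ai_assisted": False, "hours_to_merge": 5},
    {"complexity": "high", "ai_assisted": False, "hours_to_merge": 40},
    {"complexity": "high", "ai_assisted": False, "hours_to_merge": 50},
    {"complexity": "high", "ai_assisted": True, "hours_to_merge": 45},
]

naive = naive_medians(prs)            # AI: 6 hours, non-AI: 40 hours
stratified = stratified_medians(prs)  # 5 vs 5 for low, 45 vs 45 for high
```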

This is also why individual-level conclusions are dangerous. AI impact should not become a simplistic ranking of who uses more tools or who produces more AI-assisted output. Used badly, these metrics can create surveillance, gaming, and false confidence. Used well, they help leaders understand where AI is creating leverage, where it is creating cost, and where the system needs better support.

From AI events to engineering outcomes

Once AI activity is captured, the next challenge is turning those events into something the engineering organization can actually learn from. An AI event by itself has limited meaning. It may tell us who used a tool, which model was involved, how many tokens were consumed, how much it cost, or whether a suggestion was accepted. That is useful metadata, but it is still disconnected from the work.

The measurement model needs to connect that activity with the real artifacts of the engineering system. Today, most of that connection happens around code. But the same logic will increasingly apply to other technical artifacts such as documents, tickets, architecture decisions, or any other knowledge work created or influenced with AI.

This is where the signal becomes more valuable. If we can connect AI activity to a pull request, we can start comparing it with the complexity of the change, the review pattern, the amount of rework, the time to merge, and the quality signals after release. If we can connect AI activity to a document, we can start understanding whether it supported alignment, discovery, decision-making, or execution. The value is not in the AI event alone, but in the relationship between the event and the work it helped produce.

The technical chain is conceptually simple, but difficult to execute well: capture AI activity, resolve identity, connect people to teams and workflows, link events to work artifacts, classify the type and complexity of the work, and compare the result against delivery, quality, cost, and alignment signals.

That is the bridge from AI usage to AI impact. Without it, we are mostly measuring activity. With it, we can start understanding where AI is creating leverage, where it is adding cost, and where the engineering system is actually improving.

The KPI families that actually matter

Once the measurement model exists, the relevant KPIs are not isolated numbers. They are different lenses on the same question: is AI making the engineering system more effective?

The first lens is delivery impact. Is AI helping work move faster through the system, or are we just creating more activity? The goal is not to produce more code. The goal is to reduce friction between intent and customer impact: shorter cycle times, fewer unnecessary handoffs, better flow, and more valuable work reaching production.

The second lens is quality tax. Speed only matters if the system is not paying for it later. If AI increases review churn, rework, defects, fragmented pull requests, or post-release issues, the apparent gain may be a deferred cost. A good measurement model should help leaders understand whether AI is improving quality, preserving it, or creating hidden operational debt.
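One possible way to put a number on the quality tax is to compare rework and review signals between AI-assisted and other pull requests. The sketch below assumes each pull request already carries an `ai_assisted` flag, a `reworked` flag (for example, a follow-up fix within some window), and a `review_rounds` count; those fields and the sample values are hypothetical.

```python
from statistics import mean


def quality_signals(prs):
    """Rework rate and average review rounds for AI-assisted vs other PRs."""
    result = {}
    for label, flag in (("ai", True), ("non_ai", False)):
        rows = [pr for pr in prs if pr["ai_assisted"] is flag]
        if not rows:
            result[label] = None  # no data is not the same as zero tax
            continue
        result[label] = {
            "rework_rate": sum(1 for pr in rows if pr["reworked"]) / len(rows),
            "avg_review_rounds": mean(pr["review_rounds"] for pr in rows),
        }
    return result


prs = [
    {"ai_assisted": True, "reworked": True, "review_rounds": 3},
    {"ai_assisted": True, "reworked": False, "review_rounds": 2},
    {"ai_assisted": True, "reworked": True, "review_rounds": 3},
    {"ai_assisted": True, "reworked": False, "review_rounds": 2},
    {"ai_assisted": False, "reworked": False, "review_rounds": 1},
    {"ai_assisted": False, "reworked": False, "review_rounds": 2},
    {"ai_assisted": False, "reworked": True, "review_rounds": 2},
    {"ai_assisted": False, "reworked": False, "review_rounds": 1},
]

signals = quality_signals(prs)
rework_delta = signals["ai"]["rework_rate"] - signals["non_ai"]["rework_rate"]
```

A positive `rework_delta` would suggest that some of the apparent speed is being paid back later, although the same caveats about complexity and context apply before reading it as an effect of AI.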

The third lens is AI efficiency. Not every AI interaction creates the same value. Some usage patterns may create strong leverage in testing, refactoring, discovery, documentation, support analysis, or repetitive implementation work. Others may create noise. The interesting question is not who uses AI the most, but which workflows produce better outcomes when AI is involved.

The fourth lens is cost visibility. AI introduces new costs into the engineering system: licenses, tokens, agents, infrastructure, review effort, and operational complexity. The objective is not to minimize spend in isolation. The objective is to understand the relationship between cost and value. Expensive AI usage can be justified if it unlocks meaningful leverage. Cheap usage can still be wasteful if it creates no real improvement.
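The relationship between cost and value can be sketched as simple arithmetic, provided the inputs are treated as estimates. The example below adds up the cost categories named above and subtracts them from an estimated value. Expressing value as hours saved multiplied by an hourly rate is an assumption made for this sketch, and every figure is invented.

```python
from dataclasses import dataclass


@dataclass
class AICosts:
    license_usd: float
    token_usd: float
    infra_usd: float
    extra_review_hours: float  # additional validation effort attributed to AI output
    hourly_rate_usd: float     # assumed blended engineering rate


def total_cost(costs: AICosts) -> float:
    """Direct spend plus the review effort converted into money."""
    return (costs.license_usd + costs.token_usd + costs.infra_usd
            + costs.extra_review_hours * costs.hourly_rate_usd)


def net_contribution(costs: AICosts, estimated_hours_saved: float) -> float:
    """Estimated value minus total cost; negative means cost exceeded value."""
    return estimated_hours_saved * costs.hourly_rate_usd - total_cost(costs)


month = AICosts(license_usd=2000.0, token_usd=1500.0, infra_usd=500.0,
                extra_review_hours=20.0, hourly_rate_usd=100.0)

cost = total_cost(month)                                  # 6000.0
net = net_contribution(month, estimated_hours_saved=90.0)  # 3000.0
```

The hard part is not the subtraction but the estimate of hours saved, which depends on the attribution and comparison steps described earlier.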

The fifth lens is value distribution. AI will not affect every team, discipline, seniority level, or type of work in the same way. It may create strong leverage for one workflow and very little for another. It may help senior engineers move faster while increasing review burden elsewhere. Or it may help less experienced engineers access context and produce better work, if the right support system exists around them.
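A cohort-level view of value distribution might look like the sketch below: for each cohort, compare the median hours to merge with and without AI involvement, and suppress the result when either side has too few samples. The `cohort` labels, the field names, and the `min_n` threshold are assumptions for illustration; the suppression rule is one way to avoid drawing conclusions about individuals from tiny groups.

```python
from collections import defaultdict
from statistics import median


def cohort_deltas(prs, min_n=5):
    """Median hours to merge (AI minus non-AI) per cohort, or None if too few samples."""
    by_cohort = defaultdict(lambda: {True: [], False: []})
    for pr in prs:
        by_cohort[pr["cohort"]][pr["ai_assisted"]].append(pr["hours_to_merge"])

    deltas = {}
    for cohort, groups in by_cohort.items():
        if len(groups[True]) < min_n or len(groups[False]) < min_n:
            deltas[cohort] = None  # suppressed: not enough data to compare
        else:
            deltas[cohort] = median(groups[True]) - median(groups[False])
    return deltas


prs = [
    {"cohort": "senior", "ai_assisted": True, "hours_to_merge": 10},
    {"cohort": "senior", "ai_assisted": True, "hours_to_merge": 14},
    {"cohort": "senior", "ai_assisted": False, "hours_to_merge": 20},
    {"cohort": "senior", "ai_assisted": False, "hours_to_merge": 24},
    {"cohort": "junior", "ai_assisted": True, "hours_to_merge": 30},
    {"cohort": "junior", "ai_assisted": False, "hours_to_merge": 28},
    {"cohort": "junior", "ai_assisted": False, "hours_to_merge": 32},
]

deltas = cohort_deltas(prs, min_n=2)
```

With this sample data and `min_n=2`, the senior cohort shows a negative delta (faster with AI), while the junior cohort is suppressed because it has only one AI-assisted pull request.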

Together, these KPI families move the conversation from adoption to impact. They help leaders understand where AI is accelerating delivery, where it is protecting or damaging quality, where it is economically sensible, and where the organization needs to invest in better practices, training, tooling, or workflow design.


AI measurement is an engineering capability

The companies that win with AI will not simply be the companies with the highest usage. They will be the companies that understand where AI is actually improving the system: where it accelerates delivery, where it protects quality, where it creates leverage, where it adds cost, and where it changes the way teams collaborate.

That requires more than a dashboard. It requires an engineering measurement capability: the ability to connect activity with context, work artifacts, outcomes, cost, and organizational structure. Without that context, AI metrics can quickly become another version of the same old vanity metrics. With it, they help leaders ask better questions.

This is the challenge we are solving at Pensero.

As AI becomes embedded in software development, engineering leaders need more than adoption metrics. Knowing how many tokens were consumed, how often a tool was used, or how many suggestions were accepted does not explain whether the engineering system is actually improving.

The organizations that will benefit most from AI will be the ones that can connect AI activity to real outcomes: faster delivery, better quality, stronger alignment, greater efficiency, and clearer business impact.

That requires a new level of visibility into how work happens across the engineering organization, and where AI is creating genuine leverage.

AI is changing how software is built. The next challenge is understanding its impact.

That is the shift: from measuring AI usage to understanding AI impact and ROI.

If you are trying to understand engineering execution, technical risk, or team capability beyond static reports and interviews, this is the problem we are solving at Pensero.

And if this space resonates with you, we’re also hiring: https://pensero.ai/careers

Get months of engineering performance data now

Stop deciding on gut feel. Get 90 days of objective data in minutes.
