95% of AI projects do not work

The structural reasons AI pilots stall in organizations.

A recent MIT-backed report, referenced by Fortune, suggests that 95% of generative AI pilots in companies are failing to deliver meaningful results.

That number sounds extreme, but it isn’t surprising.

Most AI projects are not designed to succeed; they are designed to impress.

Why most pilots stall

According to that report, the reason most AI pilots stall is structural.

They are measured through proxies that are easy to collect but impossible to act on. Adoption goes up, usage and costs increase, and dashboards show progress, but none of it explains whether the system is actually improving.

So organizations are left with a false sense that the needle is moving: they are spending more, producing more output, and yet they cannot clearly explain what has improved or why.

At some point, that uncertainty becomes the limiting factor, not the technology.

The problem is not AI

AI works; that part is already proven, and we can all agree on it. What is failing is how companies are deploying it.

Most organizations introduce AI as an additional layer on top of existing workflows. No matter the industry, AI is being pushed into use. Teams roll out copilots, agents, and automation tools, and very quickly they see an increase in output. More code gets written, more content gets produced, more activity across the system…

At that point, the narrative becomes simple: AI seems to be working.

But that conclusion is premature because output is not performance and activity is not impact.

Where cost makes it visible

This becomes even more critical as cost structures change.

AI introduces a new type of variability. What used to be relatively stable now fluctuates based on model usage, agents, and layers of automation. Costs increase in ways that are not always predictable, and they are often disconnected from how value is created.

This is where the system breaks.

Because you can increase spend and increase output at the same time, and still be less efficient overall.

The only way to understand what is happening is to connect cost directly to how work performs.

Not at a high level, but at the level where delivery, quality, and rework can actually be observed: the project, or even the individual team.
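
As a rough illustration, the sketch below shows what "connecting cost to how work performs" can mean at team level: AI spend divided by what a team actually delivered, alongside how much of that work came back as rework. The numbers and field names are hypothetical, not Pensero's data model or anyone's real figures.

```python
# Minimal sketch: relate AI spend to team-level delivery.
# All values and field names are invented for illustration.

# Per-team figures for one month: AI/tooling spend, items delivered, items reworked.
teams = [
    {"team": "payments", "ai_spend": 4200.0, "delivered": 38, "reworked": 6},
    {"team": "platform", "ai_spend": 6100.0, "delivered": 35, "reworked": 12},
]

for t in teams:
    cost_per_item = t["ai_spend"] / t["delivered"]   # spend per delivered item
    rework_rate = t["reworked"] / t["delivered"]      # share of items that came back
    print(f"{t['team']}: ${cost_per_item:.0f} per delivered item, "
          f"{rework_rate:.0%} rework")
```

Two teams can show identical "adoption" and very different numbers here, which is exactly the signal that adoption dashboards hide.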

What working teams do differently

The teams that are getting value from AI are not doing more: they are working differently. We can say this confidently because at Pensero we see thousands of companies sharing their journey through AI disruption.

  • They start by adapting their ways of working so they can understand how work actually moves through their system.

  • They define what improvement looks like in terms of delivery speed, quality, and efficiency.

  • And they observe how those metrics change as AI is introduced.

They don’t assume that more output means better performance.

And when that assumption doesn’t hold, they pivot. That is the difference between running an experiment and building a company.
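
To make that comparison concrete, here is a minimal before/after check of the kind these teams run as AI is introduced. The metric names and every value are assumptions for illustration; the point is only that speed, quality, and efficiency are compared against a baseline rather than assumed.

```python
# Sketch of a baseline vs. with-AI comparison (invented numbers).
# lead_time_days: average time from start to delivery.
# rework_rate: share of delivered items needing follow-up fixes.
# cost_per_item: total cost divided by delivered items.

baseline = {"lead_time_days": 9.5, "rework_rate": 0.08, "cost_per_item": 310.0}
with_ai  = {"lead_time_days": 7.0, "rework_rate": 0.14, "cost_per_item": 355.0}

for metric in baseline:
    change = (with_ai[metric] - baseline[metric]) / baseline[metric]
    print(f"{metric}: {baseline[metric]} -> {with_ai[metric]} ({change:+.0%})")

# In this made-up case delivery got faster, but rework and cost per item rose:
# more output with no clear verdict on performance, which is when teams pivot.
```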

What we’ve focused on

At Pensero, we’ve focused on connecting AI usage to the actual performance of engineering systems: looking at how work is produced, how it moves, and what is ultimately delivered, and tying that directly to the cost of producing it, including the cost introduced by AI.

This is what allows teams to answer the questions that actually matter.

  • Did AI increase delivery?

  • Did quality improve or degrade?

  • Did rework increase?

  • Did cost scale responsibly?

Not as assumptions, but as observable outcomes.

Because the only real way to measure ROI in AI projects is to understand how cost and performance move together over time.
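
One hedged way to picture "cost and performance moving together" is to track delivered work per dollar of total cost, period by period, as AI spend ramps up. The figures below are invented; only the shape of the calculation matters.

```python
# Sketch: delivered items per $1k of total cost (including AI cost), per period.
# All values are illustrative assumptions.

periods = [
    {"month": "Jan", "delivered": 40, "total_cost": 52000.0},
    {"month": "Feb", "delivered": 55, "total_cost": 61000.0},  # AI rollout begins
    {"month": "Mar", "delivered": 58, "total_cost": 74000.0},
]

for p in periods:
    efficiency = p["delivered"] / p["total_cost"] * 1000  # items per $1k
    print(f"{p['month']}: {efficiency:.2f} delivered items per $1k")

# If output keeps rising while items-per-dollar falls, spend is scaling faster
# than performance: the "less efficient overall" failure mode described above.
```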

AI increases capacity; that much is clear. What is not clear, in most cases, is whether that capacity translates into better performance. And until that becomes measurable, most AI projects will continue to look promising on the surface and fail underneath.


Know what's working, fix what's not

Pensero analyzes work patterns in real time using data from the tools your team already uses and delivers AI-powered insights.

Are you ready?
