Token Winter is coming
The illusion of cheap AI and the rise of uncontrolled usage.
There is a pattern that repeats itself every time a new layer of technology becomes widely accessible: first comes the excitement, then the experimentation, and only later the realization that scale has a cost. We are entering that third phase with AI.
Token spend is quietly becoming one of the fastest-growing and least understood cost centers in modern engineering organizations. What starts as a few API calls for experimentation quickly turns into production workloads, internal tools, copilots, agents, and automated pipelines, all consuming tokens at a rate that is easy to underestimate and even easier to ignore.
Until it shows up in the budget.
The illusion of cheap AI
The early promise of AI was shaped by accessibility: you could start using it instantly, pay as you go, and scale without friction. That created a perception that cost would naturally follow value, that more usage would mean more productivity. In reality, the relationship is far less linear.
When something is cheap and easy to use, the default behavior is to maximize it. Teams prompt more, generate more, retry more, chain models together, and build systems that assume infinite availability because, at the beginning, it feels that way. In some organizations, there is already a quiet competition to see who can “get more out of AI,” which in practice often translates into who can burn more tokens in the process.
I’ve said it in many of my posts: it sounds like progress, but it is often just empty consumption.
Why this becomes a measurement problem
The problem is not the technology; it rarely is. The problem is the absence of a model for understanding how that consumption translates into value. I’ve written about this before, and Bernardo has too. Most organizations today can tell you how many tokens they are using, but very few can explain what they are getting in return.
Which workflows actually improve delivery?
Which use cases reduce rework?
Which teams are using AI to produce better outcomes, not just more output?
Without that layer of understanding, token spend becomes the new cloud bill from ten years ago: growing fast, poorly attributed, and only questioned when it becomes uncomfortable.
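To make the attribution point concrete, here is a minimal sketch of what "cost per outcome" could look like in practice. Everything in it is invented for illustration: the workflow names, the outcome counts, the blended price per thousand tokens, and the record format are assumptions, not any vendor's actual API or billing schema. The idea is simply to tag each unit of token spend with the workflow that consumed it and the outcome it contributed to, then divide.

```python
from collections import defaultdict

# Hypothetical usage records: each entry tags token spend with the
# workflow that consumed it and the outcomes it helped deliver.
# Field names and numbers are illustrative, not from any real API.
usage = [
    {"workflow": "pr-review-copilot", "tokens": 120_000, "outcomes": 8},  # PRs reviewed
    {"workflow": "pr-review-copilot", "tokens": 95_000,  "outcomes": 6},
    {"workflow": "doc-generation",    "tokens": 300_000, "outcomes": 3},  # docs shipped
    {"workflow": "retry-heavy-agent", "tokens": 500_000, "outcomes": 2},  # tasks completed
]

PRICE_PER_1K_TOKENS = 0.002  # assumed blended rate in USD, for illustration only

def cost_per_outcome(records):
    """Aggregate token spend per workflow and divide by outcomes delivered."""
    tokens = defaultdict(int)
    outcomes = defaultdict(int)
    for r in records:
        tokens[r["workflow"]] += r["tokens"]
        outcomes[r["workflow"]] += r["outcomes"]
    return {
        wf: (tokens[wf] / 1000 * PRICE_PER_1K_TOKENS) / outcomes[wf]
        for wf in tokens
        if outcomes[wf] > 0
    }

# Cheapest-per-outcome workflows first: this ranking, not raw token
# volume, is what separates leverage from waste.
for wf, cost in sorted(cost_per_outcome(usage).items(), key=lambda kv: kv[1]):
    print(f"{wf}: ${cost:.4f} per outcome")
```

With these made-up numbers, the agent that burns the most tokens comes out as the most expensive per completed task, which is exactly the kind of signal an aggregate token count hides.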
AI as a multiplier of waste or leverage
Using AI effectively is about optimizing for outcomes, not maximizing usage. That requires moving beyond aggregate metrics and into a more granular understanding of how AI interacts with real work: where it accelerates, where it introduces noise, where it improves quality, and where it creates hidden costs in the form of rework or complexity.
Not all tokens are equal. Some generate leverage. Others generate waste.
The organizations that will navigate this transition well are not the ones using AI the most, but the ones that understand it the best. They will be able to identify which patterns of usage actually compound productivity, which teams are extracting real value, and which practices should be scaled or eliminated.
Everyone else will continue to optimize for activity. And that is how you end up with systems that look highly productive on the surface while quietly eroding efficiency underneath.