Why pilots stall (and what to do instead)
Most AI pilots stall because they target the wrong loop. Pick the one that compounds, not the one that demos well.
Most AI pilots stall around month four. Not because the model failed. In most cases it did what it was asked. They stall because the loop the team automated wasn't the one that compounds.
There are two kinds of loops in any operation. The demo loop and the compounding loop. They look similar from the outside. Inside they're not.
A demo loop is the one that's easy to film. Lead capture into a chatbot. Slack agent that summarizes a thread. The "AI assistant" tile on the dashboard. They land impressively in a Wednesday review and save almost no real hours.
A compounding loop is the one that touches every working hour. Tier-1 support triage. Deal-routing. Billing exception handling. CSM follow-up sequencing. The loops where the team currently spends the most time, where wrong answers are visible the next morning, and where structured output produces a feedback signal you can grade.
Pilots stall on the first kind because no one really uses them. The team goes back to the old way after a month. The tool sits in the stack collecting dust.
Compounding loops keep going because they're load-bearing. If they break, the team feels it immediately. That pressure is what produces the iterations that make the system better.
Before scoping a pilot we ask four questions:
- Does this loop touch hours every day, or hours every quarter?
- Are the wrong answers visible quickly, by someone who notices?
- Does the loop produce a clear judgment we can capture as an eval?
- Can the AI handle the easy 80% and route the rest to a human, without the human becoming a bottleneck?
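The last question can be sketched in a few lines. This is a hedged illustration, not a production router: the threshold, the queue limit, and the `route` function are all assumptions standing in for whatever the real system uses. The key design choice is the bounded queue, which makes a human bottleneck visible instead of letting review work pile up silently.

```python
# Hypothetical sketch of the 80/20 split: auto-handle high-confidence cases,
# route the rest to a bounded human-review queue.
from collections import deque

HUMAN_QUEUE_LIMIT = 50  # assumed capacity; tune to the team's review throughput
human_queue: deque = deque()

def route(ticket_id: str, confidence: float, threshold: float = 0.8) -> str:
    """Return 'auto' for confident calls, else enqueue for human review."""
    if confidence >= threshold:
        return "auto"
    if len(human_queue) >= HUMAN_QUEUE_LIMIT:
        # Queue full: surface the overflow instead of silently piling up work.
        return "overflow"
    human_queue.append(ticket_id)
    return "human"

print(route("T1", 0.93))  # auto
print(route("T2", 0.41))  # human
```

If `overflow` fires regularly, the human is the bottleneck and the fourth question's answer is no.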
If three or more of the answers are no, we won't take the engagement. The demo will look fine. The pilot will stall. We'd rather not.
As for what to do instead: it's mostly about picking the right starting loop. The one the team complains about in standup. The one that creates a backlog. Look for places where someone makes the same judgment over and over, where that judgment can be written down, and where being slightly wrong has a recoverable cost.
That's where AI compounds. Not the demo. The boring loop nobody films.