Supply Chain Evolution: When AI stops answering questions and starts chasing goals

Somewhere in a bottling plant, an automated ordering system has just done exactly what it was told. A request came in for 10,000 bottles, so it placed the order. Nobody paused it. Nobody asked whether 10,000 was a real number, a test, or a fat-fingered typo. By the time a human looks up, the pallets are already moving.

That story arrives early in this episode of the S&OP MasterClass, and it's the quiet fear sitting behind every agentic AI conversation in supply chain right now. Rafael Amaral, CTO and Co-Founder of TilliT, meets it head-on. "Somebody comes in and orders 10,000 bottles and suddenly you have 10,000 bottles delivered," he says. "That lack of basic assertions, it's what's causing some of these projects to fail."

And yet the same conversation is, underneath, an optimistic one. Because the very thing that makes that failure possible, an AI that takes action on its own, is also what puts the biggest planning prize in a generation within reach.

Why agentic AI counts as a lever in Supply Chain Management

Rafael's framing is deliberately blunt. "This technology that we have right now, if it's used in the right format, it creates a lever that allows you to go further than you have been able to do so far," he says. "And if you're not doing that, your competitor is."

It helps to see it as a second wave. The first wave of AI in supply chain made the things we already did a little better: sharper forecasts, less variability, more confidence in a plan we already understood. Useful, but incremental. The second wave is a change in kind, and Rafael's clear that ignoring it is no longer a strategy. "We are past the time where we can put our heads under the blanket and ignore it's happening."

The difference between a question and a goal

So what actually separates this wave from a clever chatbot? Strip away the jargon and a language model does one thing: text in, text out. It predicts the next most likely word. It becomes agentic the moment you hand it tools, the ability to query a dataset, run code, send an email, and let it work through them in a loop towards an objective.
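To make the "tools in a loop" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tool names, the canned replies, and the scripted_model function, which stands in for a real language model so the example runs on its own. It is not how any particular product is built.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: plain functions the agent is allowed to call.
def query_otif(_: str) -> str:
    return "OTIF last month: 87%"

def list_late_orders(_: str) -> str:
    return "Late orders: 42, mostly lane DC-North"

TOOLS: Dict[str, Callable[[str], str]] = {
    "query_otif": query_otif,
    "list_late_orders": list_late_orders,
}

def scripted_model(goal: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for a language model (assumption): chooses the next action
    from what has been observed so far. Returns (tool_name, argument),
    or ("finish", answer) when it considers the goal addressed."""
    if len(history) == 0:
        return ("query_otif", "")
    if len(history) == 1:
        return ("list_late_orders", "")
    return ("finish", "Next step: investigate lane DC-North. Evidence: " + " | ".join(history))

def run_agent(goal: str,
              model: Callable[[str, List[str]], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Ask the model for an action, run the tool, feed the result back, repeat."""
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = model(goal, history)
        if action == "finish":
            return arg
        if action not in tools:
            # Unknown tools are reported back to the model, never executed.
            history.append(f"unknown tool: {action}")
            continue
        history.append(tools[action](arg))
    # A hard step limit keeps the loop from running forever.
    return "stopped: step limit reached"

if __name__ == "__main__":
    print(run_agent("increase on time in full", scripted_model, TOOLS))
```

The step limit and the fixed tool list are the two design choices worth noticing: the loop can only call what it has been given, and it cannot run indefinitely.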

Rafael's example is the one every planner will recognise. "If I ask, okay, what is my on time in full, then that's a request and a reply," he says. "But then when you say, okay, how do I increase my on time in full, then you start setting up a goal." The first is a number. The second is a piece of work the system can chew on for hours, overnight, at a speed no human team can match.

Why the AI projects fail, and why some succeed

The frustration many leaders feel is real, and it has a cause. The model can still get things wrong. "The moment that you can get it wrong, even if it's 10% wrong, is the moment that it's really hard for you to be certain to automate a full end-to-end process," Rafael says. That's why an unguarded, fully automatic ordering system carries real risk.
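The "basic assertions" Rafael mentions can be very small. The sketch below shows the kind of sanity check that would have held the 10,000-bottle order for a second look. The OrderRequest fields, the typical_quantity input, and the 5x threshold are all illustrative assumptions, not values from the episode.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OrderRequest:
    sku: str
    quantity: int
    is_test: bool = False  # hypothetical flag set by the requesting system

def check_order(order: OrderRequest,
                typical_quantity: float,
                max_ratio: float = 5.0) -> List[str]:
    """Return the reasons to hold an order; an empty list means it may proceed.

    typical_quantity is assumed to come from order history for the SKU,
    and max_ratio is an arbitrary illustrative threshold.
    """
    problems: List[str] = []
    if order.is_test:
        problems.append("flagged as a test request")
    if order.quantity <= 0:
        problems.append("quantity must be positive")
    elif order.quantity > typical_quantity * max_ratio:
        problems.append(
            f"quantity {order.quantity} is more than {max_ratio:g}x "
            f"the typical {typical_quantity:g}"
        )
    return problems

if __name__ == "__main__":
    order = OrderRequest(sku="BOTTLE-500ML", quantity=10_000)
    issues = check_order(order, typical_quantity=800)
    print("hold for review:" if issues else "place order", issues)
```

An order that fails a check is not rejected here, only held, which leaves the final call with a person.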

There's a technical trap underneath it too. Because these models can swallow a million tokens, it's tempting to select an entire table and dump it in. "That's not a great idea," he says, because that's exactly the route into hallucination and unpredictable answers. The teams getting real value do two things at once: they pick the right use case, and they pair it with the right method to implement it. Neither alone is enough.
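The episode does not spell out the alternative to dumping the table in, so treat the following as one common pattern, not Rafael's method: let ordinary code reduce the raw rows to a compact summary, and give the model that summary. The column names (sku, delay_days) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

def summarise_by_sku(rows: List[Dict]) -> str:
    """Collapse raw order rows into one short line per SKU.

    Each row is assumed to carry a 'sku' and a 'delay_days' field.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sku"]].append(row["delay_days"])

    lines = []
    for sku in sorted(grouped):
        delays = grouped[sku]
        lines.append(
            f"{sku}: orders={len(delays)}, "
            f"mean_delay={mean(delays):.1f}d, max_delay={max(delays)}d"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    rows = [
        {"sku": "A", "delay_days": 0},
        {"sku": "A", "delay_days": 4},
        {"sku": "B", "delay_days": 1},
    ]
    # The summary, not the raw table, is what would go into the prompt.
    print(summarise_by_sku(rows))
```

The output grows with the number of SKUs, not the number of rows, which keeps the prompt small however long the order history gets.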

The shovels nobody had

Where it goes right, the results are hard to argue with. Rafael's favourite image is about data. "People say data is the oil, is the gold," he says. "Nobody has shovels." The reports most companies pull from an ERP or an IBP plan are a thin slice of what the data actually holds, and digging deeper has always meant hiring data scientists whom most planning teams cannot find or afford.

This is the gap Aura, built across the Roima portfolio, is meant to close. Aura goes a step beyond a chatbot that formats a query. "What we gave Aura is the power of developer," Rafael says. It writes and runs its own code, applying the kind of statistical analysis you'd expect from a specialist across the whole dataset, and it is built security-first because that level of access demands it. He describes it as hiring a team of five specialists you could never otherwise justify, working for you overnight.

The test of whether it works is refreshingly unromantic. "Did you change your process? Did you talk to your employees? Did you act on what the system told you?" Increasingly, customers do, because the system surfaces correlations that were always there but never connected. "I had my dashboard looking at this. I had my dashboard looking at that," as Rafael puts it. "But I never had them talking to each other."

What it means for the people

Every agentic conversation ends up here, and the honest answer from the work so far is that these tools are empowering planners. The planner who used to spend hours building a report to find the hotspots now asks in plain language and has the answer in seconds. The value stops being a black box somewhere far away and starts showing up in the daily job.

It also reframes the question of trust. Waiting for the technology to be perfect is the wrong test, because people aren't perfect either. "If your accuracy is 90% and the accuracy of the AI gets to 95%," Rafael asks, "should you use that, or should you keep it with your 90%?" The real work is to put the right gates and guardrails around the decision, so the agent acts where the cost of an error is low and a human stays in the loop where it is high.
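One way to picture such a gate is a simple routing rule: estimate what a wrong decision would cost, and only let the agent act alone below a limit. The sketch below assumes a cost estimate is available for each proposed action. The function name, the cost figures, and the limit are all hypothetical.

```python
from typing import Dict, List, Tuple

def route_decision(estimated_error_cost: float, auto_limit: float) -> str:
    """Return "auto" when a mistake would be cheap, "human_review" otherwise.

    Both arguments are in the same currency; auto_limit is a business choice.
    """
    if estimated_error_cost < 0:
        raise ValueError("estimated_error_cost cannot be negative")
    return "auto" if estimated_error_cost <= auto_limit else "human_review"

def route_all(proposals: List[Tuple[str, float]], auto_limit: float) -> Dict[str, List[str]]:
    """Split proposed actions into what the agent may do and what a person must approve."""
    queues: Dict[str, List[str]] = {"auto": [], "human_review": []}
    for name, cost in proposals:
        queues[route_decision(cost, auto_limit)].append(name)
    return queues

if __name__ == "__main__":
    proposals = [
        ("reorder labels", 200.0),              # cheap to get wrong
        ("change safety stock policy", 50_000.0),  # expensive to get wrong
    ]
    print(route_all(proposals, auto_limit=1_000.0))
```

Where the limit sits is a governance decision, not a technical one, which is the point of the paragraph above.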

That’s what the second wave actually buys you: a planning organisation that finally has the shovels, and the judgement to know where to dig, before your competitor starts digging too.

Many companies have the first steps in place. They accept that AI is a black box. They build business cases, and they choose priorities.

What’s missing are the last two steps:

How do we guard AI?
How do we design the human role?

Guardrails come in two layers.

At a very practical level, you can fence predictions in. For a forecast, you might limit how much it can increase or decrease, so it doesn’t go “completely nuts” compared to what’s historically plausible.
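Fencing a forecast in can be as simple as clamping it to a band around a trusted baseline. In this sketch the baseline (for example, last year's demand for the same period) and the 30% band are assumptions chosen for illustration. A real limit would come from what is historically plausible for the product.

```python
def clamp_forecast(proposed: float, baseline: float, max_change: float = 0.30) -> float:
    """Limit a proposed forecast to within +/- max_change of the baseline.

    max_change is a fraction (0.30 means 30%). The lower bound never
    drops below zero, since negative demand makes no sense.
    """
    if baseline < 0:
        raise ValueError("baseline cannot be negative")
    if max_change < 0:
        raise ValueError("max_change cannot be negative")
    lower = max(0.0, baseline * (1 - max_change))
    upper = baseline * (1 + max_change)
    return min(max(proposed, lower), upper)

if __name__ == "__main__":
    # A model proposes 2,500 units against a baseline of 1,000:
    # the guardrail caps the forecast at the top of the band.
    print(clamp_forecast(proposed=2500, baseline=1000))
```

It is worth logging the proposed value alongside the clamped one, because a forecast that keeps hitting the fence is telling you something about either the model or the fence.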

At a more strategic level, you define the playing field for AI. For example, in the product portfolio: what should be stocked, and what shouldn’t? AI can propose moving SKUs between make-to-stock and make-to-order, but that is a commercial decision: it’s your company’s value proposition to customers.

AI can provide excellent input, but humans must decide what the company wants to offer “off the shelf” versus what customers can wait for.

And this is why “it’s just calculating safety stocks” is a dangerous sentence. Safety stock decisions shape inventory strategy and customer promise. That should never be handed over without human intention and governance.

The beauty and horror of automation is that you can create structured, scaled errors very fast. Nobody wants that.