The metric that mattered for the last decade of software was uptime. Five nines, four nines, the page that goes down for ninety seconds and ends up in a postmortem. Uptime is still load-bearing. It is no longer the metric that decides whether a production AI system is safe to run.
The new metric is reversibility. When an agent sends the wrong email to a list of two thousand prospects, the question is not whether the system was up. It was. The question is whether the action can be undone, compensated, or at minimum traced and explained. If you cannot undo it, you should not have shipped it.
We treat reversibility as a runtime feature, not a policy document. Every action an agent takes either has a direct reverse, a compensating action, or a human-in-the-loop gate in front of it. A draft saved to the CRM has a direct reverse. A message sent to a customer has a compensating action: an apology, a follow-up, a flag in the next conversation. A wire transfer has a gate. The runtime knows the difference and refuses to fire actions whose blast radius exceeds the configured tolerance.
This shows up in three places in the stack. In the data layer, every write is event-sourced so the state at any prior moment is reconstructable. In the agent layer, every tool call carries a reversal contract, recorded with the call. In the command center, an operator can replay any decision the system made in the last 90 days and see what changed.
The compounding benefit is that the cost of being wrong drops. Teams that have to be 99% sure before they ship an automation ship slowly and conservatively. Teams that can undo any action in one click ship aggressively and recover. The OS is not safer because it is more cautious. It is safer because it can take it back.
Production AI without rollback is a liability. Production AI with rollback is leverage. The line between the two is a runtime decision, made early, that everything else inherits.