When Agents Get It Right and Nobody Notices the Problem

May 21, 2026

By Justin Hingorani, Managing Director of AI, Duco

Agentic AI is replacing human investigation and resolution across the trade lifecycle: in execution, client onboarding, compliance, post-trade, risk, and finance. Agents now read data, reason, evidence conclusions, and propose action.

With this shift, most attention goes to the obvious failure: an agent making a bad call. But the real damage happens when an agent makes a series of correct calls, each one defensible, while a system-wide problem goes unnoticed.

The symptoms, not the disease

Take an agent given a well-scoped job: clear small confirmation mismatches inside matching tolerance. Each decision is technically correct – tolerances are designed to absorb the routine noise from different price sources, timings, or formatting across counterparties. A human spot-checking any single case would sign it off.

But tolerances absorb random noise; they don’t catch systematic drift. A counterparty’s pricing engine update produces trades a few basis points below agreed levels; each is inside tolerance, but the cumulative P&L impact becomes material. A reference data change remaps an instrument identifier; matches pass economic tolerance, but the firm is booking against the wrong security.

The agent handles each symptom correctly. No one is looking at the disease. By the time teams dig in, individual cases look fine. The issue is in the aggregate. Per-decision approval gates cannot catch this, because every decision passes on its own merits. System-level monitoring can. The control layer must watch for patterns of decisions, not just individual ones, and escalate anomalies.
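One way to picture pattern-level monitoring is a check over the cases the agent has already cleared. The sketch below is illustrative only: it assumes each cleared mismatch is logged as a counterparty plus a signed difference in basis points, and the function name, minimum case count, and one-sidedness threshold are all hypothetical. It uses a simple one-sidedness heuristic rather than a formal statistical test.

```python
from collections import defaultdict
from statistics import mean


def find_systematic_drift(cleared, min_cases=20, bias_threshold=0.8):
    """Flag counterparties whose within-tolerance differences lean one way.

    `cleared` is an iterable of (counterparty, signed_diff_bps) pairs for
    mismatches the agent has already cleared as inside tolerance.
    All thresholds here are illustrative assumptions.
    """
    by_counterparty = defaultdict(list)
    for counterparty, diff_bps in cleared:
        by_counterparty[counterparty].append(diff_bps)

    flagged = {}
    for counterparty, diffs in by_counterparty.items():
        nonzero = [d for d in diffs if d != 0]
        if len(nonzero) < min_cases:
            continue  # too few cases to say anything useful

        # Random noise should be roughly balanced around zero. A strongly
        # one-sided sign pattern suggests systematic drift instead.
        negative_share = sum(1 for d in nonzero if d < 0) / len(nonzero)
        one_sidedness = max(negative_share, 1 - negative_share)
        if one_sidedness >= bias_threshold:
            flagged[counterparty] = {
                "cases": len(nonzero),
                "one_sidedness": round(one_sidedness, 2),
                "mean_diff_bps": round(mean(nonzero), 2),
                "cumulative_diff_bps": round(sum(nonzero), 2),
            }
    return flagged
```

Every input to this check is a decision that was individually correct. The escalation comes from the aggregate, which is the part a per-decision gate never sees.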

Why agents change the shape of control

Markets professionals know the routine. A junior prepares the work, a senior signs it off, and the audit trail makes plain who did what. The control layer for agents extends that same logic, but with one critical difference. Agents act at machine speed and scale, so the bar for control must be higher than for humans, not the same.

That shows up in two places. The first is the audit trail. It must capture not only what the agent did but why: the evidence, the alternatives considered, and the conclusion. This is the “show your working” you would expect a junior analyst to defend. And because no human can manually inspect thousands of real-time decisions, active monitoring must watch for policy violations, drift, or patterns warranting a scope reduction before something becomes a real problem.
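As a rough illustration of an audit entry that records the why as well as the what, the sketch below stores evidence, alternatives, and conclusion alongside the action. The field names and the JSON-lines format are assumptions made for the example, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One audit entry per agent decision (hypothetical schema)."""
    agent_id: str
    case_id: str
    action: str                     # what the agent did
    evidence: tuple                 # what it relied on
    alternatives_considered: tuple  # what it ruled out
    conclusion: str                 # why it chose this action
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_audit_line(record):
    """Serialise a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = DecisionRecord(
    agent_id="exception-agent-01",
    case_id="CONF-1234",
    action="cleared_within_tolerance",
    evidence=("price difference of 2 bps", "agreed tolerance of 5 bps"),
    alternatives_considered=("escalate to operations",),
    conclusion="Difference is inside the agreed matching tolerance.",
)
print(to_audit_line(record))
```

Structured records like this are also what aggregate monitoring reads, so the same log serves both the reviewer of a single case and the check for patterns across cases.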

The second is scope. A sensible first step is constraining the environment an agent operates in, having it inherit the user’s permissions, and isolating each session so an agent cannot carry context or instructions between users. The mature version is agents that hold their own purpose-scoped identities: for example, an exception-management agent that can read trade data and propose resolutions but cannot modify rules, with permissions an administrator can grant, narrow, or revoke independently of any user.

Those unmodifiable rules are deterministic checks – position caps, payment authorisation thresholds, exposure ceilings. If the agent’s reasoning concludes a trade is fine but the rules say it’s not, the rules win.
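A minimal sketch of "the rules win", assuming two hypothetical limits and a simplified action record. The limit values, field names, and function names are invented for illustration. The point is only the ordering: deterministic checks run after the agent has reasoned, and the agent's verdict cannot override them.

```python
from dataclasses import dataclass

# Hypothetical limits, set outside the agent's control.
POSITION_CAP = 10_000_000
PAYMENT_AUTH_THRESHOLD = 1_000_000


@dataclass(frozen=True)
class ProposedAction:
    trade_id: str
    notional: float
    resulting_position: float
    agent_verdict: str  # "approve" or "reject", from the agent's reasoning


def rule_violations(action):
    """Deterministic checks: same input, same answer, no reasoning involved."""
    violations = []
    if abs(action.resulting_position) > POSITION_CAP:
        violations.append("position_cap")
    if action.notional > PAYMENT_AUTH_THRESHOLD:
        violations.append("payment_authorisation_threshold")
    return violations


def final_decision(action):
    """The agent's verdict stands only if no rule is breached."""
    violations = rule_violations(action)
    if violations:
        return ("blocked", violations)
    return (action.agent_verdict, [])
```

In this sketch an action the agent approves with a notional of 2,000,000 is still blocked, because the payment threshold is checked independently of whatever the agent concluded.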

Three mistakes firms are making

Three patterns come up repeatedly when firms put agentic AI into production – less about the agents themselves than the workflows around them.

The first is plugging agents into workflows built for humans. These rely on human cadence, informal sanity checks, and the ability to pause when something feels off. Drop an agent into the same shape and the bottlenecks shift to humans, while safety nets vanish. Redesign around what has changed. Identify where agents add the most leverage (pattern detection, evidence-gathering, classification at scale), and rebuild with humans positioned where judgment is genuinely needed.

The second is throwing the whole job and vast context at one agent. A single agent covering the whole workflow is hard to control, debug, and govern. When it gets something wrong, you cannot isolate where the reasoning breaks. Other patterns work better, such as an orchestrator: a top-level agent that delegates to tightly scoped sub-agents for data gathering, analysis, and drafting the action, with explicit handoffs between them. With context compressed as progress is made and each sub-agent getting only the slice of context it needs, decisions are sharper, controls are tighter, economics are better, and performance scales. Less context per call means materially lower cost and latency at the volumes agentic workflows run at.
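The orchestrator pattern can be sketched in a few lines. In the example below the three sub-agents are plain stand-in functions with hard-coded outputs, not real model calls, and every name and value is hypothetical. What it shows is the handoff discipline: each step receives only the keys it declares it needs, and each handoff is recorded.

```python
# Stand-in sub-agents. In practice each would be a scoped model call.
def gather(ctx):
    return {"our_price": 100.02, "their_price": 100.00}


def analyse(ctx):
    return {"diff": round(ctx["our_price"] - ctx["their_price"], 4)}


def draft(ctx):
    return {"proposal": f"Query counterparty on a price difference of {ctx['diff']}"}


# (step name, sub-agent, keys of shared state that sub-agent may see)
PIPELINE = [
    ("gather", gather, ["trade_id"]),
    ("analyse", analyse, ["our_price", "their_price"]),
    ("draft", draft, ["trade_id", "diff"]),
]


def orchestrate(trade_id):
    """Run each sub-agent on its own slice of context and log the handoffs."""
    state = {"trade_id": trade_id}
    handoffs = []
    for name, agent, needs in PIPELINE:
        ctx = {key: state[key] for key in needs}  # only the slice this step needs
        output = agent(ctx)
        handoffs.append(
            {"step": name, "saw": sorted(ctx), "produced": sorted(output)}
        )
        state.update(output)
    return state, handoffs
```

Because every handoff is explicit, a wrong proposal can be traced to the step that produced the bad input, which is the isolation a single all-purpose agent does not give you.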

The third is optimising the agent without rethinking the human’s role. If the agent gets dramatically more productive but the human just “approves everything”, you have built a new bottleneck. Disengaged reviewers rubber-stamp, which is worse than no review at all. Stop applying the same approval regime to every action. Tier human involvement by the action’s stakes: how much risk it carries, whether it is reversible if wrong, and the impact on financial, regulatory, or reputational outcomes, modulated by model confidence. High-confidence, low-impact, easily reversible actions run autonomously with aggregate monitoring. Low-confidence, high-impact, irreversible actions get a human decision-maker, every time. Most operational agent work sits in the middle band.
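The tiering can be written down as a small routing function. This is a sketch under stated assumptions: the confidence cut-offs, the three-level impact scale, and the treatment of the middle band (agent proposes, human reviews) are all hypothetical choices made for illustration.

```python
from enum import Enum


class Tier(Enum):
    AUTONOMOUS = "runs autonomously with aggregate monitoring"
    HUMAN_REVIEW = "agent proposes, human reviews"  # assumed middle-band handling
    HUMAN_DECISION = "human decides, every time"


def review_tier(impact, reversible, confidence, high_conf=0.9, low_conf=0.6):
    """Route an action by stakes and model confidence.

    impact: "low", "medium", or "high"; confidence: a value in [0, 1].
    The cut-offs are illustrative assumptions, not recommended settings.
    """
    if impact not in {"low", "medium", "high"}:
        raise ValueError(f"unknown impact level: {impact!r}")

    if impact == "high" and not reversible and confidence < low_conf:
        return Tier.HUMAN_DECISION
    if impact == "low" and reversible and confidence >= high_conf:
        return Tier.AUTONOMOUS
    # Everything else falls in the middle band.
    return Tier.HUMAN_REVIEW
```

The function follows the two extremes literally, so a high-impact, irreversible action with high confidence lands in the middle band. A firm with a lower risk appetite could reasonably send every high-impact, irreversible action to a human decision regardless of confidence.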

Take trade confirmation exceptions as an example. The right design is not one agent doing everything while a human rubber-stamps it. It’s an orchestrator dispatching classification, evidence, and resolution sub-agents – each exchanging needed context. High-confidence mismatches inside understood tolerances clear autonomously. Only material disagreements or exceptions threatening settlement escalate to a human, fully evidenced. Agents narrow the surface area; humans spend time where it counts.

The role of humans – from doers to decision-makers

The work that disappears was already a proxy for missing automation: humans bridging two systems by hand, rekeying entries, working through the same routine exception type for the hundredth time. Those were not jobs. They were patches and should not come back. The people doing them today are best placed to design what replaces them.

Humans become the architects of how agents and people split the work. They become senior decision-makers on high-stakes, irreversible, or ambiguous decisions. They become trust-grantors, extending agent autonomy as confidence builds within clearly defined limits. They become pattern-watchers, looking for the moments where individual decisions are correct but the system is missing the bigger picture. And they become governance defenders, deciding what authority an agent should hold, on what evidence, and what to do when the evidence drifts.

The underlying shift is from task worker to decision maker. These are senior, judgment-heavy capabilities. Firms that get this transition right will not just deliver more with the same team. They will write the next operating model for the industry.
