Designing Multi-Agent AI Over Sensitive Data: Traceable and Observable by Construction

📚 This is Part 2 of a 3-part series: Auditable AI in Regulated Industries

The Evidence Layer in Banking — BCBS 239, CCAR, SOX
Designing Multi-Agent AI Over Sensitive Data (you are here)
The Evidence Layer in Healthcare & Biotech AI

Part 1 established the demand that finance regulators have made for decades: prove how you got every number, and prove nothing was silently altered on the way. That demand was already hard with deterministic ETL. Now put a multi-agent AI system in the middle — autonomous agents that read sensitive data, call tools, hand work to sub-agents, and decide for themselves which sources to consult — and the naive version becomes an auditor's nightmare: a non-deterministic black box wired directly into your most regulated data.

This is the engineering article of the series. The thesis is simple and uncompromising: traceability and observability cannot be bolted on after the fact. They have to be properties of the architecture itself. If an agent can touch sensitive data through a path your audit log doesn't see, you don't have a compliant system — you have a liability with a nice demo. Let's design one that isn't.

This is now a regulatory requirement, not a nicety. Agent observability, meaning the ability to see, understand, and explain what agents did across systems, is increasingly mandated. The EU AI Act's Article 12 requires high-risk AI systems to automatically log events enabling traceability across their lifecycle. Article 18 adds a 10-year retention obligation for the supporting technical documentation, Article 19 requires providers to keep the logs themselves for at least six months unless other applicable law sets a longer period, and enforcement is ramping for regulated industries through 2026. The NIST AI RMF asks for the same governance evidence (GOVERN 1.7, 6.1). "We can't reconstruct what the agent did" is becoming a finding, not an inconvenience.

Why Agents Break Traditional Audit

A conventional pipeline is a directed graph you drew in advance: data moves through known steps in a known order. Agents violate every assumption that made that auditable:

Agent property	Why it breaks audit
Autonomy	The agent chooses which tools and data sources to use at runtime — the graph isn't fixed in advance
Multi-step & recursive	One request fans out into many tool calls and sub-agent invocations; the "transformation" is a tree, not a line
Non-determinism	The same prompt can take different paths — reproducing "what happened" requires capturing the actual path, not re-deriving it
Persistence	Memory across sessions means a decision today can depend on data accessed weeks ago
Direct data access	Agents reach into the same sensitive systems your humans do — but faster, and without a person in the loop by default

The fix isn't to make agents deterministic — you can't. It's to ensure that every consequential thing an agent does flows through a mediated, logged path, so the actual execution graph is captured as it happens. Capture the path, and non-determinism stops being scary: you're not re-deriving what happened, you're replaying a recorded fact.

What Must Be Provable

Borrow the discipline from Part 1 and from life-sciences data integrity (ALCOA+, which Part 3 covers): for every agent action touching regulated data, you must be able to answer the classic audit questions —

Who — which agent (and which human or system on whose behalf) took the action? Agents need identities.
What — which data was read or written, at what granularity?
When — a trustworthy, immutable timestamp.
Why — what was the goal, what policy permitted it, what reasoning led here?
Under what authority — which policy was evaluated, what was granted or denied, were exceptions approved?
With what result — the output, and its link back to the inputs that produced it.

If your architecture can answer all six for any agent action — months later, queryably, without the agent's cooperation — you have a system that survives audit. The rest of this article is how to build for those six answers.
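
One way to make the six questions concrete is to treat them as required fields of every audit record and to check for gaps mechanically. The sketch below is illustrative only: the `AuditRecord` schema, its field names, and the `missing_answers` helper are assumptions made for this example, not a standard.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One agent action, shaped around the six audit questions (hypothetical schema)."""
    agent_id: str          # who: the acting agent
    on_behalf_of: str      # who: the human or system that initiated the request
    data_refs: tuple       # what: identifiers of the data read or written
    timestamp: datetime    # when: assigned by the logging path, not the agent
    purpose: str           # why: the stated goal of the action
    policy_id: str         # authority: the policy that was evaluated
    outcome: str           # authority: "ALLOWED" or "DENIED"
    result_hash: str       # result: digest of what was returned


def missing_answers(record: AuditRecord) -> list:
    """Return the names of fields left empty, i.e. questions the record cannot answer."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = AuditRecord(
    agent_id="agent-7",
    on_behalf_of="analyst-42",
    data_refs=("warehouse.exposures",),
    timestamp=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
    purpose="draft quarterly exposure narrative",
    policy_id="",  # deliberately blank to show the check firing
    outcome="ALLOWED",
    result_hash="9f2c",
)
print(missing_answers(record))  # ['policy_id']
```

A check like this can run at write time, so a record that cannot answer one of the six questions is rejected before it reaches the audit store rather than discovered during an audit.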

Six Architectural Principles

1. Every agent has an identity

Treat each agent and sub-agent as a first-class non-human identity, not as an extension of the user who triggered it. It gets its own credentials, its own scoped permissions, and its own entry in your identity provider. This is the foundation of "who," and it matches where agent-governance practice is heading: agents are treated as digital identities in their own right. An agent acting "as the user" with the user's full access is the anti-pattern: you lose attribution and you over-grant.
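
As a rough illustration of scoped agent identities, the sketch below gives each agent its own scopes and ensures a delegated sub-agent can never hold more than its parent. `AgentIdentity`, `delegate`, and the scope strings are hypothetical names for this example; a real deployment would issue these identities from an identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """A non-human identity with its own scopes (hypothetical shape)."""
    agent_id: str
    on_behalf_of: str       # the human or system that triggered the work
    scopes: frozenset       # what this agent may do, independent of the user's access


def delegate(parent: AgentIdentity, child_id: str, requested: set) -> AgentIdentity:
    """Create a sub-agent identity whose scopes can never exceed the parent's."""
    return AgentIdentity(
        agent_id=child_id,
        on_behalf_of=parent.on_behalf_of,
        scopes=parent.scopes & frozenset(requested),
    )


planner = AgentIdentity("agent-a", "analyst-42", frozenset({"exposure:read", "narrative:write"}))
helper = delegate(planner, "agent-a.sub-1", {"exposure:read", "payments:write"})
print(sorted(helper.scopes))  # ['exposure:read']
```

The sub-agent asked for `payments:write`, but because the parent never held it, the intersection drops it. Attribution survives too: the sub-agent has its own `agent_id` while still recording on whose behalf it acts.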

2. Govern at the data layer, not the agent layer

Prompt-based guardrails ("please don't access records outside your scope") are not controls; they are suggestions to a probabilistic system. Real enforcement lives where the data lives: row/column-level security, masking policies, and access rules evaluated by the data platform or a policy engine on every request, regardless of how clever or confused the agent is. Governance practice is shifting toward the data layer for exactly this reason: the agent should be structurally incapable of reading what it isn't entitled to.
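
To show what "structurally incapable" means, here is a minimal in-memory stand-in for row- and column-level security. In practice this filtering would be enforced by the data platform itself; the `ROWS`, `ENTITLEMENTS`, and `governed_select` names and the sample data are assumptions made for the sketch.

```python
ROWS = [
    {"counterparty_id": "C1", "desk": "rates", "exposure": 120.0, "ssn": "123-45-6789"},
    {"counterparty_id": "C2", "desk": "credit", "exposure": 75.5, "ssn": "987-65-4321"},
]

# Hypothetical entitlements: which rows (by desk) and which columns each agent may see.
ENTITLEMENTS = {
    "agent-rates": {"desks": {"rates"}, "columns": {"counterparty_id", "desk", "exposure"}},
}


def governed_select(agent_id: str) -> list:
    """Return only the rows and columns the agent is entitled to; unknown agents get nothing."""
    grant = ENTITLEMENTS.get(agent_id)
    if grant is None:
        return []
    return [
        {col: val for col, val in row.items() if col in grant["columns"]}
        for row in ROWS
        if row["desk"] in grant["desks"]
    ]


print(governed_select("agent-rates"))
# [{'counterparty_id': 'C1', 'desk': 'rates', 'exposure': 120.0}]
print(governed_select("agent-unknown"))  # []
```

Nothing in the agent's prompt can change the result: the out-of-scope row and the `ssn` column never leave the data layer, and the default for an unrecognized identity is an empty result.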

3. All data & tool access flows through a governed gateway

This is the keystone. Agents never call databases, APIs, or tools directly. They call them through a governed tool gateway that does four things on every call: authenticate the agent identity, evaluate policy, redact/tokenize sensitive fields the agent isn't entitled to see, and log the full request and response. The Model Context Protocol is an emerging standard interface for agent-to-tool calls, which makes an MCP server or proxy a natural place to host this mediation; the protocol alone does not supply the policy, redaction, or logging. One mediated chokepoint turns "what did the agent access?" from an unanswerable question into a database query.

4. Capture the decision trace, not just the action

The "why" requires recording the reasoning layer: which policy was applied, which records or precedents the agent cited, which exceptions were granted, and what the agent's plan was. Decision traceability — making the governance-and-reasoning layer queryable for audit — is what separates "the agent did X" from "the agent did X because, under policy P, with data D, having considered alternatives A." Regulators increasingly want the latter.

5. The audit log is append-only, immutable, and long-lived

Everything the gateway and orchestrator emit lands in a tamper-evident, append-only store with retention measured in years (the EU AI Act's Article 18 sets a 10-year period for technical documentation, which is a sensible planning horizon for the evidence behind it). If an agent, or an attacker who compromised one, can edit the audit log, the log proves nothing. WORM storage, hash-chaining, and write-only credentials for the logging path are table stakes.
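
Hash-chaining is the easiest of these to illustrate. In the sketch below each entry's hash commits to the previous entry, so editing history is detectable by recomputing the chain. The `append` and `verify` helpers and the event fields are hypothetical, and an in-memory list stands in for real WORM storage.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event whose hash commits to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute the chain; an edited, reordered, or mid-chain deleted entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"agent": "agent-7", "tool": "get_exposure", "outcome": "ALLOWED"})
append(log, {"agent": "agent-7", "tool": "get_positions", "outcome": "DENIED"})
print(verify(log))  # True

log[0]["event"]["outcome"] = "DENIED"  # simulate tampering with an earlier entry
print(verify(log))  # False
```

A chain on its own does not detect truncation of the most recent entries. Periodically anchoring the latest hash somewhere the logging path cannot rewrite is one common way to cover that gap.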

6. Lineage spans data and agent actions as one graph

The payoff from Part 1: an agent's action should be just another edge in the lineage graph. "This regulatory narrative was drafted by agent-7, which read these three datasets (themselves traced to source per BCBS 239), under policy P, approved by human H." The agent doesn't create a gap in lineage; it's a node within it. That continuity is what makes an agentic pipeline defensible to the same regulators from Part 1.
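
The idea of an agent action as one more edge can be sketched with a tiny lineage graph. The node names, relation labels, and `upstream` helper below are invented for illustration; a real lineage store would hold far richer metadata.

```python
from collections import deque

# Hypothetical lineage edges: (upstream, downstream, relation).
EDGES = [
    ("source.trades", "dataset.exposures", "derived_by:etl-job-12"),
    ("dataset.exposures", "action.agent-7.read-001", "read_under:policy-P"),
    ("action.agent-7.read-001", "doc.narrative-q3", "drafted_by:agent-7"),
    ("approval.human-H", "doc.narrative-q3", "approved_by:human-H"),
]


def upstream(node: str) -> list:
    """Walk edges backwards from a node and return everything it depends on."""
    parents = {}
    for src, dst, _ in EDGES:
        parents.setdefault(dst, []).append(src)
    seen, queue = [], deque([node])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


print(upstream("doc.narrative-q3"))
# ['action.agent-7.read-001', 'approval.human-H', 'dataset.exposures', 'source.trades']
```

The same backward walk that traces a reported figure to its source now passes through the agent's read and the human approval without a break.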

A Reference Architecture

Putting the six principles together yields a recognizable shape. Note that nothing reaches sensitive data except through the governed gateway, and everything emits to the immutable audit and observability plane.

flowchart TB
    U["User / system request\n(with identity + purpose)"]
    ORCH["Orchestrator\n(plans, delegates, enforces HITL gates)"]

    subgraph Agents["Agent tier — each has its own identity"]
        A1["Agent A"]
        A2["Agent B"]
        A3["Sub-agent"]
    end

    GW["🔐 Governed tool gateway (MCP)\nauthN · policy eval · redaction · full logging"]

    subgraph Data["Sensitive / regulated data"]
        D1["Warehouse\n(row/col security)"]
        D2["Document store"]
        D3["External APIs"]
    end

    AUDIT["🧾 Immutable audit + lineage store\nappend-only · 10-yr retention"]
    OBS["📈 Observability plane\ntraces · evals · guardrail alerts"]

    U --> ORCH --> A1 & A2
    A2 --> A3
    A1 & A2 & A3 --> GW
    GW --> D1 & D2 & D3
    ORCH -.emits.-> AUDIT
    GW -.emits.-> AUDIT
    Agents -.spans.-> OBS
    GW -.metrics.-> OBS

Traceable-by-construction multi-agent architecture. The gateway is the single mediated path to sensitive data; the audit store and observability plane receive everything. Remove the gateway and the whole compliance story collapses.

The Patterns That Make It Work

Scoped, deterministic tools over raw queries

Don't hand agents a raw SQL connection and hope. Expose a semantic layer of narrow, validated tools — get_exposure(counterparty_id), not "run arbitrary SQL." Each tool encodes the access policy and returns only what the agent's identity is entitled to. This is the same "code engineering beats prompt engineering" pattern from the RAG work in earlier posts: deterministic tools are auditable, testable, and safe in a way that free-form generation never is.

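import hashlib
import json

# The gateway wraps every tool call: identity → policy → redaction → log.
# policy_engine, audit, TOOLS, redact, lineage_refs, and AccessDenied are
# assumed to be provided by the surrounding platform.
def invoke_tool(agent_id: str, tool: str, args: dict, purpose: str) -> dict:
    decision = policy_engine.evaluate(agent_id, tool, args, purpose)
    if not decision.allow:
        # Denials are evidence too: record them before refusing.
        audit.write(agent_id, tool, args, purpose, outcome="DENIED",
                    policy=decision.policy_id)
        raise AccessDenied(decision.reason)

    raw = TOOLS[tool](**args)
    safe = redact(raw, entitlements=decision.entitlements)  # mask out-of-scope fields

    # Hash a canonical JSON form so the same result always yields the same digest.
    canonical = json.dumps(safe, sort_keys=True, default=str).encode("utf-8")
    audit.write(agent_id, tool, args, purpose, outcome="ALLOWED",
                policy=decision.policy_id, data_refs=lineage_refs(raw),
                result_hash=hashlib.sha256(canonical).hexdigest())
    return safe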

Every call produces an audit record linking the agent identity, the policy that authorized it, the lineage references of the data touched, and a hash of what was returned. That single function is most of your "who/what/when/why/result."

Redaction and tokenization at the boundary

Sensitive fields (PII, PHI, account numbers, MNPI) should be masked or tokenized before they ever enter an agent's context window. The principle: minimize what the model sees to what the task requires. If an agent only needs to know two records belong to the same customer, give it a token, not the SSN. This shrinks both your breach surface and the volume of sensitive data sprayed across LLM prompts and logs.
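
A minimal sketch of boundary tokenization, assuming a keyed hash is acceptable for the use case: the same input always maps to the same token, so an agent can tell that two records share a customer without seeing the identifier. `TOKEN_KEY`, `tokenize`, and `redact_record` are hypothetical names, and the hard-coded key is for demonstration only.

```python
import hashlib
import hmac

# Assumption: in practice this key lives in a secrets manager, never in source code.
TOKEN_KEY = b"demo-only-key"


def tokenize(value: str) -> str:
    """Map a sensitive value to a stable token using a keyed hash."""
    digest = hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def redact_record(record: dict, sensitive: set) -> dict:
    """Replace sensitive fields with tokens before the record enters agent context."""
    return {k: tokenize(str(v)) if k in sensitive else v for k, v in record.items()}


a = redact_record({"ssn": "123-45-6789", "balance": 1200}, {"ssn"})
b = redact_record({"ssn": "123-45-6789", "balance": 87}, {"ssn"})
print(a["ssn"] == b["ssn"])      # True
print("123-45-6789" in str(a))   # False
```

A keyed hash is one-way, which suits matching and joining. If an authorized system later needs the original value back, a vault-backed token service is the more usual design.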

Human-in-the-loop gates for consequential actions

For actions that are irreversible or high-stakes — moving money, submitting a filing, changing a clinical record — the orchestrator pauses for explicit human approval, and that approval is itself an audited event ("approved by H at T, having seen plan X"). This is both a safety control and an evidence artifact regulators specifically look for.
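
The gate can be sketched as a check in the orchestrator that refuses to run a high-stakes action until an approval is supplied, and records both the pause and the approval. The action names, the `execute` function, and the in-memory `AUDIT_LOG` are assumptions made for this example.

```python
from datetime import datetime, timezone

AUDIT_LOG = []                                   # stand-in for the immutable audit store
HIGH_STAKES = {"move_money", "submit_filing"}    # hypothetical action names


class ApprovalRequired(Exception):
    """Raised when a high-stakes action has no recorded human approval."""


def execute(action: str, plan: str, approval=None) -> str:
    """Run an action, pausing high-stakes ones until a human approval is supplied."""
    if action in HIGH_STAKES:
        if approval is None:
            AUDIT_LOG.append({"event": "PENDING_APPROVAL", "action": action, "plan": plan})
            raise ApprovalRequired(action)
        AUDIT_LOG.append({
            "event": "APPROVED",
            "action": action,
            "plan": plan,                        # what the approver saw
            "approver": approval["approver"],
            "at": approval["at"],
        })
    AUDIT_LOG.append({"event": "EXECUTED", "action": action})
    return "done"


try:
    execute("submit_filing", plan="file Q3 report v2")
except ApprovalRequired:
    pass  # the orchestrator would now wait for a human

approval = {"approver": "human-H", "at": datetime.now(timezone.utc).isoformat()}
execute("submit_filing", plan="file Q3 report v2", approval=approval)
print([e["event"] for e in AUDIT_LOG])
# ['PENDING_APPROVAL', 'APPROVED', 'EXECUTED']
```

Storing the plan alongside the approval matters: it ties the human's decision to the specific thing they reviewed, not just to the action name.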

Span-based tracing: OpenTelemetry for agents

Borrow the observability stack you already use for microservices. Model each agent step — a plan, a tool call, a sub-agent delegation, a generation — as a span in a distributed trace, with the parent request as the root. This gives you the execution tree for free, lets you debug a misbehaving agent the way you'd debug a slow API, and produces the timeline an auditor wants. Observability is the control plane that turns autonomous behavior into measurable, auditable outcomes.

Evals and guardrails are observability

Continuous evaluation (faithfulness, policy-violation rate, PII-leak detection) running on real traffic isn't a separate QA activity — it's part of the observability plane. A guardrail that blocks an out-of-policy action should emit the same kind of audited event as a successful one. "We caught and blocked 14 attempts to access out-of-scope records this quarter" is exactly the evidence that demonstrates the control is operating.
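
As a small illustration, a guardrail can log an event for every check it runs, pass or block, and the quarterly evidence then falls out of a simple count. The regular expression below is a deliberately naive PII detector, and `check_output` and `EVENTS` are hypothetical names.

```python
import re
from collections import Counter

EVENTS = []                                         # stand-in for the audit/observability sink
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative PII detector only


def check_output(agent_id: str, text: str) -> bool:
    """Return True if the output may be released; log an event either way."""
    leaked = bool(SSN_PATTERN.search(text))
    EVENTS.append({
        "agent": agent_id,
        "check": "pii_leak",
        "outcome": "BLOCKED" if leaked else "PASSED",
    })
    return not leaked


check_output("agent-7", "Exposure for counterparty C1 is 120.0")
check_output("agent-7", "Customer SSN is 123-45-6789")
check_output("agent-9", "No sensitive content here")

counts = Counter(e["outcome"] for e in EVENTS)
print(counts["BLOCKED"], counts["PASSED"])          # 1 2
print(round(counts["BLOCKED"] / len(EVENTS), 2))    # 0.33
```

Because passes are logged as well as blocks, the block rate has a denominator, which is what lets you show the control is operating and not merely present.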

Mapping Controls to Obligations

The reason this architecture is worth the effort: each component maps directly to a named regulatory requirement. The same build satisfies multiple regimes at once.

Obligation	What it demands	Architectural control that satisfies it
EU AI Act Art. 12 / 18	Automatic event logging, lifecycle traceability, 10-yr retention	Immutable append-only audit store fed by gateway + orchestrator
NIST AI RMF (GOVERN)	Documented governance & risk evidence	Decision traces + policy attestations, queryable
SR 11-7 (model risk)	Provenance + reproducibility of model-driven outputs	Versioned model/data refs pinned per run; spans capture the path
BCBS 239 / SOX (Part 1)	Traceable, controlled path to a reported figure	Agent action as a lineage edge; gateway as the §404 control point
HIPAA / 21 CFR Part 11 (Part 3)	Access control + audit trail over sensitive records	Identity + data-layer policy + redaction + audited access

Pitfalls That Sink Real Projects

Prompt-based "controls." Instructions in a system prompt are not access control. If the only thing stopping an agent from reading restricted data is a polite request, you have nothing. Enforce at the data layer.
Logging the action but not the reasoning. "Agent read table X" without the policy, purpose, and decision context fails the "why." Capture the decision trace.
Mutable logs. An audit trail an agent (or attacker) can edit is not evidence. Append-only or it doesn't count.
Agents inheriting the user's full access. Scope every agent identity to least privilege; "acting as the user" destroys attribution and over-grants.
Sensitive data in prompts and traces. If you log full prompts containing PII/PHI, your observability store is now a regulated data store too. Redact before logging.
Treating observability as optional tooling. Under the EU AI Act and NIST AI RMF it's a requirement. Budget for it as core architecture, not a post-launch add-on.
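
For the "sensitive data in prompts and traces" pitfall, one possible mitigation is to scrub log records before a handler writes them. The sketch below uses Python's standard `logging.Filter`; the `RedactingFilter` class, the logger name, and the single SSN-shaped pattern are assumptions for illustration, and a real deployment would need far broader detection.

```python
import io
import logging
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative pattern only


class RedactingFilter(logging.Filter):
    """Scrub SSN-shaped strings from a log record before the handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = SSN_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None   # the message is already fully formatted
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())

logger = logging.getLogger("agent.trace")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("prompt=%s", "Summarise account for SSN 123-45-6789")
print(stream.getvalue().strip())
# prompt=Summarise account for SSN [REDACTED]
```

Attaching the filter to the handler means it has to be added to every handler that can receive agent traces. Redacting earlier, at the gateway boundary, reduces how much this last line of defense has to catch.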

The test that tells you you're done: pick any agent action from three months ago and ask — can I show, from immutable records alone, which agent did it, on whose behalf, what data it touched, which policy allowed it, why, and what it produced? If yes for every action, you've built traceable-by-construction. If you find yourself reconstructing or inferring any of those answers, that gap is your next finding.

Where This Goes Next

This architecture is domain-agnostic — it's the same shape whether the sensitive data is trading positions or patient genomes. Part 1 grounded the why in finance; Part 3 carries the identical evidence-layer thinking into healthcare and biotech, where HIPAA, FDA Good Machine Learning Practice, and 21 CFR Part 11 audit trails make the stakes literally life-and-death — and where the architecture above maps, almost component for component, onto a different alphabet of regulators asking the same fundamental question: prove it.
