This is Part 1 of a 3-part series: Auditable AI in Regulated Industries
- The Evidence Layer in Banking: BCBS 239, CCAR, SOX (you are here)
- Designing Multi-Agent AI Over Sensitive Data: Traceable by Construction
- The Evidence Layer in Healthcare & Biotech AI
There's a sentence that ends careers in banking: "We think the number is right, but we can't show you how we got it." Regulators don't run on trust; they run on evidence. Every figure in a risk report, a capital submission, or a financial statement has to be provable: traceable from the dashboard a regulator is looking at, back through every transformation, all the way to the system where the fact was born. That chain of proof is data lineage, and it is the single most underappreciated piece of infrastructure in a regulated data platform.
This article is the first of three on building AI and data systems that hold up under audit. We start with banking, because banking has the oldest and most demanding traceability regimes, and because the framing here carries straight into the AI era: as models and autonomous agents enter the pipeline, lineage is the evidence layer that makes regulatory obligations provable. You don't need to be a compliance lawyer to design for it. You need to understand what three regulators are actually asking for, and recognize that all three are asking the same architectural question.
The one-line thesis: BCBS 239, CCAR, and SOX look like three different compliance burdens. Architecturally they are one requirement repeated three times: prove that every reported number traces accurately to its source, and that nothing was silently altered along the way. Build that capability once and you've answered all three.
Three Regimes, One Underlying Demand
Here are the three frameworks an engineer at a large US/global bank runs into, what each actually requires, and what lineage specifically proves in each.
| Regime | Who & what it governs | What lineage proves |
|---|---|---|
| BCBS 239 | Basel Committee principles for G-SIBs/D-SIBs on risk data aggregation and reporting (14 principles across governance, aggregation, reporting, supervision) | Risk exposures aggregate accurately, completely, and on time, and can be traced to source systems without being altered, truncated, or corrupted in transit |
| CCAR / DFAST | Federal Reserve capital stress testing for banks >$100B; built on SR 11-7 model risk management | The data feeding stress models is traceable from source to decision, models have clear provenance, and results are reproducible by an examiner |
| SOX | Sarbanes-Oxley (§302, §404, §409): internal control over financial reporting for all US public companies | Every figure in the financial statements traces from its entry point through controlled transformations, with auditable evidence the controls operated |
Notice the shared verb: trace. Each regime is, at its core, demanding an unbroken, tamper-evident path from a reported value back to ground truth. That path is what we'll call the evidence layer.
BCBS 239: Aggregation You Can Defend
BCBS 239, formally the Principles for effective risk data aggregation and risk reporting, exists because of 2008. When the crisis hit, many banks discovered they could not answer a simple question quickly: "What is our total exposure to counterparty X across every desk and legal entity?" Their data was scattered across incompatible systems, and assembling an accurate firm-wide number took days. BCBS 239's risk-data-aggregation principles demand that this data be accurate, complete, timely, and adaptable, especially under stress, exactly when systems are most strained.
Lineage is how you defend the aggregate. When a supervisor points at a concentration-risk figure and asks "where did this come from?", the answer cannot be a shrug. Lineage lets you trace that exposure back to the source trading and lending systems, validate that the aggregation logic is correct, and demonstrate that the value wasn't silently altered by an undocumented transformation somewhere in the pipeline.
flowchart RL
REPORT["Risk report:\n'Counterparty X exposure = $4.2B'"]
AGG["Aggregation & netting layer"]
DQ["Data quality / reconciliation"]
WH["Risk data warehouse"]
S1["Trading system"]
S2["Loan origination"]
S3["Derivatives / collateral"]
REPORT --> AGG --> DQ --> WH
WH --> S1
WH --> S2
WH --> S3
Reading lineage right-to-left is the auditor's view: start at the reported number and walk back through every transformation to the source systems. Each hop must be reconstructable and evidenced. This is what BCBS 239 "traceability" means in practice.
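The "Data quality / reconciliation" hop in the diagram can be sketched roughly as follows. This is a minimal illustration, not a production design: the record layout, the system names, and the idea that each source system publishes an independent control total are all assumptions made for the example, and the aggregation is a plain gross sum with no netting.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical exposure rows as loaded from three source systems.
SOURCE_ROWS = [
    {"system": "trading", "counterparty": "X", "exposure": Decimal("1500.00")},
    {"system": "trading", "counterparty": "Y", "exposure": Decimal("200.00")},
    {"system": "loans", "counterparty": "X", "exposure": Decimal("2000.00")},
    {"system": "derivatives", "counterparty": "X", "exposure": Decimal("700.00")},
]

# Hypothetical control totals, assumed to be reported by each source
# system independently of the row-level feed.
CONTROL_TOTALS = {
    "trading": Decimal("1700.00"),
    "loans": Decimal("2000.00"),
    "derivatives": Decimal("700.00"),
}


def reconcile(rows, control_totals):
    """Return {system: (loaded_total, control_total)} for every mismatch."""
    loaded = defaultdict(Decimal)  # Decimal() is zero
    for row in rows:
        loaded[row["system"]] += row["exposure"]
    breaks = {}
    for system, expected in control_totals.items():
        if loaded[system] != expected:
            breaks[system] = (loaded[system], expected)
    return breaks


def aggregate_by_counterparty(rows):
    """Gross exposure per counterparty across all source systems."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["counterparty"]] += row["exposure"]
    return dict(totals)


if __name__ == "__main__":
    print("reconciliation breaks:", reconcile(SOURCE_ROWS, CONTROL_TOTALS))
    print("exposure by counterparty:", aggregate_by_counterparty(SOURCE_ROWS))
```

An empty result from `reconcile` is the evidence that nothing was dropped or altered between the source systems and the warehouse; a non-empty one names the system where the break occurred.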
Why column-level lineage matters
Table-level lineage ("this report draws from these five tables") is not enough for BCBS 239. The defensible unit is the column. When a regulator asks how the exposure field in a report was derived, you need to show that it came from position.notional joined to collateral.haircut, netted by master_agreement_id, field by field, transformation by transformation. This is exactly the granularity at which "prove this number" can actually be answered, and it is why column-level lineage is the unit large institutions invest in. Table-level lineage tells you which files were in the room; column-level lineage tells you who said what.
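One way to picture column-level lineage is as a small graph keyed by column name. The sketch below is illustrative only: the intermediate column `risk.net_exposure`, the placement of `master_agreement_id` on the `position` table, and the transform descriptions are assumptions for the example, and the graph is assumed to be acyclic.

```python
# Each key is a derived column; the value lists its direct input columns and
# a description of the transformation. All names here are hypothetical.
LINEAGE = {
    "report.exposure": {
        "inputs": ["risk.net_exposure"],
        "transform": "sum by counterparty",
    },
    "risk.net_exposure": {
        "inputs": [
            "position.notional",
            "collateral.haircut",
            "position.master_agreement_id",  # assumed location of the netting key
        ],
        "transform": "notional adjusted by haircut, netted by master_agreement_id",
    },
}


def trace(column, lineage, depth=0):
    """Yield (depth, column, transform), walking from a column back to sources.

    A column with no lineage entry is treated as a source-system column.
    Assumes the lineage graph has no cycles.
    """
    node = lineage.get(column)
    transform = node["transform"] if node else "SOURCE"
    yield depth, column, transform
    if node:
        for parent in node["inputs"]:
            yield from trace(parent, lineage, depth + 1)


def source_columns(column, lineage):
    """Return the sorted set of source columns a reported column depends on."""
    return sorted({col for _, col, t in trace(column, lineage) if t == "SOURCE"})


if __name__ == "__main__":
    for depth, column, transform in trace("report.exposure", LINEAGE):
        print("  " * depth + f"{column}  <- {transform}")
```

Walking `trace("report.exposure", LINEAGE)` is the auditor's question answered field by field: every hop names the columns that fed it and the transformation applied.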
CCAR: From Source to Decision, Reproducibly
CCAR (Comprehensive Capital Analysis and Review) and its sibling DFAST are the Federal Reserve's stress-testing programs for the largest banks. They are arguably the most rigorous model-governance environment in US finance, and they sit on top of SR 11-7, the Fed/OCC guidance on model risk management. For banks above $100B in assets, SR 11-7 compliance is effectively a precondition for a credible CCAR submission.
The traceability bar here is higher than BCBS 239 because there's a model in the middle. A stress-testing calculation might consume data from 200+ source systems (core banking, loan origination, trading, the general ledger, economic-scenario databases), feed it through a capital model, and produce a number the Fed will scrutinize. SR 11-7 expects you to document the model's architecture, its training/input data lineage, its validation results, and its ongoing performance monitoring. The governing requirements are blunt: lineage from data to decision, clear model provenance, and reproducibility.
Reproducibility is the trap. "We ran the model in March and got this number" is not enough. An examiner can ask you to re-run it and get the same answer, which means you must have versioned the input data, the model code, the parameters, and the scenario definitions, all pinned together. If any of those four drifted and you didn't capture which version produced the submitted number, you cannot reproduce it, and you have a finding. This is the same discipline ML engineers call experiment tracking; banking just made it a legal obligation a decade earlier.
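Pinning the four versions together might look something like the sketch below. The `RunManifest` class, its field names, and the version strings are hypothetical; a real platform would derive them from its data snapshots, source control, and scenario store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """The four things that must be pinned for a run to be reproducible."""

    input_data_version: str
    model_code_version: str
    parameters_version: str
    scenario_version: str

    def fingerprint(self) -> str:
        """Stable SHA-256 over the canonical JSON form of the manifest."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_reproduce(submitted: RunManifest, rerun: RunManifest) -> bool:
    """True only if the re-run uses exactly the versions that were submitted."""
    return submitted.fingerprint() == rerun.fingerprint()


def drifted_fields(submitted: RunManifest, rerun: RunManifest) -> list:
    """Name the pinned inputs that differ between the two runs."""
    a, b = asdict(submitted), asdict(rerun)
    return [name for name in a if a[name] != b[name]]


if __name__ == "__main__":
    # Illustrative version identifiers only.
    march = RunManifest("snapshot-0312", "model-v4.1", "params-17", "scenario-severe-2")
    today = RunManifest("snapshot-0312", "model-v4.1", "params-17", "scenario-severe-3")
    print(can_reproduce(march, today), drifted_fields(march, today))
```

Storing the fingerprint alongside the submitted number is what lets you say, months later, which exact combination produced it, and `drifted_fields` tells you which of the four moved if a re-run disagrees.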
This is the point where data lineage and model lineage merge. The evidence layer has to span both: the data's journey to the model, and the model's identity (version, training data, validation) at the moment it produced a regulated output. When AI models enter stress testing (and they are), this requirement doesn't relax; it intensifies.
SOX: Lineage to the Financial Statement
SOX (Sarbanes-Oxley, 2002) is the broadest of the three because it applies to every US public company, not just banks. The relevant machinery is internal control over financial reporting (ICFR): §302 makes executives personally certify the financials, §404 requires documented and tested controls over how those financials are produced, and §409 demands timely disclosure of material changes.
The engineering translation of §404 is precise: map data lineage from each entry point through to the financial statements, and place a control at every risk point along the way. A control is just a rule that prevents or detects an error: a reconciliation, an access restriction, a change-management gate, a four-eyes approval. SOX doesn't only ask "is the number right?"; it asks "can you show the controls that keep it right, and prove they operated during the period?"
The classic SOX failure mode is the rogue spreadsheet: a manual transformation between a system and the financial statement that nobody governed, where a fat-fingered formula silently changes a reported figure. Lineage exposes exactly these uncontrolled hops. If your lineage graph has an edge that runs through an analyst's laptop, that's a §404 finding waiting to happen.
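Finding uncontrolled hops is mechanical once each lineage edge carries the controls mapped to it. The sketch below assumes a hypothetical `Hop` record and made-up node and control names; it only checks that a control is mapped to each hop, not that the control actually operated during the period, which needs separate attestation evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One edge on the path from an entry point to the financial statement."""

    source: str
    target: str
    controls: tuple = ()  # names of the controls mapped to this hop


# Hypothetical path with a manual spreadsheet step in the middle.
PATH = [
    Hop("general_ledger", "consolidation", ("reconciliation", "access_restriction")),
    Hop("consolidation", "analyst_spreadsheet"),
    Hop("analyst_spreadsheet", "financial_statement"),
]


def uncontrolled_hops(path):
    """Return every hop that has no control mapped to it."""
    return [hop for hop in path if not hop.controls]


if __name__ == "__main__":
    for hop in uncontrolled_hops(PATH):
        print(f"no control on {hop.source} -> {hop.target}")
```

Run against a full lineage graph, a check like this turns the rogue spreadsheet from something an auditor discovers into something the platform reports on its own.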
The Common Pattern: Lineage as Evidence
Step back and the three regimes collapse into one architectural capability. Each wants a tamper-evident path from a regulated output to its origin, with proof that the right controls operated in between.
flowchart TB
subgraph Obligations["Regulatory obligations"]
B["BCBS 239:\naccurate, traceable\nrisk aggregation"]
C["CCAR / SR 11-7:\ndata→decision,\nreproducible models"]
S["SOX §404:\ncontrolled path to\nfinancial statements"]
end
EV["🧾 THE EVIDENCE LAYER\ncolumn-level lineage · immutable audit log ·\nversioned data + models · control attestations"]
B --> EV
C --> EV
S --> EV
EV --> P["Provable answer to:\n'Show me how you got this number.'"]
Three regimes, one capability. The evidence layer is the shared infrastructure that satisfies all of them, and it is the foundation everything in Parts 2 and 3 builds on.
Concretely, an evidence layer that survives all three audits has four properties:
| Property | What it means | Which regime leans on it hardest |
|---|---|---|
| Column-level lineage | Field-by-field derivation, not just table-to-table | BCBS 239 |
| Immutable, time-stamped audit | Append-only record of who changed what, when, and why | SOX |
| Transformation capture | Every join, filter, and calculation recorded as a lineage edge | BCBS 239 / SOX |
| Versioned reproducibility | Data + code + params pinned so a result can be re-derived | CCAR / SR 11-7 |
The design instinct that pays off: treat lineage and audit as first-class outputs of the pipeline, emitted automatically as data flows, not as documentation written after the fact. Lineage you reconstruct manually for an exam is already stale and untrustworthy. Lineage the platform emits on every run is evidence. The cultural shift, from "document it later" to "the system proves it continuously", is the whole game.
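The "immutable, time-stamped audit" property is often approximated with a hash chain, where each entry commits to the one before it. The sketch below is one possible shape, not a reference implementation: the function names and entry layout are invented for illustration, and an in-memory list is not itself immutable. The chain only makes tampering detectable; real immutability needs write-once storage underneath.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash plus the entry payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log, who, what, when, why):
    """Append an entry that is chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = {"who": who, "what": what, "when": when, "why": why}
    log.append({"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)})


def verify(log):
    """Recompute the chain; False if any entry was edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append(log, "jdoe", "updated haircut table", "2024-03-01T09:00:00Z", "quarterly refresh")
    append(log, "pipeline", "recomputed net exposure", "2024-03-01T09:05:00Z", "scheduled run")
    print("intact:", verify(log))
    log[0]["payload"]["what"] = "edited after the fact"
    print("after tampering:", verify(log))
```

Because every hash depends on all earlier entries, changing one historical record invalidates the chain from that point on, which is what makes the log usable as evidence rather than as documentation.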
Where AI Changes the Picture
Everything above predates the current AI wave, but it's the reason regulated institutions are cautious about putting models and agents into their data pipelines. An LLM that summarizes risk, or an autonomous agent that pulls data and drafts a regulatory narrative, becomes a new node in the lineage graph, and a non-deterministic one. The regulator's question doesn't change: show me how you got this number, and prove nothing was silently altered. But the answer is harder when one of the transformations is a probabilistic model invoked by an agent that decided, on its own, which data to read.
That is exactly the problem Part 2 takes on: how to design a multi-agent AI system that operates over sensitive, regulated data while remaining traceable and observable by construction, so that an agent's action is just another evidenced edge in the lineage graph, not a black hole in it. Part 3 then carries the same evidence-layer thinking into healthcare and biotech, where the regulators are different but the underlying demand is identical.