Real-Time Snowflake on AWS: Snowpipe Streaming, Dynamic Tables & Openflow

❄️ This is Part 3 of a 3-part series: Snowflake Deep Dive (2026)

Snowflake Internals: How the Three-Layer Architecture Actually Works
Snowflake Cortex AI in 2026: Agents, Analyst, and the Agentic Data Cloud
Real-Time Snowflake on AWS: Snowpipe Streaming, Dynamic Tables, and Lessons Learned (you are here)

"Real-time Snowflake" used to be a contradiction. Snowflake was the batch warehouse you loaded overnight, optimized for big scans over immutable micro-partitions — exactly the storage model that makes row-by-row, second-by-second ingestion awkward (recall from Part 1 that a single-row write can rewrite a whole partition). For years the honest answer to "can I stream into Snowflake?" was "sort of, with micro-batch Snowpipe and a few minutes of latency."

That answer is now out of date. The next-generation Snowpipe Streaming architecture went GA on AWS in late 2025, and it changes the ceiling: up to 10 GB/s per table with ingest-to-query latency typically under 10 seconds. Paired with Dynamic Tables for declarative incremental transformation, you can build a genuinely low-latency pipeline that lands raw events and serves modeled, query-ready (and AI-ready) data continuously. This article covers how the new streaming engine works on AWS, how Dynamic Tables fit, a reference architecture, and the lessons that only surface once real volume hits a real pipeline.

The Two Roads Into Snowflake

First, get the terminology straight, because three things share the "Snowpipe" name and they are not interchangeable.

Mechanism	Pattern	Latency	Use when
COPY / batch loads	Bulk file load on a warehouse	Minutes–hours	Scheduled batch, backfills
Snowpipe (classic)	Auto-ingest files from S3 on arrival (event-driven)	~1 min+	File-based micro-batch from a data lake
Snowpipe Streaming (next-gen)	Rowset/SDK ingestion, serverless, no files	< 10 s	Event streams, CDC, IoT, true low-latency

The first two are about files. The next-gen streaming path is about rows — you push rows over an SDK or REST endpoint and they become queryable within seconds, with no intermediate files to manage. That distinction is the whole story of why the new architecture matters.

How Next-Gen Snowpipe Streaming Works on AWS

The classic streaming SDK worked through client-buffered "channels" and a compute-based cost model. The redesign — GA on AWS in late 2025 — moves the heavy lifting server-side and rebuilds ingestion around the mature Snowpipe service. The components worth understanding:

graph TD
    subgraph AWS["Your AWS account"]
        SRC["Producers\nKinesis / MSK (Kafka) / app / IoT"]
        CONN["Kafka Connector\nor Streaming SDK (Java/Python)\nor REST (NDJSON)"]
    end
    subgraph SF["Snowflake (next-gen Snowpipe Streaming)"]
        ENVOY["Envoy ingress\nauth + routing\n(exchange-scoped tokens)"]
        BUF["Buffering tier\nsharded buffer service\n+ buffer controller"]
        ASGN["Assigner\n(auto-sharding,\nscales with throughput)"]
        PIPE["Serverless Snowpipe\nschema validation +\nin-flight transforms (PIPE def)"]
        TBL["Target table\n(queryable < 10s)"]
    end
    SRC --> CONN --> ENVOY --> BUF
    ASGN -.scales.-> BUF
    BUF --> PIPE --> TBL

Next-gen Snowpipe Streaming on AWS. Authentication and routing happen at an Envoy ingress; a sharded buffering tier absorbs the firehose; an Assigner (inspired by Google's Slicer auto-sharding) scales infrastructure to throughput; and a serverless Snowpipe applies server-side schema validation and optional in-flight transformations defined on the PIPE object before writing query-ready data.

File mode vs rowset mode

The Java/Python SDK is built on a Rust core and dynamically switches between two modes depending on the workload:

File mode — high-throughput and cost-effective; the client encrypts and uploads data to an internal stage. This is the path for large, sustained volume.
Rowset mode — maximum flexibility; rows travel in the request payload itself. Better for smaller, latency-sensitive or irregular flows.

For lightweight or IoT producers there is also a REST endpoint accepting NDJSON, suited to MB/s-range volumes where running an SDK is overkill.

Server-side schema validation and in-flight transforms

The redesign pushes schema validation to the server and supports in-flight stateless transformations declared on the PIPE object using familiar COPY-style syntax. This is a real simplification: your client just sends rows; column mapping, casting, and light reshaping happen as part of ingestion instead of in fragile client code. There's also a pre-clustering optimization that batches data lexicographically by channel name to improve downstream pruning on clustered keys.

The pricing change that matters

Cost moved from a compute/warehouse model to consumption by volume: roughly 0.0037 credits per uncompressed GB ingested, serverless. You're billed for data ingested, not for keeping a warehouse warm to catch it. For steady high-volume streams this is both cheaper and far more predictable than sizing a warehouse for peak.

// Next-gen Streaming SDK: open a channel to a pipe and append rows.
// Server-side validation + PIPE-defined transforms handle the rest.
SnowflakeStreamingIngestClient client =
    SnowflakeStreamingIngestClientFactory.builder("client-1")
        .setProperties(props)            // account, role, key-pair auth
        .build();

SnowflakeStreamingIngestChannel channel = client.openChannel(
    OpenChannelRequest.builder("orders_channel")
        .setDBName("RAW").setSchemaName("EVENTS")
        .setPipeName("orders_pipe")      // pipe defines target + transforms
        .build());

for (OrderEvent e : batch) {
    Map row = Map.of(
        "order_id", e.id(), "amount", e.amount(),
        "event_ts", e.ts(), "region", e.region());
    channel.insertRow(row, e.id());      // offset token enables exactly-once
}
// Durable once Snowflake acknowledges the committed offset token.

Benchmark for scale context: Cboe Global Markets runs market-data ingestion on the next-gen architecture at over 100 TB uncompressed per day (190+ billion rows/day) with P95 query latency under 30 seconds, and Snowflake reports ~56% better query performance versus the classic streaming architecture on its internal TPC-DS runs. You almost certainly don't need 10 GB/s — but the headroom means streaming is no longer the bottleneck in your design.

Dynamic Tables: Declarative Incremental Transformation

Landing raw rows fast is only half a pipeline. You still need to clean, join, dedupe, and model them — continuously, not in a nightly batch. Dynamic Tables are Snowflake's answer: you declare the target as a SELECT, set a TARGET_LAG, and Snowflake figures out how to keep it fresh, processing only the rows that changed since the last refresh (incremental where it can, full refresh where it must).

-- Raw events stream in via Snowpipe Streaming → RAW.EVENTS.orders_raw
-- A Dynamic Table models them, refreshing automatically every 60s.
CREATE OR REPLACE DYNAMIC TABLE analytics.prod.orders_clean
    TARGET_LAG = '60 seconds'
    WAREHOUSE  = transform_wh
AS
SELECT
    order_id,
    amount,
    region,
    event_ts,
    -- dedupe late/duplicate events, keep the latest per key
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY event_ts DESC) AS rn
FROM raw.events.orders_raw
QUALIFY rn = 1;

-- Chain them: a downstream Dynamic Table reads the first and Snowflake
-- builds a refresh DAG, propagating changes through the chain.
CREATE OR REPLACE DYNAMIC TABLE analytics.prod.revenue_by_region
    TARGET_LAG = 'DOWNSTREAM'        -- refresh only as fast as consumers need
    WAREHOUSE  = transform_wh
AS
SELECT region, DATE_TRUNC('hour', event_ts) AS hr, SUM(amount) AS revenue
FROM analytics.prod.orders_clean
GROUP BY 1, 2;

Two things make Dynamic Tables the right tool here. First, TARGET_LAG is a declarative freshness SLA — you state how stale the data may be and Snowflake schedules refreshes to meet it, rather than you hand-writing schedules and MERGE logic. Second, chained Dynamic Tables form a DAG: set the leaf to a concrete lag and intermediates to DOWNSTREAM, and Snowflake only does work when something downstream actually needs fresher data.

Dynamic Tables vs Streams + Tasks

Before Dynamic Tables, incremental pipelines were built with Streams (change-tracking on a table — the set of rows changed since last read) and Tasks (scheduled SQL). That pattern still exists and is still the right choice when you need imperative control: complex multi-statement procedures, conditional branching, calls to external functions. But for the common "keep this modeled table fresh from these sources" case, Dynamic Tables replace a pile of Stream/Task/MERGE boilerplate with one declarative object.

	Dynamic Tables	Streams + Tasks
Model	Declarative (`SELECT` + lag)	Imperative (change set + scheduled SQL)
Incremental logic	Managed by Snowflake	You write the MERGE
Best for	Transformation DAGs, freshness SLAs	Procedural logic, branching, side effects
Operational overhead	Low	Higher (you own the orchestration)

Openflow: Managed Ingestion for Everything Streaming Doesn't Cover

Added June 11, 2026. Snowflake Openflow reached general availability after this article first published — initially as a BYOC (bring-your-own-cloud) deployment on AWS in mid-2025, then as fully-managed Snowflake Deployments running on Snowpark Container Services (GA November 4, 2025). It changes the ingestion story enough to warrant its own section.

Snowpipe Streaming is the right tool when you control the producer and can push rows. But a lot of "get this data into Snowflake" work isn't a stream you own — it's a SaaS API (Salesforce, Google Ads, SharePoint, Box), a database you need to CDC, or a pile of unstructured documents headed for an AI workload. Historically that meant Fivetran, custom connectors, or hand-rolled NiFi. Openflow is Snowflake's managed answer: a data-integration service built on Apache NiFi, with hundreds of processors and prebuilt connectors, wrapped in Snowflake governance, security, and observability.

Two deployment models matter:

Snowflake Deployments — Openflow runs fully managed on Snowpark Container Services inside Snowflake. Nothing to operate; the simplest path and the one to default to.
BYOC (bring your own cloud) — Openflow runs in your own AWS account, for cases where data residency or network isolation requires the runtime to sit in your VPC.

Where Openflow fits relative to streaming: it is the connector and movement layer for the sources that aren't a native row stream, and it can land that data into Snowflake (often via Snowpipe Streaming under the hood for the low-latency paths). Concrete use cases:

SaaS ingestion — pull from Salesforce, marketing platforms, collaboration tools without standing up Fivetran.
Database CDC — change data capture from MySQL/PostgreSQL and others, as an alternative to the Debezium-on-MSK pattern when you'd rather not operate Kafka Connect.
Unstructured data for AI — Openflow's headline use case: moving documents, images, audio, and other unstructured content into Snowflake so Cortex (see Part 2) can index and reason over it. This is the ingestion half of a RAG pipeline.
Multimodal and event sources — hundreds of NiFi processors cover the long tail of formats and protocols that a pure streaming SDK doesn't.

How to choose between Openflow and Snowpipe Streaming. If you own the producer and need sub-10-second latency on a high-volume row stream, use Snowpipe Streaming directly. If the source is a SaaS app, a database to CDC, or unstructured content — or you want managed connectors instead of writing ingestion code — use Openflow. They're complements, not competitors: a mature platform often runs Snowpipe Streaming for its owned event firehose and Openflow for everything else, both landing into the same Dynamic Table DAG.

A Reference Architecture on AWS

graph LR
    subgraph Ingest["Ingest (AWS)"]
        APP["Apps / services"] --> KIN["Kinesis / MSK"]
        CDC["Database CDC\n(Debezium → MSK)"] --> KIN
    end
    subgraph Stream["Stream into Snowflake"]
        KIN --> SPS["Snowpipe Streaming\n(Kafka connector / SDK)\n< 10s, serverless"]
    end
    subgraph Transform["Transform (continuous)"]
        SPS --> RAW["RAW landing table"]
        RAW --> DT1["Dynamic Table\nclean + dedupe\nLAG 60s"]
        DT1 --> DT2["Dynamic Table\nmodeled marts\nLAG DOWNSTREAM"]
    end
    subgraph Serve["Serve"]
        DT2 --> BI["BI dashboards"]
        DT2 --> CORTEX["Cortex Agents /\nAnalyst (Part 2)"]
        DT2 --> ALERT["Alerts / reverse-ETL"]
    end

End-to-end low-latency pipeline on AWS: Kinesis/MSK (including Debezium CDC) → Snowpipe Streaming → RAW → a Dynamic Table DAG → serving. The same modeled tables feed BI and the Cortex agents from Part 2 — fresh data is what makes "ask your data" answers actually current.

The Kafka path deserves a note: the Snowflake Kafka connector can target Snowpipe Streaming directly, so an existing MSK topic becomes a low-latency Snowflake table with configuration rather than code. For CDC, the common pattern is Debezium → MSK → Kafka connector → Snowpipe Streaming, landing change events that a Dynamic Table then collapses into current-state rows.

Lessons Learned

The mechanics are the easy part. These are the things that bite in production.

1. Exactly-once is your job, with help

The streaming SDK supports per-row offset tokens: you tag each row with a monotonic token, and on reconnect Snowflake tells you the last committed token so you resume without gaps or duplicates. This only works if your producer can replay deterministically from a known offset (Kafka/Kinesis can). Design the offset scheme up front — retrofitting idempotency after you've discovered duplicates in a mart is painful. Even so, dedupe defensively in the first Dynamic Table (the QUALIFY ROW_NUMBER() pattern above) as a safety net.

2. TARGET_LAG is a cost dial, not just a freshness dial

Every refresh of a Dynamic Table runs on a warehouse and costs credits. A TARGET_LAG of '1 minute' on a table nobody reads more than hourly is pure waste. Set lag to the actual consumption need, use DOWNSTREAM for intermediates so they only refresh when a leaf demands it, and remember that the streaming ingest (volume-priced) and the transform refresh (warehouse credits) are two separate line items — both deserve a resource monitor.

3. Don't stream what you should batch

Streaming is seductive, but most "real-time" requirements are really "fresh enough." If the business looks at a dashboard hourly, sub-10-second ingestion buys nothing and adds operational surface. Reserve true streaming for cases where freshness has measurable value — fraud signals, operational monitoring, live personalization, market data — and let everything else ride classic Snowpipe or batch. The cheapest pipeline is the one you didn't over-engineer.

4. Watch the small-files / partition pressure

Continuous ingestion naturally produces many small micro-partitions. Snowflake's background services compact over time, but heavy streaming into a table that's also queried hard can show degraded pruning until compaction catches up. Land raw streams in a dedicated table, do the heavy modeling in Dynamic Tables (which write well-organized output), and query the modeled layer — not the raw firehose.

5. Schema evolution will happen — plan the contract

Producers add fields. Server-side schema validation means a row that doesn't match the table is rejected, which is correct but will halt a naive pipeline. Decide your contract: either enforce a schema registry on the producer side (Kafka + Avro/Protobuf), or land a permissive VARIANT column for semi-structured payloads and project typed columns in the first Dynamic Table, where a schema change is a one-line SQL edit instead of a dropped pipeline.

6. Latency is end-to-end, not just ingest

"< 10 second ingest" is one hop. Your real latency is producer buffering + ingest + Dynamic Table lag + query/cache. A 60-second TARGET_LAG downstream of a sub-10-second ingest gives ~70-second freshness, not 10. Measure the whole chain against the business SLA and tune the dominant term — usually the transform lag, not the ingest.

The pragmatic default: Kinesis or MSK → Snowflake Kafka connector on Snowpipe Streaming → a RAW table → one dedupe/clean Dynamic Table at the freshness your consumers actually need → modeled marts at DOWNSTREAM lag. This covers the large majority of real-time requirements with almost no bespoke code, predictable volume-based ingest cost, and modeled tables that feed both BI and the Cortex agents from Part 2.

Where This Leaves You

Across this series we went down the stack and back up. Part 1 explained the engine — separated storage and compute over immutable micro-partitions, with the cloud services layer doing the thinking. Part 2 showed what Snowflake now builds on that engine: Cortex agents reasoning over governed data. This part closed the loop on the input side — getting data in continuously, fast, and cheaply enough that the dashboards and agents downstream are working with reality, not last night's snapshot.

The through-line is that Snowflake's original architectural bet — separate storage from compute, keep one governed copy of the data, let the services layer coordinate — is exactly what made each of these later capabilities possible. Streaming ingest is just another writer producing micro-partitions; Dynamic Tables are just managed transforms over them; Cortex agents are just governed compute reading them. Understand the engine, and the rest of the platform stops being a list of features and becomes a set of consequences.

❄️ Snowflake Deep Dive (2026) — series complete

Snowflake Internals: The Three-Layer Architecture
Snowflake Cortex AI in 2026: Agents, Analyst, and the Agentic Data Cloud
Real-Time Snowflake on AWS (this article)