Amazon Kinesis vs Apache Kafka (MSK): Streaming Data on AWS Without the Regret

Every AWS data team eventually faces this question: Kinesis Data Streams or Apache Kafka (via MSK)? Both ingest real-time data at scale. Both integrate with the AWS ecosystem. The choice matters, because migrating between them later is painful. Kinesis is a proprietary AWS service with a simple operational model and APIs that are not Kafka-compatible. Kafka (MSK) is the open standard, with a richer ecosystem, more complex operations, and far more flexibility. The right choice depends heavily on your throughput patterns, consumer diversity, and operational tolerance.

Kinesis Data Streams: The AWS Native Choice

Kinesis Data Streams is built around shards, which are fixed units of throughput capacity. Each shard provides 1 MB/s inbound and 2 MB/s outbound, shared across its consumers (with Enhanced Fan-Out, each consumer gets its own 2 MB/s). You provision shards explicitly, and scaling means adding or splitting shards. The pricing model in provisioned mode: $0.015/shard-hour plus $0.014 per million PUT payload units (25 KB each), which works out to roughly $0.014/GB ingested when records are about 1 KB.
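
The shard arithmetic above can be sketched in a few lines. This is a rough sizing helper, not an official calculator: the function names are made up for illustration, and the 1,000 records/s per-shard write limit is an assumption taken from the service's documented quotas rather than from this article.

```python
import math

# Per-shard write limits for provisioned Kinesis Data Streams.
SHARD_WRITE_MB_PER_S = 1.0
# Assumption: not stated in the article; documented per-shard record quota.
SHARD_WRITE_RECORDS_PER_S = 1000


def shards_needed(inbound_mb_per_s: float, records_per_s: float = 0.0) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(inbound_mb_per_s / SHARD_WRITE_MB_PER_S)
    by_records = math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S)
    return max(1, by_bytes, by_records)


def monthly_shard_cost(shards: int, price_per_shard_hour: float = 0.015,
                       hours: int = 24 * 30) -> float:
    """Shard-hour charge only; ingestion and fan-out are billed separately."""
    return shards * price_per_shard_hour * hours


if __name__ == "__main__":
    n = shards_needed(50)
    print(n, round(monthly_shard_cost(n), 2))
```

Note that many small records can make the record-rate limit, not the byte rate, the binding constraint.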

The Kinesis Retention and Replay Model

Default retention is 24 hours (configurable up to 365 days for extra cost). Records are stored by shard, ordered within a shard, and consumed via sequence number position. Consumers can replay from any sequence number within the retention window — this is Kinesis's answer to Kafka's offset replay model.
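
As a sketch of how replay is requested, the helper below builds the arguments for the Kinesis GetShardIterator call. The helper name, stream name, and shard ID are hypothetical; the parameter names and iterator types are the ones the API defines.

```python
from datetime import datetime, timezone
from typing import Optional


def shard_iterator_request(stream_name: str, shard_id: str,
                           sequence_number: Optional[str] = None,
                           timestamp: Optional[datetime] = None) -> dict:
    """Build keyword arguments for the Kinesis GetShardIterator call."""
    request = {"StreamName": stream_name, "ShardId": shard_id}
    if sequence_number is not None:
        # Replay starting exactly at a known sequence number.
        request["ShardIteratorType"] = "AT_SEQUENCE_NUMBER"
        request["StartingSequenceNumber"] = sequence_number
    elif timestamp is not None:
        # Replay from a point in time within the retention window.
        request["ShardIteratorType"] = "AT_TIMESTAMP"
        request["Timestamp"] = timestamp
    else:
        # Replay from the oldest record still retained.
        request["ShardIteratorType"] = "TRIM_HORIZON"
    return request


if __name__ == "__main__":
    # Hypothetical stream and shard names.
    print(shard_iterator_request("orders-stream", "shardId-000000000000",
                                 timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("kinesis").get_shard_iterator(**request)`, and the returned iterator fed to `get_records`.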

The Enhanced Fan-Out feature (EFO) is critical for multi-consumer scenarios. Without EFO, all consumers share the 2 MB/s per-shard read limit — add a third consumer and each gets ~670 KB/s max. With EFO, each registered consumer gets a dedicated 2 MB/s. EFO costs ~$0.015/consumer-shard-hour extra but is essential when multiple independent services consume the same stream.
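
The read-side arithmetic is simple enough to check directly. The function below is an illustrative model of the per-shard numbers quoted above, assuming consumers split the shared limit evenly; it is not a measurement of real consumer behavior.

```python
SHARD_READ_MB_PER_S = 2.0  # per-shard read limit quoted above


def read_mb_per_s_per_consumer(consumers: int, enhanced_fan_out: bool) -> float:
    """Best-case read throughput each consumer can get from one shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered EFO consumer has a dedicated pipe per shard.
        return SHARD_READ_MB_PER_S
    # Without EFO, all consumers share the same per-shard limit.
    return SHARD_READ_MB_PER_S / consumers


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(read_mb_per_s_per_consumer(n, enhanced_fan_out=False), 3))
```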

Kinesis Data Firehose: The Delivery Partner

Firehose is Kinesis's built-in sink layer — it batches records and delivers to S3, Redshift, OpenSearch, Splunk, or Snowflake with optional Lambda transformation. For the "stream events to S3 for batch processing" pattern, Firehose handles buffering, compression, and retry automatically. A very common pattern: Kinesis Data Streams → Firehose → S3 Parquet (with Firehose doing the Parquet conversion). No custom consumer code, no cluster to manage.
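
To make the buffering behavior concrete, here is a toy model of a "flush on size or age" buffer, which is the general idea behind Firehose's batching. The class and its parameters are hypothetical and purely illustrative; this is not Firehose's API or implementation.

```python
import time
from typing import Callable, List, Optional


class SizeOrTimeBuffer:
    """Collect records and flush when the batch is big enough or old enough."""

    def __init__(self, flush: Callable[[List[bytes]], None],
                 max_bytes: int, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_bytes = max_bytes
        self._max_seconds = max_seconds
        self._clock = clock
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes) -> None:
        if self._opened_at is None:
            self._opened_at = self._clock()  # batch age starts at first record
        self._records.append(record)
        self._size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if either threshold is reached; call periodically for the time check."""
        if not self._records:
            return
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._opened_at >= self._max_seconds
        if too_big or too_old:
            self._flush(self._records)
            self._records, self._size, self._opened_at = [], 0, None
```

A larger size threshold yields fewer, bigger objects in S3 at the cost of higher delivery latency, which is the trade-off to tune for downstream batch jobs.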

Amazon MSK: Kafka Without the ZooKeeper Nightmares

Amazon Managed Streaming for Apache Kafka (MSK) runs a managed Apache Kafka cluster on your behalf — you specify broker type, number of brokers, storage, and MSK handles provisioning, patching, monitoring, and broker replacement. You get full Kafka API compatibility: any Kafka producer library, Kafka Streams, Kafka Connect, Flink with Kafka connector, and ksqlDB all work against MSK exactly as they do against self-managed Kafka.

MSK Serverless

MSK Serverless (GA 2022) removes broker management entirely: you pay per GB ingested and egressed, with no broker capacity to provision. The trade-off is limited configurability (max 200 partitions per topic, max 200 MB/s total throughput) and potentially higher per-GB cost at high volumes compared to provisioned MSK. It is best for variable, unpredictable throughput where you want zero infrastructure management.
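
A quick pre-flight check against those limits might look like the sketch below. The limit values are the ones quoted in this article and should be verified against current service quotas; the function name and topic names are hypothetical.

```python
from typing import Dict, List

# Limits as quoted in this article; verify against current quotas before relying on them.
MAX_PARTITIONS_PER_TOPIC = 200
MAX_TOTAL_THROUGHPUT_MB_PER_S = 200


def serverless_blockers(peak_mb_per_s: float,
                        partitions_by_topic: Dict[str, int]) -> List[str]:
    """Return reasons a workload would not fit; an empty list means no blockers found."""
    blockers = []
    if peak_mb_per_s > MAX_TOTAL_THROUGHPUT_MB_PER_S:
        blockers.append(
            f"peak throughput {peak_mb_per_s} MB/s exceeds {MAX_TOTAL_THROUGHPUT_MB_PER_S} MB/s")
    for topic, partitions in sorted(partitions_by_topic.items()):
        if partitions > MAX_PARTITIONS_PER_TOPIC:
            blockers.append(
                f"topic {topic!r} has {partitions} partitions (limit {MAX_PARTITIONS_PER_TOPIC})")
    return blockers


if __name__ == "__main__":
    print(serverless_blockers(50, {"orders": 48, "clicks": 300}))
```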

MSK Connect

MSK Connect is a managed Kafka Connect service — run Kafka connectors (Debezium for CDC, S3 Sink Connector, JDBC Source Connector) without managing a Connect cluster. This is particularly useful for CDC pipelines: Debezium on MSK Connect reading from RDS/Aurora PostgreSQL and writing to MSK, then consumed by Flink or Spark.
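
For orientation, a Debezium PostgreSQL connector is configured with a flat map of string properties, which is the shape MSK Connect expects for connector configuration. The sketch below builds such a map; the helper name, host, database, and table names are placeholders, and `topic.prefix` assumes Debezium 2.x (1.x releases used `database.server.name` instead).

```python
import json
from typing import Dict, List


def debezium_postgres_config(host: str, database: str, user: str, password: str,
                             topic_prefix: str, tables: List[str]) -> Dict[str, str]:
    """Build a minimal Debezium PostgreSQL source connector configuration."""
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",  # the PostgreSQL connector runs as a single task
        "plugin.name": "pgoutput",  # logical decoding plugin built into PostgreSQL 10+
        "database.hostname": host,
        "database.port": "5432",
        "database.user": user,
        "database.password": password,  # prefer a secrets/config provider in practice
        "database.dbname": database,
        "topic.prefix": topic_prefix,  # Debezium 2.x name (assumption about version)
        "table.include.list": ",".join(tables),
    }


if __name__ == "__main__":
    # Placeholder endpoint and names for illustration only.
    cfg = debezium_postgres_config("mydb.example.internal", "shop", "cdc_user",
                                   "SECRET", "shop", ["public.orders"])
    print(json.dumps(cfg, indent=2))
```

With this configuration, change events for `public.orders` would land on a topic named `shop.public.orders`, following Debezium's prefix.schema.table convention.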

The Decision Table

Factor	Kinesis Data Streams	MSK (Kafka)
API compatibility	AWS proprietary	Apache Kafka (open standard)
Consumer types	KCL, Lambda, Firehose, Kinesis Data Analytics	Any Kafka client (Python, Java, Flink, Spark, ksqlDB...)
Max throughput per shard/partition	1 MB/s write, 2 MB/s read	Up to ~50 MB/s per partition (broker-dependent)
Scaling model	Add/split shards (minutes)	Add partitions (fast) or brokers (minutes)
Retention max	365 days (extra cost)	Unlimited (limited by storage)
Operational overhead	Very low (fully managed)	Low-medium (provisioned) / Low (serverless)
Ecosystem lock-in	High (AWS-specific)	Low (open standard)
CDC (Debezium) support	Via Lambda workaround	✅ Native (MSK Connect)
Cost at 50 MB/s sustained, 3 consumers	~$4,200/month	~$1,100/month (provisioned)

The practical decision rule: Choose Kinesis if your consumers are primarily AWS services (Lambda, Firehose, Kinesis Data Analytics, since renamed Amazon Managed Service for Apache Flink) and you don't need Kafka API compatibility. Choose MSK if you need CDC (Debezium), Kafka Streams for stateful processing, Flink with native Kafka connectors, or portability across clouds. The ecosystem argument is real: if you later need ksqlDB, Kafka MirrorMaker, or Flink's exactly-once Kafka sink, you'll be glad you chose Kafka.
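
The decision rule can be written down as a small function, which is sometimes handy in an architecture review checklist. This is only an encoding of the rule as stated here; the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    needs_cdc: bool = False                      # e.g. Debezium
    needs_kafka_streams: bool = False            # stateful stream processing
    needs_flink_kafka_connector: bool = False    # Flink with native Kafka connectors
    needs_multi_cloud_portability: bool = False


def kafka_reasons(req: Requirements) -> List[str]:
    """List which requirements point toward Kafka (MSK)."""
    return [name for name, wanted in vars(req).items() if wanted]


def recommend(req: Requirements) -> str:
    """Return 'MSK' if any Kafka-specific need is present, else Kinesis."""
    return "MSK" if kafka_reasons(req) else "Kinesis Data Streams"


if __name__ == "__main__":
    print(recommend(Requirements()))
    print(recommend(Requirements(needs_cdc=True)))
```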

Real-World Cost Comparison

Let's use a concrete example: 50 MB/s average inbound throughput, 3 consumers, 7-day retention, us-east-1.

Kinesis:

Shards needed: 50 MB/s ÷ 1 MB/s = 50 shards
Shard cost: 50 × $0.015 × 24 × 30 = $540/month
Ingestion: 50 MB/s × 86,400 s × 30 ≈ 129,600 GB/month × $0.014/GB (assuming ~1 KB records) ≈ $1,814/month
EFO (3 consumers): 3 × 50 × $0.015 × 24 × 30 = $1,620/month
7-day retention extension: ~$240/month
Total: ~$4,214/month

MSK Provisioned (kafka.m5.2xlarge, 3 brokers):

Broker cost: 3 × $0.476/hr × 24 × 30 ≈ $1,028/month
Storage (7 days at 50 MB/s is ~30 TB of raw inflow; this estimate assumes ~900 GB stored after compression): ~$90/month
Data transfer: minimal within VPC
Total: ~$1,118/month

At significant throughput (50 MB/s+), MSK is dramatically cheaper than Kinesis, primarily because Kinesis's per-GB ingestion charge dominates. At low throughput (<5 MB/s), Kinesis is often cheaper and simpler. The crossover is roughly at 10–15 MB/s sustained throughput.
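
The crossover claim can be checked with a small model built from the rates in the worked example. This is a back-of-the-envelope sketch under loud assumptions: MSK cost is treated as fixed regardless of throughput (in reality storage and broker count grow), Kinesis shards are allowed to be fractional to keep the model continuous, and the retention charge is scaled per shard from the ~$240 for 50 shards quoted above.

```python
HOURS_PER_MONTH = 24 * 30
SECONDS_PER_MONTH = 86_400 * 30


def kinesis_monthly_cost(mb_per_s: float, consumers: int = 3) -> float:
    """Kinesis cost using the article's rates; one shard per MB/s inbound."""
    shards = mb_per_s
    shard_hours = shards * 0.015 * HOURS_PER_MONTH
    ingested_gb = mb_per_s * SECONDS_PER_MONTH / 1000
    ingestion = ingested_gb * 0.014  # the article's ~$0.014/GB figure
    fan_out = consumers * shards * 0.015 * HOURS_PER_MONTH
    retention = shards * (240 / 50)  # ~$240 for 50 shards, scaled per shard
    return shard_hours + ingestion + fan_out + retention


def msk_monthly_cost(brokers: int = 3, broker_hourly: float = 0.476,
                     storage: float = 90.0) -> float:
    """MSK provisioned cost, assumed constant across throughput levels."""
    return brokers * broker_hourly * HOURS_PER_MONTH + storage


def break_even_mb_per_s(consumers: int = 3) -> float:
    """Throughput at which the two models cost the same."""
    return msk_monthly_cost() / kinesis_monthly_cost(1.0, consumers)


if __name__ == "__main__":
    print(round(kinesis_monthly_cost(50)), round(msk_monthly_cost()))
    print(round(break_even_mb_per_s(), 1))
```

Under these assumptions the break-even lands inside the 10–15 MB/s range, and at 50 MB/s the model reproduces the roughly $4,200 versus $1,100 gap. Dropping Enhanced Fan-Out (a single consumer) pushes the break-even noticeably higher, so the crossover depends as much on consumer count as on throughput.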

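from confluent_kafka import Consumer, KafkaError, KafkaException
import json


def process_order(record):
    # Placeholder handler (hypothetical); replace with real business logic
    print(record)


# MSK consumer using SASL/SCRAM over TLS (same code works against any Kafka)
conf = {
    'bootstrap.servers': 'b-1.mycluster.kafka.us-east-1.amazonaws.com:9096',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'SCRAM-SHA-512',
    'sasl.username': 'service_account',
    'sasl.password': 'SECRET',
    'group.id': 'analytics-consumer',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # manual commit after processing gives at-least-once delivery
}

consumer = Consumer(conf)
consumer.subscribe(['orders-topic'])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue  # no message arrived within the poll timeout
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                continue  # reached the end of a partition; not a failure
            raise KafkaException(msg.error())
        record = json.loads(msg.value().decode('utf-8'))
        process_order(record)
        # Commit this message's offset synchronously, only after successful processing
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()  # leave the consumer group cleanly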
Kinesis for simple AWS-native event routing; MSK for complex streaming architectures with diverse consumers. The portability and ecosystem richness of Kafka become increasingly valuable as your streaming architecture matures — it's much easier to add a Flink job to an existing MSK cluster than to migrate a Kinesis-native architecture to Kafka three years later.