Every AWS data team eventually faces this question: Kinesis Data Streams or Apache Kafka (via MSK)? Both ingest real-time data at scale. Both integrate with the AWS ecosystem. The choice matters, because migrating between them later is painful. Kinesis is an AWS-proprietary service with a simple operational model and Kafka-incompatible APIs. Kafka (MSK) is the open standard, with a richer ecosystem, more complex operations, and far more flexibility. The right choice depends heavily on your throughput patterns, consumer diversity, and operational tolerance.
Kinesis Data Streams: The AWS Native Choice
Kinesis Data Streams is built around shards: fixed units of throughput capacity. Each shard provides 1 MB/s inbound and 2 MB/s outbound (with Enhanced Fan-Out, each consumer gets its own 2 MB/s). You provision shards explicitly, and scaling means adding or splitting shards. The pricing model: $0.015/shard-hour + $0.014/GB ingested.
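As a rough sizing aid, the shard arithmetic can be sketched in a few lines. The 1 MB/s write limit comes from the paragraph above; the 1,000 records/s per-shard limit is the standard Kinesis write quota and is added here as an assumption, and the function name `shards_for_ingest` is purely illustrative.

```python
import math

SHARD_WRITE_MB_S = 1.0          # per-shard inbound limit quoted above
SHARD_WRITE_RECORDS_S = 1_000   # assumption: standard per-shard record-rate quota


def shards_for_ingest(mb_per_s: float, records_per_s: float = 0.0) -> int:
    """Return the minimum shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_s / SHARD_WRITE_MB_S)
    by_records = math.ceil(records_per_s / SHARD_WRITE_RECORDS_S)
    return max(1, by_bytes, by_records)


print(shards_for_ingest(50))         # 50 MB/s of large records
print(shards_for_ingest(5, 20_000))  # many small records: record rate dominates
```

The second call shows why small records matter: 5 MB/s fits in 5 shards by volume, but 20,000 records/s needs 20 shards.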
The Kinesis Retention and Replay Model
Default retention is 24 hours (configurable up to 365 days at extra cost). Records are stored by shard, ordered within a shard, and consumed via sequence number position. Consumers can replay from any sequence number within the retention window. This is Kinesis's answer to Kafka's offset replay model.
The Enhanced Fan-Out feature (EFO) is critical for multi-consumer scenarios. Without EFO, all consumers share the 2 MB/s per-shard read limit: add a third consumer and each gets ~670 KB/s max. With EFO, each registered consumer gets a dedicated 2 MB/s. EFO costs ~$0.015/consumer-shard-hour extra but is essential when multiple independent services consume the same stream.
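To make the replay model concrete, here is a minimal sketch of reading one shard from a known sequence number. It assumes a boto3 Kinesis client is passed in (`get_shard_iterator` and `get_records` are the standard client calls); the function name `replay_shard` and the stop-when-caught-up behavior are illustrative choices, not part of any library.

```python
from typing import Any, Iterator


def replay_shard(client: Any, stream_name: str, shard_id: str,
                 start_sequence_number: str, batch_size: int = 100) -> Iterator[dict]:
    """Yield records from one shard, starting at a known sequence number."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_SEQUENCE_NUMBER",
        StartingSequenceNumber=start_sequence_number,
    )["ShardIterator"]
    while iterator:
        response = client.get_records(ShardIterator=iterator, Limit=batch_size)
        yield from response["Records"]
        if response.get("MillisBehindLatest", 0) == 0:
            break  # caught up with the tip of the shard
        iterator = response.get("NextShardIterator")


# Usage (requires AWS credentials; the stream and shard names are hypothetical):
#   import boto3
#   client = boto3.client("kinesis")
#   for record in replay_shard(client, "orders", "shardId-000000000000", "<sequence-number>"):
#       print(record["SequenceNumber"])
```

A production consumer would also handle resharding and throttling, which the Kinesis Client Library does for you.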
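The read-side arithmetic is easy to check with a short sketch. It uses only the figures quoted above (2 MB/s per shard, ~$0.015 per consumer-shard-hour) and assumes a 30-day month; the function names are illustrative.

```python
SHARD_READ_MB_S = 2.0              # per-shard outbound limit quoted above
EFO_PER_CONSUMER_SHARD_HOUR = 0.015
HOURS_PER_MONTH = 24 * 30          # assumption: 30-day month


def per_consumer_read_mb_s(consumers: int, enhanced_fan_out: bool) -> float:
    """Read throughput each consumer can expect from a single shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        return SHARD_READ_MB_S           # dedicated pipe per registered consumer
    return SHARD_READ_MB_S / consumers   # shared 2 MB/s split across consumers


def efo_monthly_cost(consumers: int, shards: int) -> float:
    """Extra monthly charge for registering consumers with EFO."""
    return consumers * shards * EFO_PER_CONSUMER_SHARD_HOUR * HOURS_PER_MONTH


for n in (1, 2, 3, 5):
    shared_kb = per_consumer_read_mb_s(n, enhanced_fan_out=False) * 1000
    print(f"{n} consumer(s) without EFO: ~{shared_kb:.0f} KB/s each")
```

With three consumers the shared figure drops to roughly 667 KB/s each, which is the point at which EFO usually starts to pay for itself.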
Kinesis Data Firehose: The Delivery Partner
Firehose is Kinesis's built-in sink layer: it batches records and delivers to S3, Redshift, OpenSearch, Splunk, or Snowflake with optional Lambda transformation. For the "stream events to S3 for batch processing" pattern, Firehose handles buffering, compression, and retry automatically. A very common pattern: Kinesis Data Streams → Firehose → S3 Parquet (with Firehose doing the Parquet conversion). No custom consumer code, no cluster to manage.
Amazon MSK: Kafka Without the ZooKeeper Nightmares
Amazon Managed Streaming for Apache Kafka (MSK) runs a managed Apache Kafka cluster on your behalf. You specify broker type, number of brokers, and storage, and MSK handles provisioning, patching, monitoring, and broker replacement. You get full Kafka API compatibility: any Kafka producer library, Kafka Streams, Kafka Connect, Flink with the Kafka connector, and ksqlDB all work against MSK exactly as they do against self-managed Kafka.
MSK Serverless
MSK Serverless (GA 2022) removes broker management entirely: you pay per GB ingested/egressed with no shard or partition provisioning. The trade-off: limited configurability (max 200 partitions per topic, max 200 MB/s total throughput) and potentially higher per-GB cost at high volumes compared to provisioned MSK. Best for variable, unpredictable throughput where you want zero infrastructure management.
MSK Connect
MSK Connect is a managed Kafka Connect service: run Kafka connectors (Debezium for CDC, S3 Sink Connector, JDBC Source Connector) without managing a Connect cluster. This is particularly useful for CDC pipelines: Debezium on MSK Connect reading from RDS/Aurora PostgreSQL and writing to MSK, then consumed by Flink or Spark.
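For orientation, a Debezium PostgreSQL connector configuration might look roughly like the following, written here as the string-to-string map that a connector definition expects. The property names follow Debezium 2.x (older releases use `database.server.name` instead of `topic.prefix`); every hostname, database, table, slot, and credential value is a hypothetical placeholder.

```python
# Sketch of a Debezium PostgreSQL source connector configuration.
# All values below are placeholders; adjust them to your own environment.
debezium_postgres_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",                       # the PostgreSQL connector runs a single task
    "plugin.name": "pgoutput",              # logical decoding plugin built into PostgreSQL 10+
    "database.hostname": "mydb.example.us-east-1.rds.amazonaws.com",  # hypothetical
    "database.port": "5432",
    "database.user": "debezium",            # hypothetical replication user
    "database.password": "<fetch-from-a-secrets-store>",
    "database.dbname": "orders",            # hypothetical database
    "topic.prefix": "orders-cdc",           # prefix for the Kafka topics Debezium writes
    "table.include.list": "public.orders",  # hypothetical table to capture
    "slot.name": "debezium_orders",         # hypothetical replication slot name
}

# Connector configuration values are passed as strings, so check before submitting.
non_strings = {k: v for k, v in debezium_postgres_config.items() if not isinstance(v, str)}
print("non-string values:", non_strings)
```

Change events for `public.orders` would then land in a topic named after the prefix, schema, and table, ready for Flink or Spark to consume.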
The Decision Table
| Factor | Kinesis Data Streams | MSK (Kafka) |
|---|---|---|
| API compatibility | AWS proprietary | Apache Kafka (open standard) |
| Consumer types | KCL, Lambda, Firehose, KDA | Any Kafka client (Python, Java, Flink, Spark, ksqlDB...) |
| Max throughput per shard/partition | 1 MB/s write, 2 MB/s read | Up to ~50 MB/s per partition (broker-dependent) |
| Scaling model | Add/split shards (minutes) | Add partitions (fast) or brokers (minutes) |
| Retention max | 365 days (extra cost) | Unlimited (limited by storage) |
| Operational overhead | Very low (fully managed) | Low-medium (provisioned) / Low (serverless) |
| Ecosystem lock-in | High (AWS-specific) | Low (open standard) |
| CDC (Debezium) support | Via Lambda workaround | Native (MSK Connect) |
| Cost at 100 MB/s sustained | ~$300-450/month | ~$200-400/month (provisioned) |
The practical decision rule: Choose Kinesis if your consumers are primarily AWS services (Lambda, Firehose, Kinesis Data Analytics) and you don't need Kafka API compatibility. Choose MSK if you need CDC (Debezium), Kafka Streams for stateful processing, Flink with native Kafka connectors, or portability across clouds. The ecosystem argument is real: if you later need ksqlDB, Kafka MirrorMaker, or Flink's exactly-once Kafka sink, you'll be glad you chose Kafka.
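The rule of thumb above can be written down as a small function, which makes the precedence explicit: any hard Kafka requirement wins, and Kinesis is the pick only when consumers are mostly AWS services. The class and function names are illustrative, and the "undecided" branch is an assumption for cases the rule does not cover.

```python
from dataclasses import dataclass


@dataclass
class StreamingNeeds:
    needs_cdc: bool = False
    needs_kafka_streams: bool = False
    needs_flink_kafka_connector: bool = False
    needs_cloud_portability: bool = False
    consumers_mostly_aws_services: bool = False


def recommend(needs: StreamingNeeds) -> tuple[str, list[str]]:
    """Apply the decision rule and return (choice, reasons)."""
    kafka_reasons = [
        label
        for flag, label in (
            (needs.needs_cdc, "CDC via Debezium"),
            (needs.needs_kafka_streams, "Kafka Streams for stateful processing"),
            (needs.needs_flink_kafka_connector, "Flink with native Kafka connectors"),
            (needs.needs_cloud_portability, "portability across clouds"),
        )
        if flag
    ]
    if kafka_reasons:
        return "MSK", kafka_reasons
    if needs.consumers_mostly_aws_services:
        return "Kinesis", ["consumers are primarily AWS services"]
    return "undecided", ["the rule does not settle it; compare cost and operations"]


print(recommend(StreamingNeeds(consumers_mostly_aws_services=True)))
print(recommend(StreamingNeeds(needs_cdc=True, consumers_mostly_aws_services=True)))
```

The second call returns MSK even though the consumers are AWS services, because a CDC requirement outranks convenience.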
Real-World Cost Comparison
Let's use a concrete example: 50 MB/s average inbound throughput, 3 consumers, 7-day retention, us-east-1.
Kinesis:
- Shards needed: 50 MB/s ÷ 1 MB/s = 50 shards
- Shard cost: 50 × $0.015 × 24 × 30 = $540/month
- Ingestion: 50 MB/s × 86,400 s × 30 ≈ 129.6 TB/month × $0.014/GB ≈ $1,814/month
- EFO (3 consumers): 3 × 50 × $0.015 × 24 × 30 = $1,620/month
- 7-day retention extension: ~$240/month
- Total: ~$4,214/month
MSK Provisioned (kafka.m5.2xlarge, 3 brokers):
- Broker cost: 3 × $0.476/hr × 24 × 30 = $1,028/month
- Storage (7 days retained out of ~129 TB/month incoming, ~900 GB stored compressed): ~$90/month
- Data transfer: minimal within VPC
- Total: ~$1,118/month
At significant throughput (50 MB/s+), MSK is dramatically cheaper than Kinesis, primarily because Kinesis's per-GB ingestion charge dominates. At low throughput (<5 MB/s), Kinesis is often cheaper and simpler. The crossover is roughly at 10-15 MB/s sustained throughput.
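The comparison can be reproduced, and the crossover located, with a short sketch. It uses only the unit prices quoted in this article and rests on two simplifying assumptions that are labeled in the code: the retention surcharge scales per shard ($240 across 50 shards), and the three-broker MSK cluster is treated as a fixed cost over the whole range. Function names are illustrative.

```python
import math

HOURS_PER_MONTH = 24 * 30
SECONDS_PER_MONTH = 86_400 * 30


def kinesis_monthly_cost(mb_per_s: float, efo_consumers: int = 3) -> float:
    """Monthly Kinesis cost using the unit prices quoted in the article."""
    shards = math.ceil(mb_per_s / 1.0)                       # 1 MB/s write per shard
    shard_cost = shards * 0.015 * HOURS_PER_MONTH
    ingest_cost = mb_per_s * SECONDS_PER_MONTH / 1000 * 0.014  # MB -> GB, $0.014/GB
    efo_cost = efo_consumers * shards * 0.015 * HOURS_PER_MONTH
    retention_cost = shards * (240 / 50)  # assumption: $240 for 50 shards scales per shard
    return shard_cost + ingest_cost + efo_cost + retention_cost


def msk_monthly_cost(brokers: int = 3, broker_hourly: float = 0.476,
                     storage: float = 90.0) -> float:
    """Monthly MSK cost; assumption: the same cluster serves the whole range."""
    return brokers * broker_hourly * HOURS_PER_MONTH + storage


print(f"Kinesis at 50 MB/s: ${kinesis_monthly_cost(50):,.0f}/month")
print(f"MSK (3 brokers):    ${msk_monthly_cost():,.0f}/month")

# First whole MB/s at which Kinesis costs more than the fixed MSK cluster.
crossover = next(mb for mb in range(1, 101)
                 if kinesis_monthly_cost(mb) > msk_monthly_cost())
print(f"Crossover: ~{crossover} MB/s")  # 14
```

Under these assumptions the crossover lands at about 14 MB/s, inside the 10-15 MB/s range given above. A smaller MSK cluster or fewer EFO consumers would shift it.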
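import json

from confluent_kafka import Consumer, KafkaException

# MSK consumer using SASL/SCRAM over TLS (the same code works against any Kafka cluster)
conf = {
    'bootstrap.servers': 'b-1.mycluster.kafka.us-east-1.amazonaws.com:9096',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'SCRAM-SHA-512',
    'sasl.username': 'service_account',
    'sasl.password': 'SECRET',  # load from a secrets store in practice
    'group.id': 'analytics-consumer',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # commit manually after processing (at-least-once delivery)
}

consumer = Consumer(conf)
consumer.subscribe(['orders-topic'])

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1 second for a message
        if msg is None:
            continue  # nothing arrived within the timeout
        if msg.error():
            raise KafkaException(msg.error())
        record = json.loads(msg.value().decode('utf-8'))
        process_order(record)  # application-defined handler, not shown here
        # Commit this message's offset only after it has been processed successfully
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()  # leave the consumer group cleanly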
Kinesis for simple AWS-native event routing; MSK for complex streaming architectures with diverse consumers. The portability and ecosystem richness of Kafka become increasingly valuable as your streaming architecture matures: it's much easier to add a Flink job to an existing MSK cluster than to migrate a Kinesis-native architecture to Kafka three years later.