At high volumes, Kinesis is one of the more expensive AWS services. The example we'll look at in this post is a 1GB/s stream with 1 million 1kb records per second and 3-day retention, whose sticker price is over $800k per year (provisioned mode) or $4.2 million per year (on-demand mode). (This is in us-west-2, though you can expect similar prices elsewhere. Also see the AWS official pricing and note that your organization may have different discounts in place.) There is also the "On-demand Advantage" mode, around 60% cheaper than regular on-demand but still considerably more expensive than provisioned mode.

Breaking down these numbers a bit, we see that for on-demand mode, bandwidth costs dominate ($0.08/GB for data ingested and $0.04/GB to read), amounting to over $3.6 million annually. You're also hit with an extra charge when storing data for more than 24 hours. For data between 24 hours and 7 days old, storage is roughly 4x the price of S3. After 7 days, the storage cost matches S3 Standard's highest per-GB tier: $0.023/GB per month.
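
As a sanity check, here's the back-of-the-envelope arithmetic (assuming every byte written is read exactly once by a single consumer):

```python
# Annual on-demand bandwidth cost for a 1GB/s stream, using the us-west-2
# list prices quoted above. Assumes each byte is written once and read once.
SECONDS_PER_YEAR = 86_400 * 365
gb_per_year = 1 * SECONDS_PER_YEAR              # 1GB/s -> ~31.5M GB/year

ingest = gb_per_year * 0.08                     # $0.08/GB ingested
read = gb_per_year * 0.04                       # $0.04/GB read
print(f"${ingest:,.0f} ingest + ${read:,.0f} read = ${ingest + read:,.0f}/yr")
# -> $2,522,880 ingest + $1,261,440 read = $3,784,320/yr
```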

If your org is looking to save money, you probably shouldn't be using either of the on-demand modes for Kinesis at scale. You're better off using provisioned mode and adjusting the number of shards as needed; these changes take effect in minutes. This does add some operational complexity, since you need to monitor your application to know when to scale up or down, but it's not rocket science, and even at this (relatively low) volume it's possible to save millions of dollars per year.

For provisioned mode, the annual cost exceeding $800k is due to the per-shard charge, the higher storage cost beyond 24 hours, and "PUT payload unit" usage, explained below. Here's the breakdown:

  • Each shard can handle 1000 records/second for writes, 2000 records/second for reads, and at most 1MiB/s of write throughput. To ingest a million records per second, we need 1000 shards, and it's advisable to leave some buffer (say 20%), so let's call it 1200 shards, costing over $157k per year ($10.95/mo per shard x 1200 x 12).
  • Each record written to the stream is rounded up to the nearest 25kb increment, called a "PUT payload unit", and you're charged $0.014 per million units. This is not much per unit, but since it's rounded up, even a 1kb record costs the same as a 25kb record. Our 1 million records/second stream costs about $440k/year in PUT payload unit charges.
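
The same arithmetic as a quick script, using the list prices above:

```python
# Annual provisioned-mode costs for 1 million 1kb records per second.
SECONDS_PER_YEAR = 86_400 * 365

shards = 1200                                   # 1000 needed + ~20% buffer
print(f"shards:    ${shards * 10.95 * 12:,.0f}/yr")             # -> $157,680/yr

# Every 1kb record rounds up to one 25kb PUT payload unit.
records_per_year = 1_000_000 * SECONDS_PER_YEAR
print(f"PUT units: ${records_per_year / 1e6 * 0.014:,.0f}/yr")  # -> $441,504/yr
```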

To achieve additional cost savings with provisioned mode, you can create aggregate records, which are lists of records, and compress them before shipping to Kinesis. If you hit the sweet spot of aggregate record size just shy of 25kb, you're paying little in PUT payload unit costs and also reducing the number of records per second as far as Kinesis is concerned. The Kinesis Producer Library (KPL) and Kinesis Client Library (KCL) are both designed for this, with the KPL producing aggregate records and the KCL flattening them back to individual records on consumption.
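
To make the idea concrete, here's a minimal sketch of aggregation and de-aggregation. This is not the KPL's actual protobuf-based wire format, just simple length-prefixed framing plus compression:

```python
import gzip
import struct

def aggregate(records: list[bytes]) -> bytes:
    """Pack many logical records into one physical record."""
    framed = b"".join(struct.pack(">I", len(r)) + r for r in records)
    return gzip.compress(framed)

def deaggregate(blob: bytes) -> list[bytes]:
    """Flatten an aggregate record back into its logical records."""
    framed = gzip.decompress(blob)
    records, i = [], 0
    while i < len(framed):
        (n,) = struct.unpack_from(">I", framed, i)
        records.append(framed[i + 4 : i + 4 + n])
        i += 4 + n
    return records
```

With 1kb records, you can pack roughly 25 per aggregate before compression, and more after, landing each physical record just under the 25kb boundary.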

This can result in savings on the number of shards needed (1000 aggregate records might be 10k logical records) and those PUT payload units (since the physical records can be closer to a 25kb increment), but comes with a few caveats:

  • It can't always reduce the number of shards: shards are still limited to 1MiB/s ingest and 2MiB/s read capacity (and PutRecords batches are also limited to 10MiB). For instance, in our example, we hit 1MiB/s by writing 1000 1kb records, no matter how they are grouped. You'd still need 1000 shards to ingest a million records per second, so those shard costs are unchanged.
  • Achieving effective batching into aggregate records can be complicated in real workloads and will often require a custom ingestion service. For example, our 1 million records/second stream might be due to 1 million users each publishing 1 event per second to the stream. We don't want each publisher to linger for, say, 1000 seconds to collect a batch of 1000 events.

On this second point, one can coalesce and aggregate concurrent requests from multiple users via a custom scalable ingest service, written using the KPL or from scratch using the base Kinesis HTTP API. A custom ingest service is a good idea, but it is something you have to build and operate, and the number of nodes needed to power such a service can be significant, even approaching the number of shards. For instance, if each ingest service instance can process 10MB/s (10k records x 1kb), you still need 100 instances of it to reach 1 million records per second, in addition to still paying the monthly per-shard costs.
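
Here's a minimal sketch of one node of such an ingest service, reusing aggregate() from above. The stream name, thresholds, and partition key strategy are illustrative, and retries, error handling, and backpressure are omitted:

```python
import queue
import threading
import time
import uuid

import boto3

MAX_BATCH_BYTES = 25_000    # aim just under one 25kb PUT payload unit
MAX_LINGER_SECS = 0.05      # bound how long any one event can wait

pending: "queue.Queue[bytes]" = queue.Queue()
kinesis = boto3.client("kinesis")

def flush_loop(stream_name: str) -> None:
    """Drain events from many concurrent request handlers into aggregates."""
    while True:
        batch, size = [], 0
        deadline = time.monotonic() + MAX_LINGER_SECS
        while size < MAX_BATCH_BYTES and time.monotonic() < deadline:
            try:
                record = pending.get(timeout=MAX_LINGER_SECS / 10)
            except queue.Empty:
                continue
            batch.append(record)
            size += len(record)
        if batch:
            kinesis.put_record(
                StreamName=stream_name,
                Data=aggregate(batch),          # from the sketch above
                PartitionKey=uuid.uuid4().hex,  # spread load across shards
            )

threading.Thread(target=flush_loop, args=("events",), daemon=True).start()
# Request handlers then simply call: pending.put(event_bytes)
```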

Another minor downside of using aggregate records: if any consumers need to read historical data from the stream, AWS can no longer directly tell you which "physical" record contains the logical record you're interested in, though this often isn't a big deal for streaming applications which don't need any sort of random access to past records.

Summary

Putting this all together, here are our recommendations:

  • Use provisioned mode. If you don't want to deal with the headache of monitoring and dynamically adjusting shard counts, you can over-provision and still come out well ahead on cost.
  • Use a custom ingest service that creates aggregate records.

How much savings can you expect with these approaches? Well, in our running example, which cost ~ $800k per year with provisioned mode, $440k was due to PUT payload units. You can shrink these costs dramatically with a scalable ingest service that creates 25kb aggregate records (it's $0.014 per 25GB if you hit the 25kb boundaries exactly, so less than $18k/yr for our 1GB/s stream), though you do have to pay for the nodes powering the ingest service.
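
The post-aggregation PUT payload unit arithmetic, assuming perfectly packed 25kb physical records:

```python
# PUT payload unit cost when every physical record fills one 25kb unit.
SECONDS_PER_YEAR = 86_400 * 365
kb_per_year = 1_000_000 * SECONDS_PER_YEAR     # 1GB/s = 1 million kb/s
units = kb_per_year / 25                       # one PUT unit per 25kb
print(f"${units / 1e6 * 0.014:,.0f}/yr")       # -> $17,660/yr, down from ~$440k
```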

The number of shards needed would still be 1200 in our example, and there isn't much you can do about the fact that Kinesis storage is more expensive than S3, especially if you factor in that S3 supports use of a Lifecycle Policy or Intelligent Tiering to move data to colder (cheaper) storage. Stream processing in particular mostly needs access to recent data, so a lifecycle policy that moves older data to colder storage can save significantly. Infrequent Access is about half the price of S3 Standard but charges a small amount for bandwidth on read, often a good tradeoff for historical access to data streams.
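
As an illustration, an archive bucket might carry a lifecycle rule like the following. This is a sketch: the bucket name, prefix, and retention periods are hypothetical, and note that S3 requires objects to be at least 30 days old before a Standard-IA transition:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-stream-archive",          # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-out-stream-data",
                "Filter": {"Prefix": "events/"},
                "Status": "Enabled",
                # Move to Standard-IA at S3's 30-day minimum, then expire.
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```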

With some additional operational complexity, you can create a process that moves older Kinesis data into S3 (perhaps using AWS Firehose), but this costs money to move and means data can no longer be accessed by a normal Kinesis consumer.

Kinesis over S3

If the above techniques aren't enough or don't sound appealing, there's another option: a Kinesis-compatible API that uses S3 for storage and runs on your infrastructure. Yes, this is a product we make. It's a good idea and has several advantages for this use case:

  • S3 has no bandwidth charges in-region; it's a fixed per-request cost to read or write an object. This avoids both the on-demand Kinesis bandwidth costs and the PUT payload unit costs of provisioned mode.
  • S3's cost per GB-month is excellent, with a variety of options including intelligent tiering. You win on storage costs versus AWS Kinesis and can potentially eliminate the separate process to send data from Kinesis to S3 for longer-term storage.
  • Because shards write large pages of records, per-shard capacity can be much higher. You're no longer limited to 1MiB/s and 1000 records/second. This can reduce the number of shards needed, sometimes by a factor of 10 or more.
  • Since data goes directly to S3, individual records can be any size, and batches of records can be any size (no longer limited to 500 elements or 10MiB).
  • Larger pages can achieve better compression vs trying to compress individual records or smaller aggregate records.
  • Storing streams directly in S3 also means you can replicate them across regions just by flipping a switch to enable S3's cross-region replication.

For the running example of a 1 million records per second stream with 3-day retention, the $800k/yr (or $4.2 million in on-demand mode) becomes ~ $140k/yr, even without accounting for better compression.

Since the implementation is API-compatible with existing Kinesis client libraries, no code changes are necessary.

If you're interested in learning more or would like to see a whitepaper on the design, sign up below.
