You’ve built a slick application on Azure Cosmos DB. It’s fast, it’s scalable, and everything is humming along. Then, peak hour hits. Suddenly, your logs are flooded with the dreaded 429 Too Many Requests error, and your application’s performance grinds to a halt.

What’s going on? You check your container’s metrics, and you’ve got plenty of overall throughput to spare!

Welcome to the most misunderstood concept in Cosmos DB performance: throttling happens at the physical partition, not the container. To fix it, you need to understand the two main tools at your disposal: Autoscale and Burst Capacity. Let’s break down the simple math behind them so you can choose the right tool for the job.

The Golden Rule: It’s All About the Physical Partitions

Before we talk about solutions, we have to internalize one rule: Cosmos DB enforces throughput limits on each physical partition individually.

Think of your total provisioned throughput (your RU/s) as a big pizza. Cosmos DB slices it evenly and gives one piece to each physical partition. A physical partition is a real “slice” of the server that stores your data and has a hard cap of 10,000 RU/s.

If you provision 20,000 RU/s and Cosmos DB splits your data across two physical partitions, each one gets a budget of roughly 10,000 RU/s. If all your requests suddenly hammer one of those partitions with 12,000 RU/s of traffic, it will get throttled—even though the other partition is sitting idle. The container’s “headroom” can’t save a single, overloaded partition.
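
To make the rule concrete, here is a minimal Python sketch of the 20,000 RU/s example. It assumes throughput is split evenly across physical partitions and ignores burst capacity; the function name `throttled_per_partition` is made up for illustration.

```python
def throttled_per_partition(container_ru, loads):
    """Return the throttled RU/s for each physical partition.

    Assumes container_ru is split evenly across len(loads) partitions
    and that no burst capacity is available (a simplification).
    """
    budget = container_ru / len(loads)  # each partition's share of the pizza
    return [max(0, load - budget) for load in loads]


# 20,000 RU/s over two partitions; all 12,000 RU/s of traffic hits partition 0.
print(throttled_per_partition(20_000, [12_000, 0]))
```

Partition 0 exceeds its 10,000 RU/s budget by 2,000 RU/s and is throttled, even though the container as a whole is serving only 12,000 of its 20,000 RU/s.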

With that in mind, let’s look at our tools.

Tool #1: Autoscale (The Automatic Transmission)

Burst Capacity vs Autoscale: throttle math For Azure CosmosDB | GigXP.com


Interactive Planner

[Interactive planner omitted. It models per-partition RU behavior, burst buckets, and autoscale caps, with optional skew to simulate hot partitions. Inputs: mode (manual or autoscale), container RU/s, idle minutes before the spike (the burst bucket fills for up to 5 minutes; not applicable if a partition's share is ≥ 3000 RU/s), approximate number of physical partitions, total incoming load in RU/s, and load distribution. Outputs: total allowed RU/s, total throttled RU/s, container throttle share, and a per-partition table with columns Load Lᵢ, Allowed, Throttled, P (share), and Burst B.]

Tip: Reduce throttling by spreading hot keys (e.g., with hierarchical partition keys, HPK) or raising capacity. Burst is a short-spike shock absorber.
How the math is applied

Partitioning: Throttling is per physical partition. Load is split based on your distribution model (even/hot).
Manual mode: Each partition’s share is P = ContainerRU / parts. If P < 3000 and there's idle time, a burst bucket B is available, and Allowed RU/s is min(3000, Lᵢ) while B lasts. Once B is exhausted, or if no bucket is available, Allowed RU/s is min(P, Lᵢ).
Autoscale mode: Per-partition cap is approximated as Tmax/parts. Allowed RU/s is min(Lᵢ, Tmax/parts).
Throttled RU/s: For any partition, this is max(0, Lᵢ − Allowed). High throttling indicates a need to re-key or increase capacity.
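
The rules above can be sketched in a few lines of Python. This is an illustrative model, not the planner's actual code: the function names, the `hot_share` parameter, and the way skew is simulated (one hot partition takes a fixed fraction of the load, the rest is spread evenly) are all assumptions.

```python
BURST_CEILING = 3000  # RU/s a partition may serve while its burst bucket lasts


def split_load(total_load, parts, hot_share=None):
    """Split total load across partitions.

    hot_share=None gives an even split. Otherwise partition 0 receives
    hot_share (0..1) of the load and the remainder is spread evenly.
    This 'hot' model is an assumption made for illustration.
    """
    if hot_share is None or parts == 1:
        return [total_load / parts] * parts
    rest = total_load * (1 - hot_share) / (parts - 1)
    return [total_load * hot_share] + [rest] * (parts - 1)


def allowed_ru(load, share, mode="manual", idle_seconds=0):
    """Allowed RU/s for one partition at the start of a spike.

    share is ContainerRU / parts (manual) or Tmax / parts (autoscale).
    """
    if mode == "manual" and share < BURST_CEILING and idle_seconds > 0:
        # A burst bucket exists: serve up to 3000 RU/s while it lasts.
        return min(BURST_CEILING, load)
    return min(load, share)


def summarize(total_load, capacity, parts, mode="manual",
              idle_seconds=0, hot_share=None):
    """Return one (load, allowed, throttled) tuple per partition."""
    share = capacity / parts
    rows = []
    for load in split_load(total_load, parts, hot_share):
        allowed = allowed_ru(load, share, mode, idle_seconds)
        rows.append((load, allowed, max(0, load - allowed)))
    return rows


# 12,000 RU/s of load, 8,000 RU/s of capacity, 4 partitions,
# half of the traffic on one hot partition.
for mode in ("manual", "autoscale"):
    print(mode, summarize(12_000, 8_000, 4, mode=mode,
                          idle_seconds=300, hot_share=0.5))
```

In this sketch the hot partition receives 6,000 RU/s against a 2,000 RU/s share. In manual mode with a full bucket it serves 3,000 RU/s and throttles 3,000; in autoscale mode it serves 2,000 RU/s and throttles 4,000. The other three partitions are not throttled in either mode.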

Runbook text (copy/paste)

Notes & FAQ

This planner is a simplification for capacity conversations. Cosmos DB may allocate throughput dynamically; always validate with partition-level metrics in your environment. Key takeaways:

Burst capacity buffers short spikes only (per partition, up to 300s of idle × P) and caps at 3000 RU/s per partition while bursting.
Autoscale right-sizes between 10%×Tmax and Tmax. Throttling can still happen on a hot partition if its load exceeds its share of Tmax.
Hierarchical Partition Keys (HPK) are often the most effective way to reduce skew and avoid throttling on hot partitions.

Reminder: If a partition's share P ≥ 3000 RU/s, that partition is not eligible for burst. Only provisioned capacity matters.

Autoscale is like the automatic transmission in your car. You tell it the maximum speed you ever want to go (Tmax), and it automatically shifts gears for you, scaling your RU/s between 10% and 100% of that Tmax based on demand. It’s fantastic for handling predictable, rolling traffic patterns like daily peaks and troughs.

The Math You Can Use

The capacity for any single partition is roughly your total max throughput divided by the number of partitions.

$$\text{Partition Capacity} \approx \frac{T_{max}}{\text{Number of Partitions } (N)}$$

Example 1: Smooth Sailing
- You set Tmax = 50,000 RU/s and have 5 physical partitions.
- Each partition can handle spikes up to 50,000 / 5 = 10,000 RU/s.
- If one partition gets a sudden burst of 8,000 RU/s, it’s no problem! Autoscale handles it.

Example 2: The Traffic Jam
- Same setup (Tmax = 50,000, N = 5), so each partition has a 10,000 RU/s ceiling.
- A “hot partition” gets slammed with a 15,000 RU/s spike.
- Result: You’ll be throttled for 15,000 - 10,000 = 5,000 RU/s, even though your container could scale to 50,000 RU/s in total.
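
As a rough check on both examples, the sketch below estimates throttling on the hottest partition and the Tmax that would be needed to absorb it. The function name is hypothetical, the rounding to multiples of 1,000 RU/s is an assumption about how Tmax is set, and the model uses the Tmax / N approximation from this section rather than anything Cosmos DB reports.

```python
import math

PARTITION_HARD_CAP = 10_000  # RU/s per physical partition, as stated earlier


def autoscale_check(tmax, parts, hottest_load):
    """Estimate throttling on the hottest partition under autoscale."""
    cap = tmax / parts  # approximate per-partition ceiling
    throttled = max(0, hottest_load - cap)
    # Tmax that would give every partition a share >= hottest_load,
    # rounded up to a multiple of 1,000 RU/s (assumed increment).
    needed_tmax = math.ceil(hottest_load * parts / 1000) * 1000
    # A load above the per-partition hard cap cannot be met by raising Tmax alone.
    fixable_by_tmax = hottest_load <= PARTITION_HARD_CAP
    return {
        "cap": cap,
        "throttled": throttled,
        "needed_tmax": needed_tmax,
        "fixable_by_tmax": fixable_by_tmax,
    }


print(autoscale_check(50_000, 5, 8_000))   # Example 1
print(autoscale_check(50_000, 5, 15_000))  # Example 2
```

For Example 2 the sketch reports 5,000 RU/s throttled and marks the problem as not fixable by Tmax alone: the 15,000 RU/s spike exceeds the 10,000 RU/s a single physical partition can serve, so the load has to be spread across more partitions.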

Key Takeaway: Autoscale is great for overall capacity management, but it does not solve a fundamental hot partition problem.

Tool #2: Burst Capacity (The Power Bank)

Burst capacity is like a portable power bank for your phone. It’s a fantastic shock absorber for small, unexpected spikes.

Here’s how it works:

Charging: When a partition is idle, it saves up its unused capacity. It can bank up to 5 minutes’ worth of its provisioned throughput.
Spending: When a sudden spike in traffic hits, the partition can spend its saved-up capacity to serve requests up to a hard limit of 3,000 RU/s.
The Catch: This feature only works for partitions whose provisioned throughput is less than 3,000 RU/s.

The Math You Can Use

Example A: The Perfect Use Case (A Wide, Short Spike)
- You have a container with 8,000 RU/s and 4 partitions. Each partition gets 8,000 / 4 = 2,000 RU/s. This is less than 3,000, so burst is eligible!
- After being idle, each partition has banked plenty of capacity.
- A one-second spike sends 2,500 RU/s of traffic to each partition.
- Result: No problem! Each partition uses its base 2,000 RU/s and “bursts” by spending an extra 500 RU from its saved bank. No throttling occurs.

Example B: Where Burst Can’t Help (An Extreme Hot Spot)
- Same setup (2,000 RU/s per partition).
- A massive spike sends 10,000 RU/s to a single partition.
- Result: The partition will serve requests up to its burst maximum of 3,000 RU/s. The remaining 10,000 - 3,000 = 7,000 RU/s is throttled immediately.
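
The two examples can be reproduced with a small bucket model. This is a sketch under stated assumptions, not documented Cosmos DB behavior: it assumes the bucket holds min(idle time, 300 s) × the partition's share, and that serving above the share drains the bucket at (served rate − share) RU per second. The function name `burst_outcome` is made up.

```python
BURST_CEILING = 3000    # max RU/s served while bursting
MAX_BANK_SECONDS = 300  # up to 5 minutes of idle capacity can be banked


def burst_outcome(share, idle_seconds, spike_load, spike_seconds):
    """Model one partition's burst bucket during a constant spike.

    Returns (served_rate, seconds_it_lasts, throttled_rate), where the
    rates apply while the bucket lasts. After that, the partition falls
    back to serving at most `share` RU/s.
    """
    if share >= BURST_CEILING:
        # Not eligible for burst: only provisioned capacity matters.
        served = min(spike_load, share)
        return served, 0.0, spike_load - served

    bank = min(idle_seconds, MAX_BANK_SECONDS) * share  # banked RU (assumed)
    served = min(spike_load, BURST_CEILING)
    drain = max(0, served - share)  # RU taken from the bucket each second
    if drain == 0:
        lasts = spike_seconds  # load fits in the share, no burst needed
    else:
        lasts = min(spike_seconds, bank / drain)
    return served, lasts, spike_load - served


# Example A: 2,000 RU/s share, fully idle, one-second spike of 2,500 RU/s.
print(burst_outcome(2_000, 300, 2_500, 1))
# Example B: same partition, 10,000 RU/s spike held for 60 seconds.
print(burst_outcome(2_000, 300, 10_000, 60))
```

Under these assumptions Example A is served in full with nothing throttled, while Example B is served at 3,000 RU/s with 7,000 RU/s throttled from the first second, matching the results above.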

Key Takeaway: Burst is a lifesaver for brief, unexpected spikes across many partitions. It cannot fix a severe hot partition.

Which Should You Choose? A Quick Guide

If your workload looks like this…	The best tool is…
📈 Variable traffic with predictable daily peaks	Autoscale. Set `Tmax` to handle your highest peak, and let it scale down automatically.
⚡️ Short, infrequent spikes on an otherwise quiet system	Burst Capacity. It’s a free, built-in cushion that smooths out bumps without you paying for higher throughput.
🌋 Constant 429s on just one or two partition keys	Neither. This is a data modeling problem. You need to fix your partition key strategy to spread the load.
🌪️ Variable traffic AND short, spiky bursts	Both! Use Autoscale to manage the overall load and let Burst handle the small, second-to-second jitters.

Final Thoughts: Model Your Load Correctly

Stop thinking about your container’s total throughput. Start thinking about the peak load on your busiest physical partition.

Autoscale is your cruise control for the whole highway.
Burst Capacity is the shock absorber for individual lanes.
A good partition key is what prevents one lane from getting all the traffic in the first place.

By modeling the load on a per-partition basis, you can finally understand why throttling happens and use the right combination of Autoscale, Burst, and smart data modeling to eliminate those 429 errors for good.
