You’ve built a slick application on Azure Cosmos DB. It’s fast, it’s scalable, and everything is humming along. Then peak hour hits. Suddenly your logs are flooded with the dreaded 429 Too Many Requests error, and your application’s performance grinds to a halt.

What’s going on? You check your container’s metrics, and you have plenty of overall throughput to spare!

Welcome to the most misunderstood concept in Cosmos DB performance: throttling happens at the physical partition, not the container. To fix it, you need to understand the two main tools at your disposal: Autoscale and Burst Capacity. Let’s break down the simple math behind them so you can choose the right tool for the job.

The Golden Rule: It’s All About the Physical Partitions

Before we talk about solutions, we have to internalize one rule: Cosmos DB enforces throughput limits on each physical partition individually.

Think of your total provisioned throughput (your RU/s) as a big pizza. Cosmos DB slices it up and gives a piece to each physical partition. A physical partition is the actual unit of storage and compute behind your container: it holds a subset of your data and has a hard cap of 10,000 RU/s.

If you provision 20,000 RU/s and Cosmos DB splits your data across two physical partitions, each one gets a budget of roughly 10,000 RU/s. If all your requests suddenly hammer one of those partitions with 12,000 RU/s of traffic, the 2,000 RU/s above its budget gets throttled, even though the other partition is sitting idle. The container’s “headroom” can’t save a single, overloaded partition.

With that in mind, let’s look at our tools.

Tool #1: Autoscale (The Automatic Transmission)
Autoscale is like the automatic transmission in your car. You tell it the maximum speed you ever want to go (Tmax), and it shifts gears for you, scaling your RU/s between 10% and 100% of that Tmax based on demand. It’s fantastic for handling predictable, rolling traffic patterns like daily peaks and troughs.

The Math You Can Use

The capacity of any single partition is roughly your maximum throughput divided by the number of physical partitions (N):

$$\text{Partition capacity} \approx \frac{T_{\max}}{N}$$

Example 1: Smooth Sailing
You set Tmax = 50,000 RU/s and have 5 physical partitions. Each partition can handle spikes of up to 50,000 / 5 = 10,000 RU/s. If one partition gets a sudden spike of 8,000 RU/s, no problem! Autoscale handles it.

Example 2: The Traffic Jam
Same setup (Tmax = 50,000, N = 5), so each partition has a 10,000 RU/s ceiling. A “hot partition” gets slammed with a 15,000 RU/s spike.
Result: 15,000 - 10,000 = 5,000 RU/s of that traffic is throttled, even though your container could scale to 50,000 RU/s in total.

Key Takeaway: Autoscale is great for overall capacity management, but it does not solve a fundamental hot partition problem.

Tool #2: Burst Capacity (The Power Bank)

Burst capacity is like a portable power bank for your phone: a shock absorber for small, unexpected spikes. Here’s how it works:

Charging: When a partition uses less than its provisioned throughput, it saves up the unused capacity. It can bank up to 5 minutes’ worth of its provisioned throughput.
Spending: When a sudden spike hits, the partition can spend its saved capacity to serve requests at up to a hard limit of 3,000 RU/s.
The Catch: This feature only works for partitions whose provisioned throughput is less than 3,000 RU/s.
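The Charging, Spending, and Catch rules above can be sketched as a per-second simulation of a single physical partition. This is a simplified model under our own assumptions, not Cosmos DB’s internal algorithm: we assume each second’s unused RU goes into the bank (capped at 5 minutes × P), that spending is limited to a total rate of 3,000 RU/s, and that partitions at or above 3,000 RU/s never bank anything. The `BurstBank` class and its names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

BURST_MAX_RU_PER_SEC = 3_000   # hard rate limit while spending banked capacity
BANK_SECONDS = 5 * 60          # at most 5 minutes' worth of throughput can be banked
ELIGIBILITY_LIMIT = 3_000      # partitions with a share >= this cannot burst


@dataclass
class BurstBank:
    """Hypothetical per-second model of one physical partition's burst bank."""

    provisioned: float         # P: this partition's share of the provisioned RU/s
    bank: float = 0.0          # RU saved from earlier under-used seconds (assumption)

    @property
    def eligible(self) -> bool:
        return self.provisioned < ELIGIBILITY_LIMIT

    def step(self, load: float) -> Tuple[float, float]:
        """Serve one second of `load` RU/s and return (allowed, throttled)."""
        p, load = float(self.provisioned), float(load)
        if load <= p:
            # Charging: unused capacity goes into the bank, capped at 5 minutes of P.
            if self.eligible:
                self.bank = min(self.bank + (p - load), BANK_SECONDS * p)
            return load, 0.0
        # Spending: top up from the bank, never exceeding 3,000 RU/s in total.
        extra = min(self.bank, BURST_MAX_RU_PER_SEC - p, load - p) if self.eligible else 0.0
        self.bank -= extra
        allowed = p + extra
        return allowed, load - allowed


if __name__ == "__main__":
    partition = BurstBank(provisioned=2_000)   # e.g. 8,000 RU/s over 4 partitions
    for _ in range(BANK_SECONDS):
        partition.step(0)                      # five quiet minutes fill the bank
    print(partition.bank)                      # 600000.0
    print(partition.step(2_500))               # (2500.0, 0.0): 500 RU drawn from the bank
    print(partition.step(10_000))              # (3000.0, 7000.0): capped at 3,000 RU/s
```

Modeling the bank in RU rather than RU/s makes the two limits easy to tell apart: the bank size bounds how long a burst can last, while the 3,000 RU/s ceiling bounds how fast it can be spent.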
The Math You Can Use

Example A: The Perfect Use Case (A Wide, Short Spike)
You have a container with 8,000 RU/s and 4 physical partitions. Each partition gets 8,000 / 4 = 2,000 RU/s. That is less than 3,000, so burst applies! After a quiet period, each partition has banked plenty of capacity (up to 5 minutes × 2,000 RU/s = 600,000 RU).
A one-second spike sends 2,500 RU/s of traffic to each partition.
Result: No problem! Each partition uses its base 2,000 RU/s and “bursts” the extra 500 RU from its bank. No throttling occurs.

Example B: Where Burst Can’t Help (An Extreme Hot Spot)
Same setup (2,000 RU/s per partition). A massive spike sends 10,000 RU/s to a single partition.
Result: The partition serves requests up to its burst maximum of 3,000 RU/s. The remaining 10,000 - 3,000 = 7,000 RU/s is throttled immediately.

Key Takeaway: Burst is a lifesaver for brief, unexpected spikes spread across many partitions. It cannot fix a severe hot partition.

Which Should You Choose? A Quick Guide

| If your workload looks like this… | The best tool is… |
|---|---|
| 📈 Variable traffic with predictable daily peaks | Autoscale. Set Tmax to handle your highest peak, and let it scale down automatically. |
| ⚡️ Short, infrequent spikes on an otherwise quiet system | Burst Capacity. It’s a free, built-in cushion that smooths out bumps without you paying for higher throughput, as long as each partition’s share is below 3,000 RU/s. |
| 🌋 Constant 429s on just one or two partition keys | Neither. This is a data modeling problem. Fix your partition key strategy (for example, with Hierarchical Partition Keys) to spread the load. |
| 🌪️ Variable traffic AND short, spiky bursts | Both! Use Autoscale to manage the overall load and let Burst absorb the small, second-to-second jitters. |

Final Thoughts: Model Your Load Correctly

Stop thinking about your container’s total throughput. Start thinking about the peak load on your busiest physical partition.

Autoscale is your cruise control for the whole highway.
Burst Capacity is the shock absorber for individual lanes.
A good partition key is what keeps one lane from getting all the traffic in the first place.
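The per-partition arithmetic behind every example in this article fits in a few lines. The sketch below is a steady-state calculator under simplifying assumptions: throughput (manual RU/s or autoscale Tmax) is split evenly across physical partitions and capped at 10,000 RU/s each, and a burst-eligible partition whose bank is not empty can serve up to 3,000 RU/s for the whole spike. The function names are illustrative, not part of any Azure SDK, and real allocation can differ, so always validate against partition-level metrics.

```python
from typing import List, Tuple

PARTITION_HARD_CAP = 10_000  # RU/s ceiling of any single physical partition
BURST_MAX = 3_000            # RU/s ceiling while a partition spends burst capacity


def partition_share(container_ru: float, partitions: int) -> float:
    """Manual RU/s (or autoscale Tmax) split evenly, capped at 10,000 RU/s."""
    return min(container_ru / partitions, PARTITION_HARD_CAP)


def throttle_report(loads: List[float], share: float,
                    burst_bank_available: bool = False) -> List[Tuple[float, float]]:
    """Return (allowed, throttled) RU/s for each partition's load."""
    ceiling = share
    if burst_bank_available and share < BURST_MAX:
        # Eligible partitions with banked capacity can serve up to 3,000 RU/s.
        ceiling = float(BURST_MAX)
    return [(min(float(load), ceiling), max(0.0, float(load) - ceiling))
            for load in loads]


def total_throttled(report: List[Tuple[float, float]]) -> float:
    """Sum of throttled RU/s across all partitions."""
    return sum(throttled for _, throttled in report)


if __name__ == "__main__":
    # Golden rule: 20,000 RU/s over 2 partitions, all 12,000 RU/s on one of them.
    print(total_throttled(throttle_report([12_000, 0], partition_share(20_000, 2))))  # 2000.0
    # Autoscale Example 2: Tmax 50,000 over 5 partitions; hot partition at 15,000
    # (the 5,000 RU/s on the other partitions is an illustrative assumption).
    hot = [15_000, 5_000, 5_000, 5_000, 5_000]
    print(total_throttled(throttle_report(hot, partition_share(50_000, 5))))  # 5000.0
    # Burst Example A: 8,000 RU/s over 4 partitions, 2,500 RU/s on each.
    share = partition_share(8_000, 4)
    print(total_throttled(throttle_report([2_500] * 4, share, burst_bank_available=True)))  # 0.0
    # Burst Example B: 10,000 RU/s on a single partition.
    print(throttle_report([10_000, 0, 0, 0], share, burst_bank_available=True)[0])  # (3000.0, 7000.0)
```

Running Example A again without `burst_bank_available=True` throttles 500 RU/s on each partition (2,000 RU/s total), which is exactly the gap burst capacity closes.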
By modeling the load on a per-partition basis, you can finally understand why throttling happens and use the right combination of Autoscale, Burst, and smart data modeling to eliminate those 429 errors for good.