OpenStack API rate limits on DEDL

Learn what API rate limits are, how throttling works, and how to design automation that avoids them.

API rate limits define how many requests a user, service, or application can send to an API within a specific time window. They protect shared infrastructure from overload and ensure fair access for all tenants.

On the DestinE Data Lake (DEDL) platform, these limits apply to all OpenStack API endpoints. Each operation you perform — listing instances, creating volumes, or deleting objects — consumes part of your request quota.

When automated scripts, CI/CD pipelines, or monitoring tools send too many requests in a short period of time, OpenStack services respond with HTTP 429 Too Many Requests, temporarily blocking further actions and interrupting your workflow.

What we are going to cover

  • What is throttling?

  • When rate limits become a problem

  • Understanding rate-limiting windows

  • What is limited, and how?

  • Rate-limit response headers

  • What happens when the limit is reached?

  • Strategies to avoid throttling

What is throttling?

Throttling is the temporary restriction applied when the API detects that your request rate has exceeded the allowed limit. During throttling, OpenStack rejects new requests until the current time window resets.

When throttling begins, the API returns HTTP 429 Too Many Requests responses instead of processing further operations. Once the x-ratelimit-reset time has passed, the window resets and new requests are accepted.

Throttling is not an error — it’s a built-in safeguard that keeps OpenStack services stable and responsive for everyone.

When rate limits become a problem

For users performing a few actions manually through the Horizon dashboard, rate limits are rarely noticeable. They mainly affect automated or high-frequency access to the OpenStack APIs.

Typical scenarios include:

Automation and scripting

When using the openstack CLI, Ansible, Terraform, or custom scripts that create, list, or delete resources in loops or batches. These tools can generate bursts of requests that exceed per-second or per-minute thresholds.

Monitoring and integrations

When external systems repeatedly poll APIs to check instance or volume status, often at short intervals.

Shared project activity

In multi-user projects, one user’s heavy API activity can consume the shared quota and slow down others working within the same project.

Troubleshooting or retries

Frequent or automated retries during debugging can quickly reach the rate limit and trigger HTTP 429 Too Many Requests responses. See section Strategies to avoid throttling for mitigation tips.

Understanding rate-limiting windows

In API rate limiting, a window is the time frame during which all your requests are counted before the counter resets. When the window resets, your request quota starts over.

Common examples include:

  • Per-second window – counts how many requests you make each second (for example, w=1).

  • Per-minute window – counts how many requests you make in 60 seconds (for example, w=60).

Some OpenStack services, such as Nova, may enforce both 1-second and 60-second windows simultaneously, while others use only one. The smaller window (for example, per-second) usually controls how quickly you can send bursts of requests.

These window types appear in the rate-limit response headers described later in this article.
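To make the interaction between simultaneous windows concrete, here is a minimal fixed-window counter in Python. The class and its semantics are an illustrative sketch of the mechanism described above, not part of any OpenStack API:

```python
import time

class FixedWindowCounter:
    """Enforce several fixed windows at once, e.g. {1: 100, 60: 300}
    meaning 100 req/s and 300 req/min (illustrative sketch)."""

    def __init__(self, limits):
        self.limits = limits
        # per window: [requests counted so far, window start time]
        self.windows = {w: [0, 0.0] for w in limits}

    def allow(self, now=None):
        """Return True if a request may be sent now, False if throttled."""
        now = time.monotonic() if now is None else now
        for w, (count, start) in self.windows.items():
            if now - start >= w:            # window elapsed: counter resets
                self.windows[w] = [0, now]
            elif count >= self.limits[w]:   # any exhausted window throttles
                return False
        for w in self.windows:              # request accepted: count it
            self.windows[w][0] += 1         # toward every window
        return True
```

Note how the tighter per-second window limits bursts, while the per-minute window caps the overall rate, matching the behaviour described for Nova above.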

What is limited, and how?

There are two layers of limits, both enforced at the same time. If you exceed either, the request is rejected (see What happens when the limit is reached).

  1. Per-Service limits (global): cap the total request rate a given OpenStack service (e.g., Nova, Neutron, Cinder, Glance, Keystone) will accept from everyone combined.

  2. Per-Project (tenant) limits: cap the request rate per OpenStack project — shared across all users, apps, and tokens scoped to that project.

Requests are classified by HTTP method and, for read-heavy calls, by path:

Type         Methods                                 Description
read         GET                                     Simple read operations
read-heavy   GET (e.g., Nova /v2.1/servers/detail)   Expensive list or detail queries
mutating     POST, PUT, PATCH, DELETE                Write or modify operations
default      (any other)                             Fallback for other request types
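As a sketch, the classification above can be expressed as a small Python function. The function name and the path-matching rule are illustrative assumptions; the exact matching applied on DEDL may differ:

```python
def classify(method, path):
    """Map an HTTP method and path to a rate-limit class
    (illustrative sketch of the classification table)."""
    if method in ("POST", "PUT", "PATCH", "DELETE"):
        return "mutating"
    if method == "GET":
        if path.endswith("/servers/detail"):
            return "read-heavy"   # expensive Nova detail listing
        return "read"
    return "default"
```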

Per-Service (global) limits

The first layer of limits applies globally to each OpenStack service. These limits represent the maximum combined request rate accepted across all projects, users, and tokens for a given service (Nova, Cinder, Glance, Keystone, etc.).

If the global per-service limit is exceeded, all requests to that service are throttled, even if an individual project is still below its own project-level quota.

Type         Limit        Description
default      1000 req/s   Fallback if no other type matches
mutating     100 req/s    POST, PUT, PATCH, DELETE
read         1000 req/s   All GET (except read-heavy)
read-heavy   100 req/s    Nova GET /v2.1/servers/detail

Per-Project (tenant) limits

The second layer of limits applies to each individual OpenStack project (tenant). These values restrict the total request rate shared by all users, apps, and tokens scoped to the same project.

Even if the global service-level limit is not reached, you can still hit the per-project cap if your project generates too many requests in a fixed window (see section Rate-limit response headers).

Per-project limits are measured over per-minute windows unless otherwise stated.

Type         Limit         Description
default      100 req/min   Fallback if no other type matches
mutating     200 req/min   POST, PUT, PATCH, DELETE
read-heavy   300 req/min   Nova GET /v2.1/servers/detail
read         500 req/min   All GET (except read-heavy)

Each limit applies over a specific time window (for example, one second or one minute), visible in the rate-limit response headers.

Note

Per-Project limits operate within a given service endpoint – for example, exceeding the limit for Nova does not affect the limit for Cinder.

Rate-limit response headers

Every response includes headers showing your current limit status:

x-ratelimit-limit: 300, 100;w=1, 300;w=60
x-ratelimit-remaining: 96
x-ratelimit-reset: 42

How to read these headers:

  • x-ratelimit-limit – the active limit followed by a comma-separated list of configured limits:

      • entries with ;w=1 are per-second windows;

      • entries with ;w=60 are per-minute windows;

      • if multiple windows are shown, the tightest window controls when you’ll be throttled next.

  • x-ratelimit-remaining – how many requests remain in the most constrained window.

  • x-ratelimit-reset – seconds until the current window resets.
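The x-ratelimit-limit value can be parsed programmatically. Here is a minimal Python sketch; the function name is ours, and the header format assumed is the one shown above:

```python
def parse_ratelimit_limit(value):
    """Parse an x-ratelimit-limit header such as '300, 100;w=1, 300;w=60'
    into (active_limit, {window_seconds: limit})."""
    parts = [p.strip() for p in value.split(",")]
    active = int(parts[0])          # first entry: the active limit
    windows = {}
    for part in parts[1:]:          # remaining entries: limit;w=window
        limit, _, attr = part.partition(";")
        if attr.startswith("w="):
            windows[int(attr[2:])] = int(limit)
    return active, windows
```

For the example above this yields (300, {1: 100, 60: 300}): an active limit of 300, with 100 requests allowed per second and 300 per minute.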

What happens when the limit is reached?

  • The API returns 429 Too Many Requests.

  • The response includes the rate-limit headers so you can decide when to retry.

  • The response also includes x-envoy-ratelimited: true.

  • No API action is performed for that request.

Example response:

< HTTP/1.1 429 Too Many Requests
< x-envoy-ratelimited: true
< x-ratelimit-limit: 300, 100;w=1, 300;w=60
< x-ratelimit-remaining: 0
< x-ratelimit-reset: 42

After the time specified in x-ratelimit-reset has elapsed, requests can be retried. To prevent this situation, spread requests evenly and implement automatic retries based on the reset time.
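A retry loop based on the reset time can be sketched as follows. The call_with_retry helper is illustrative: `send` stands for any zero-argument function that performs the actual HTTP request and returns a response object with .status_code and .headers:

```python
import time

def call_with_retry(send, max_retries=3, sleep=time.sleep):
    """Retry a request on 429, waiting until the window indicated
    by x-ratelimit-reset has passed (illustrative sketch)."""
    resp = send()
    for _ in range(max_retries):
        if resp.status_code != 429:
            break
        # wait for the current window to reset, plus a small margin
        wait = int(resp.headers.get("x-ratelimit-reset", 1))
        sleep(wait + 1)
        resp = send()
    return resp
```

Injecting `sleep` keeps the helper testable; in production code the default time.sleep is used.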

Strategies to avoid throttling

To keep your automation reliable and compliant with rate limits, follow these best practices:

  1. Handle 429 responses gracefully – read the x-ratelimit-reset header and retry after the specified time.

  2. Pace your requests – add short delays between successive API calls.

  3. Avoid unnecessary polling – cache results instead of repeatedly querying the same endpoints.

  4. Distribute activity – schedule heavy automation jobs sequentially rather than concurrently.
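The pacing advice in step 2 can be sketched as a small generator that spreads calls evenly over the window instead of sending a burst (the paced helper is illustrative):

```python
import time

def paced(items, per_minute=100, sleep=time.sleep):
    """Yield items no faster than per_minute, spacing calls evenly
    across the window (illustrative sketch)."""
    interval = 60.0 / per_minute
    for i, item in enumerate(items):
        if i:                 # no delay before the first call
            sleep(interval)
        yield item
```

Wrapping a resource loop such as a bulk cleanup in paced(...) with a rate below the mutating per-project limit keeps the script under the threshold without per-call tuning.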

What to do next

For articles that use OpenStack CLI commands, see section OPENSTACK CLI.

For an introduction to automation, follow article OPENSTACK DEV.