Rate limits
How the gateway throttles requests and budgets token usage.
Last updated 18 Apr 2026
Rate limits apply at three levels (per key, per tenant, and per appliance) and are enforced by token-bucket counters inside the gateway.
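As a rough illustration of how token-bucket counters at three levels can combine, the sketch below admits a request only when the key, tenant, and appliance buckets all have capacity. It is not the gateway's implementation: the class `TokenBucket`, the function `admit`, and the choice of a one-minute burst size are assumptions made for the example.

```python
class TokenBucket:
    """Illustrative token bucket; names and burst size are assumptions."""

    def __init__(self, per_minute: float, now: float = 0.0):
        self.capacity = float(per_minute)        # assumed burst: one minute of requests
        self.refill_per_sec = per_minute / 60.0  # steady refill rate
        self.tokens = self.capacity              # start with a full bucket
        self.updated = now

    def refill(self, now: float) -> None:
        """Add tokens for the time elapsed since the last update, up to capacity."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = max(self.updated, now)


def admit(buckets: list[TokenBucket], now: float, cost: float = 1.0) -> bool:
    """Admit a request only if every scope (key, tenant, appliance) has room.

    Tokens are deducted only after all buckets pass, so a rejection at one
    level does not consume capacity at another.
    """
    for bucket in buckets:
        bucket.refill(now)
    if any(bucket.tokens < cost for bucket in buckets):
        return False
    for bucket in buckets:
        bucket.tokens -= cost
    return True


# Example: buckets sized like the documented request defaults (60 / 600 / 1200 per minute).
key, tenant, appliance = TokenBucket(60), TokenBucket(600), TokenBucket(1200)
first_minute = [admit([key, tenant, appliance], now=0.0) for _ in range(61)]
```

In this sketch the 61st request at the same instant is rejected by the per-key bucket while the tenant and appliance buckets keep their remaining tokens.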
Defaults
| Scope | Requests | Tokens |
|---|---|---|
| Per key | 60 req / min | No default; set via budget: scope |
| Per tenant | 600 req / min | No default |
| Per appliance | 1 200 req / min | Model-specific (set at deploy) |
Tenants can raise per-key limits by editing the virtual key. Appliance-level limits are set by the deployment engineer based on the hardware SKU.
Budgets
Budgets are enforced per calendar period in the appliance’s local time
zone. The gateway increments a counter on every completion and checks the
counter before accepting the next request. When a budget is exhausted, the
gateway returns HTTP 429 with the header X-Operayde-Budget: exceeded and an
advisory Retry-After value in seconds.
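The check-then-increment cycle described above can be sketched as follows. This is a minimal model under stated assumptions: the period is one calendar day, the budget is counted in tokens, and the names `DailyBudget`, `check`, and `record` are hypothetical rather than part of the gateway.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class DailyBudget:
    """Illustrative per-day budget counter keyed to a local time zone."""

    def __init__(self, limit_tokens: int, local_tz: tzinfo):
        self.limit = limit_tokens
        self.local_tz = local_tz   # stands in for the appliance's local time zone
        self.period = None         # local calendar date the counter belongs to
        self.spent = 0

    def _roll_period(self, now: datetime) -> None:
        """Reset the counter when the local calendar date changes."""
        today = now.astimezone(self.local_tz).date()
        if today != self.period:
            self.period = today
            self.spent = 0

    def check(self, now: datetime) -> bool:
        """Run before accepting a request: True while budget remains."""
        self._roll_period(now)
        return self.spent < self.limit

    def record(self, now: datetime, tokens_used: int) -> None:
        """Run after each completion: add the tokens it consumed."""
        self._roll_period(now)
        self.spent += tokens_used


# Example: an appliance two hours ahead of UTC with a 1000-token daily budget.
budget = DailyBudget(1000, timezone(timedelta(hours=2)))
late_evening = datetime(2026, 4, 18, 21, 0, tzinfo=timezone.utc)     # 23:00 local
after_midnight = datetime(2026, 4, 18, 22, 30, tzinfo=timezone.utc)  # 00:30 local
```

Because the counter is checked before a request and incremented only after its completion, a single request can push spending past the limit in this model; the next request is the one that gets rejected.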
429 handling
HTTP/1.1 429 Too Many Requests
Retry-After: 18
X-Operayde-Budget: exceeded
Content-Type: application/json

{
  "error": {
    "code": "budget_exhausted",
    "message": "Key 'research-team' has spent its daily budget.",
    "resets_at": "2026-04-19T00:00:00Z"
  }
}

Well-behaved clients should wait at least the Retry-After interval before
retrying and back off exponentially if 429 responses persist. The error
code is stable; the message is for humans.
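One way a client might handle these responses is sketched below. It waits for whichever is longer, the advised Retry-After or an exponential backoff, and reads the stable error code rather than the message. The function names, the `send` callable (assumed to return a `(status, headers, body)` tuple), and the base and cap values are all assumptions for illustration, not part of any client library.

```python
import json
import time


def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Takes the larger of the advised Retry-After and a capped exponential
    backoff. Assumes header names arrive exactly as "Retry-After".
    """
    backoff = min(cap, base * (2 ** attempt))
    try:
        advised = float(headers.get("Retry-After", 0))
    except (TypeError, ValueError):
        advised = 0.0  # missing or non-numeric value: rely on backoff alone
    return max(advised, backoff)


def error_code(body: str):
    """Return the machine-readable error code from a JSON error body, if any."""
    try:
        return json.loads(body).get("error", {}).get("code")
    except (ValueError, AttributeError):
        return None


def call_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        sleep(retry_delay(headers, attempt))
    return status, body
```

Passing `sleep` as a parameter keeps the retry loop easy to test without real waiting. A client could also check `error_code(body) == "budget_exhausted"` to decide whether retrying is worthwhile at all.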