Gateway

Rate limits

How the gateway throttles requests and budgets token usage.

Last updated 18 Apr 2026

Rate limits exist at three levels (per key, per tenant, and per appliance) and are enforced by token-bucket counters inside the gateway.

Defaults

Scope	Requests	Tokens
Per key	60 req / min	No default; set via `budget:` scope
Per tenant	600 req / min	No default
Per appliance	1 200 req / min	Model-specific (set at deploy)

Tenants can raise per-key limits by editing the virtual key. Appliance-level limits are set by the deployment engineer based on the hardware SKU.
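To make the token-bucket behaviour concrete, the following is a minimal sketch of how a request might be checked against all three levels. It is not the gateway's implementation. The names `TokenBucket` and `admit` are hypothetical, and the sketch assumes that each bucket's burst capacity equals its per-minute limit and that a request is admitted only if every level has a token available.

```python
class TokenBucket:
    """Illustrative token bucket (hypothetical, not the gateway's code)."""

    def __init__(self, per_minute, now=0.0):
        # Assumption: burst capacity equals the per-minute limit.
        self.capacity = float(per_minute)
        self.refill_per_sec = per_minute / 60.0
        self.tokens = self.capacity
        self.updated = now

    def refill(self, now):
        """Add the tokens earned since the last update, up to capacity."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now


def admit(buckets, now):
    """Admit a request only if every level still has a token.

    Tokens are consumed from all buckets together, so a request rejected
    at one level does not use up allowance at the others.
    """
    for bucket in buckets:
        bucket.refill(now)
    if all(bucket.tokens >= 1.0 for bucket in buckets):
        for bucket in buckets:
            bucket.tokens -= 1.0
        return True
    return False
```

With the default limits, `admit([TokenBucket(60), TokenBucket(600), TokenBucket(1200)], now)` would accept a burst of 60 requests on one key, then reject further requests on that key until its bucket refills at one token per second.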

Budgets

Budgets are enforced per calendar period in the appliance’s local time zone. The gateway increments a counter on every completion and checks that counter before accepting the next request. When a budget is exhausted, the gateway returns HTTP 429 with the header `X-Operayde-Budget: exceeded` and an advisory `Retry-After` value in seconds.
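The check-then-increment order described above can be sketched as follows. This is an illustration only: the class `DailyBudget` and its methods are hypothetical, and it assumes a daily period with the reset at local midnight, whereas the gateway may support other calendar periods.

```python
from datetime import datetime, time, timedelta, timezone


class DailyBudget:
    """Illustrative per-day token budget (hypothetical, not the gateway's code)."""

    def __init__(self, limit_tokens, tz):
        self.limit = limit_tokens
        self.tz = tz            # the appliance's local time zone (a tzinfo)
        self.period = None      # local calendar date the counter belongs to
        self.spent = 0

    def _roll(self, now):
        """Reset the counter when the local calendar date changes."""
        today = now.astimezone(self.tz).date()
        if today != self.period:
            self.period = today
            self.spent = 0

    def check(self, now):
        """Run before accepting a request: True while budget remains."""
        self._roll(now)
        return self.spent < self.limit

    def record(self, now, tokens_used):
        """Run after a completion: add its token usage to the counter."""
        self._roll(now)
        self.spent += tokens_used

    def resets_at(self, now):
        """Next local midnight, the assumed start of the next period."""
        local_date = now.astimezone(self.tz).date()
        return datetime.combine(local_date + timedelta(days=1),
                                time.min, tzinfo=self.tz)
```

Because usage is recorded only after a completion, the last request accepted in a period can push the counter past the limit in this sketch; the next `check` then fails until the period rolls over.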

429 handling

HTTP/1.1 429 Too Many Requests
Retry-After: 18
X-Operayde-Budget: exceeded
Content-Type: application/json
 
{
  "error": {
    "code": "budget_exhausted",
    "message": "Key 'research-team' has spent its daily budget.",
    "resets_at": "2026-04-19T00:00:00Z"
  }
}

Well-behaved clients should wait at least the number of seconds given in `Retry-After` before retrying, and back off exponentially if the 429 repeats. Match on the error code, which is stable; the message is for humans.
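One way a client might combine `Retry-After` with exponential backoff is sketched below. The helper names `retry_delay` and `call_with_backoff`, the one-second base, and the 60-second cap are assumptions for illustration, not part of any Operayde client library. The `send` argument stands for whatever function issues the request and returns a `(status, headers, body)` tuple.

```python
import time


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait before the next attempt.

    Uses the larger of the server's advisory Retry-After value and a
    capped exponential backoff (base * 2**attempt).
    """
    backoff = min(cap, base * (2 ** attempt))
    try:
        advised = float(retry_after)
    except (TypeError, ValueError):
        advised = 0.0   # header missing or not a number of seconds
    return max(advised, backoff)


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt + 1 < max_attempts:
            sleep(retry_delay(headers.get("Retry-After"), attempt))
    return status, headers, body
```

A production client would probably add random jitter to the delay and, when the error code is `budget_exhausted`, might prefer to stop retrying until the time given in `resets_at`.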