API quickstart
From zero to streaming chat in five minutes.
Last updated 17 May 2026
This guide gets you from a virtual key to a working integration in five minutes. You will create a virtual key, send a chat completion, stream a response, and generate embeddings.
Prerequisites
- An enrolled Operayde appliance reachable at https://appliance.<your-domain>.
- A tenant-admin account to create virtual keys.
1. Get a virtual key
Open the operator portal at https://portal.operayde.com, navigate to
Keys > Create key, and create a key with the following settings:
- Label: quickstart
- Allowed models: select all available models
- RPM limit: 60
- TPD limit: 1,000,000
Copy the key secret. It looks like op_live_26f2_f9b8c4e2aa1345b7d3f27a901c55d88a.
Set the key and your appliance base URL as environment variables:
export OPERAYDE_KEY="op_live_26f2_..."
export OPERAYDE_BASE="https://appliance.example.com/v1"
2. Send a chat request
curl
curl -s -X POST "$OPERAYDE_BASE/chat/completions" \
  -H "Authorization: Bearer $OPERAYDE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "operayde/instruct-13b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Operayde?"}
    ]
  }' | jq .
Python
import os
import requests
base = os.environ["OPERAYDE_BASE"]
key = os.environ["OPERAYDE_KEY"]
resp = requests.post(
    f"{base}/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "model": "operayde/instruct-13b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is Operayde?"},
        ],
    },
)
data = resp.json()
print(data["choices"][0]["message"]["content"])
TypeScript
const base = process.env.OPERAYDE_BASE!;
const key = process.env.OPERAYDE_KEY!;
const resp = await fetch(`${base}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "operayde/instruct-13b",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is Operayde?" },
    ],
  }),
});
const data = await resp.json();
console.log(data.choices[0].message.content);
3. Stream a response
Set "stream": true to receive Server-Sent Events as the model generates
tokens.
curl
curl -s -N -X POST "$OPERAYDE_BASE/chat/completions" \
  -H "Authorization: Bearer $OPERAYDE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "operayde/instruct-13b",
    "messages": [{"role": "user", "content": "Write a haiku about data privacy."}],
    "stream": true
  }'
Each SSE event contains a JSON chunk:
data: {"id":"chat-abc123","choices":[{"delta":{"content":"Your"},"index":0}]}
data: {"id":"chat-abc123","choices":[{"delta":{"content":" data"},"index":0}]}
...
data: [DONE]
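The framing above can be exercised offline before any network code is involved. The sketch below is a minimal Python helper, assuming each event arrives as one already-decoded `data: ...` line and that `[DONE]` ends the stream, as in the sample. `iter_content` is a hypothetical name, not part of any Operayde SDK.

```python
import json


def iter_content(lines):
    """Yield content fragments from an iterable of decoded SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]


# The two sample events from above, followed by the sentinel.
sample = [
    'data: {"id":"chat-abc123","choices":[{"delta":{"content":"Your"},"index":0}]}',
    "",
    'data: {"id":"chat-abc123","choices":[{"delta":{"content":" data"},"index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Your data"
```

Keeping the parsing separate from the HTTP call makes it possible to test against recorded events like these.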
Python (streaming)
import os
import requests
import json
base = os.environ["OPERAYDE_BASE"]
key = os.environ["OPERAYDE_KEY"]
resp = requests.post(
    f"{base}/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "model": "operayde/instruct-13b",
        "messages": [{"role": "user", "content": "Write a haiku about data privacy."}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    text = line.decode("utf-8")
    if text.startswith("data: ") and text != "data: [DONE]":
        chunk = json.loads(text[6:])
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            print(delta["content"], end="", flush=True)
print()
TypeScript (streaming)
const base = process.env.OPERAYDE_BASE!;
const key = process.env.OPERAYDE_KEY!;
const resp = await fetch(`${base}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "operayde/instruct-13b",
    messages: [{ role: "user", content: "Write a haiku about data privacy." }],
    stream: true,
  }),
});
const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A read can end mid-line, so buffer the text and parse only complete lines.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      const chunk = JSON.parse(line.slice(6));
      const content = chunk.choices[0]?.delta?.content;
      if (content) process.stdout.write(content);
    }
  }
}
4. Generate embeddings
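Sections 2 and 3 pair each curl call with a Python version. For embeddings, a Python counterpart to the curl request below might look like the following sketch. It uses only the standard library, assumes the same `OPERAYDE_BASE` and `OPERAYDE_KEY` environment variables, and assumes the response shape shown at the end of this section. `embed`, `vectors_in_input_order`, and `cosine_similarity` are hypothetical helper names, not part of any Operayde SDK.

```python
import json
import math
import os
import urllib.request


def vectors_in_input_order(body):
    """Return the embedding vectors sorted by their "index" field."""
    rows = sorted(body["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def embed(texts, model="operayde/embed-bge"):
    """POST texts to the embeddings endpoint and return one vector per text."""
    base = os.environ["OPERAYDE_BASE"]
    key = os.environ["OPERAYDE_KEY"]
    request = urllib.request.Request(
        f"{base}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return vectors_in_input_order(json.load(resp))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For example, `a, b = embed(["first sentence", "second sentence"])` followed by `cosine_similarity(a, b)` gives a score between -1 and 1; with typical embedding models, a higher score indicates closer meaning. The equivalent curl request follows.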
curl -s -X POST "$OPERAYDE_BASE/embeddings" \
  -H "Authorization: Bearer $OPERAYDE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "operayde/embed-bge",
    "input": [
      "Operayde keeps your data on-premise.",
      "Virtual keys control API access."
    ]
  }' | jq '.data[0].embedding[:5]'
The response contains a dense vector for each input string:
{
  "model": "operayde/embed-bge",
  "data": [
    {"index": 0, "embedding": [0.0123, -0.0456, ...]},
    {"index": 1, "embedding": [0.0789, 0.0012, ...]}
  ]
}
5. Using with OpenAI client libraries
Because the gateway is OpenAI-compatible, existing OpenAI client libraries work once you point their base URL at the appliance and pass your virtual key as the API key.
Python (openai library)
import os
from openai import OpenAI
client = OpenAI(
    base_url=os.environ["OPERAYDE_BASE"],
    api_key=os.environ["OPERAYDE_KEY"],
)
response = client.chat.completions.create(
    model="operayde/instruct-13b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain data residency in one sentence."},
    ],
)
print(response.choices[0].message.content)
TypeScript (openai library)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: process.env.OPERAYDE_BASE,
  apiKey: process.env.OPERAYDE_KEY,
});
const response = await client.chat.completions.create({
  model: "operayde/instruct-13b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain data residency in one sentence." },
  ],
});
console.log(response.choices[0].message.content);
Error handling
All errors follow a standard envelope:
{
  "error": {
    "code": "policy_denied",
    "message": "Key 'quickstart' lacks scope for model 'operayde/large-70b'.",
    "type": "policy",
    "request_id": "req_01HZ..."
  }
}

| HTTP status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed JSON or missing required fields |
| 401 | unauthorized | Missing, expired, or revoked virtual key |
| 403 | policy_denied | OPA policy denied the request |
| 429 | rate_limited | RPM or TPD budget exhausted |
| 500 | internal_error | Server error — retry with backoff |
| 503 | model_unavailable | Requested model not loaded on this appliance |
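
One way to act on this table from client code is to wrap the envelope in an exception and retry only the transient failures. The following is a minimal Python sketch under stated assumptions. `GatewayError`, `backoff_delay`, and `call_with_retry` are hypothetical names, not part of any Operayde SDK. Treating `rate_limited` and `internal_error` as retryable is an assumption (the table only recommends retrying for 500), and the delay values are illustrative rather than documented.

```python
import random
import time

# Assumption: only these error codes are treated as transient.
RETRYABLE_CODES = {"rate_limited", "internal_error"}


class GatewayError(Exception):
    """Wraps the error envelope returned by the gateway."""

    def __init__(self, status, body):
        err = body.get("error", {}) if isinstance(body, dict) else {}
        self.status = status
        self.code = err.get("code", "unknown")
        self.request_id = err.get("request_id")
        super().__init__(err.get("message", f"HTTP {status}"))

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Exponential backoff with full jitter; base and cap are illustrative."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, attempts=4, sleep=time.sleep):
    """Call send() until success, a non-retryable error, or attempts run out.

    send is any zero-argument callable returning (http_status, parsed_json_body).
    """
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return body
        error = GatewayError(status, body)
        if not error.retryable or attempt == attempts - 1:
            raise error
        sleep(backoff_delay(attempt))
```

With `requests`, `send` could be a small function that posts the payload and returns `(resp.status_code, resp.json())`. The envelope's `request_id` is kept on the exception so it can be logged alongside the failure.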
Next steps
- Explore the full Gateway API reference for all endpoints and options.
- Set up per-team virtual keys with appropriate rate limits.
- Configure PII redaction to protect sensitive data.
- Review rate limits to understand budget enforcement.