# API Reference
FreeLLM exposes an OpenAI-compatible API. Use any OpenAI SDK by setting `base_url` to your gateway address. All endpoints are available at both `/v1/...` (direct) and `/api/v1/...` (proxied via the dashboard).
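As a concrete sketch, a chat-completion request can be built with nothing but the standard library. The gateway address `http://localhost:8080` below is an assumption; substitute your own deployment's base URL:

```python
import json
import urllib.request

# Assumed gateway address -- replace with your deployment's base URL.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("free-fast", [{"role": "user", "content": "Hello!"}])
# urllib.request.urlopen(req) would dispatch it once the gateway is running.
```

Any OpenAI SDK does the equivalent for you once pointed at the same `base_url`.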
## Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| GET | `/v1/models` | List all available models and meta-models |
| GET | `/v1/status` | Gateway health, provider states, per-key state, token usage, recent requests |
| POST | `/v1/status/providers/{id}/reset` | Force-reset a provider's circuit breaker |
| PATCH | `/v1/status/routing` | Switch between `round_robin` and `random` routing |
| GET | `/healthz` | Simple health check (used by load balancers and Docker) |
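The two management endpoints take small requests; the sketch below builds them with the standard library. The gateway address, the `"groq"` provider id, and the `{"strategy": ...}` body field for the routing endpoint are all assumptions (the table above only says the endpoint switches between `round_robin` and `random`):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed gateway address

def reset_provider(provider_id: str) -> urllib.request.Request:
    # POST force-resets the given provider's circuit breaker.
    return urllib.request.Request(
        f"{BASE_URL}/v1/status/providers/{provider_id}/reset",
        method="POST",
    )

def set_routing(strategy: str) -> urllib.request.Request:
    # The "strategy" field name is an assumption; valid values per the
    # table above are "round_robin" and "random".
    body = json.dumps({"strategy": strategy}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/status/routing",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

Send either request with `urllib.request.urlopen(...)` against a running gateway.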
## Chat completion
```
POST /v1/chat/completions
Content-Type: application/json
```

```json
{
  "model": "free-fast",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1024
}
```

Response shape:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hi there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
  "x_freellm_provider": "groq"
}
```

The `x_freellm_provider` field is FreeLLM-specific and tells you which upstream provider handled the request.
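Pulling the assistant's reply and the provider out of that response is plain JSON traversal. The sketch below uses a literal copy of the response shape above (the `chatcmpl-example` id is illustrative):

```python
import json

# Example response body matching the shape above (the id is illustrative).
raw = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hi there!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
  "x_freellm_provider": "groq"
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]  # the assistant's reply
provider = resp["x_freellm_provider"]              # which upstream served it
```

Note that `model` in the response names the concrete upstream model, not the `free-fast` meta-model you requested.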
## Status endpoint
`GET /v1/status` returns the full gateway state, including provider health, per-key rate-limit windows, token usage totals, and recent requests. See Multi-Key Rotation and Token Usage Tracking for examples of the response shape.