# Multi-Key Rotation
Every provider env var in FreeLLM accepts a comma-separated list of API keys. FreeLLM rotates through them with independent per-key rate-limit budgets, and concurrent requests spread across keys automatically.
This is the feature that makes FreeLLM structurally different from other LLM gateways. Paid gateways have no reason to do it: when you pay per token, stacking keys buys nothing. For a free-tier gateway, it’s a 3×–10× capacity multiplier.
## How it works
```bash
# Single key (works as before)
GROQ_API_KEY=gsk_key1

# Four keys = 4× the free-tier capacity
GROQ_API_KEY=gsk_key1,gsk_key2,gsk_key3,gsk_key4
```

When a request comes in:
- FreeLLM picks the next key via round-robin (the cursor is advanced synchronously, so concurrent requests spread across keys)
- The chosen key’s rate-limit window is checked; if it’s full, FreeLLM tries the next key
- The request is sent with the picked key’s `Authorization: Bearer <key>` header
- If the response is `429`, only that key is cooled down; the other keys remain available
- The provider is excluded from routing only when all of its keys are exhausted
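The steps above can be sketched as a small key pool. This is an illustrative model, not FreeLLM’s actual internals: the class, field names, and window logic here are assumptions; only the behavior (comma-split keys, synchronous round-robin, per-key budgets, per-key cooldown) comes from the docs.

```typescript
interface KeyState {
  key: string;
  windowStart: number;   // start of the current rate-limit window (ms epoch)
  requestsInWindow: number;
  cooldownUntil: number; // set after a 429; 0 means the key is available
}

class KeyPool {
  private cursor = 0;
  private states: KeyState[];

  constructor(envValue: string, private maxRequests: number, private windowMs = 60_000) {
    // A comma-separated env var becomes one independent state per key.
    this.states = envValue.split(",").map((key) => ({
      key: key.trim(), windowStart: 0, requestsInWindow: 0, cooldownUntil: 0,
    }));
  }

  /** Round-robin pick, skipping keys that are cooling down or whose window is full. */
  pick(now = Date.now()): string | null {
    for (let i = 0; i < this.states.length; i++) {
      const s = this.states[(this.cursor + i) % this.states.length];
      if (s.cooldownUntil > now) continue;            // key is in 429 cooldown
      if (now - s.windowStart >= this.windowMs) {     // window expired: reset budget
        s.windowStart = now;
        s.requestsInWindow = 0;
      }
      if (s.requestsInWindow >= this.maxRequests) continue; // this key's budget is spent
      s.requestsInWindow++;
      this.cursor = (this.cursor + i + 1) % this.states.length; // advance synchronously
      return s.key;
    }
    return null; // all keys exhausted: provider drops out of routing
  }
}
```

With `new KeyPool("k1,k2", 2)`, picks alternate `k1, k2, k1, k2` and the fifth pick returns `null` until a window expires.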
## Real-world math
Stack 3 keys per provider across all 5 cloud providers:
| Provider | Per-key RPM | Keys | Combined RPM |
|---|---|---|---|
| Groq | 30 | 3 | 90 |
| Gemini | 15 | 3 | 45 |
| Mistral | 5 | 3 | 15 |
| Cerebras | 30 | 3 | 90 |
| NVIDIA NIM | 40 | 3 | 120 |
| **Total** | | 15 | 360 |
That’s ~360 requests/minute of free inference, including frontier models like Llama 3.3 70B, Gemini 2.5 Pro, and DeepSeek R1. All $0.
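The arithmetic behind the table is just per-key RPM × key count, summed across providers. A quick sanity check (the limits are the free-tier numbers from the table above):

```typescript
// Combined capacity = per-key RPM × keys, summed over providers.
const providers = [
  { id: "groq",       perKeyRpm: 30, keys: 3 },
  { id: "gemini",     perKeyRpm: 15, keys: 3 },
  { id: "mistral",    perKeyRpm: 5,  keys: 3 },
  { id: "cerebras",   perKeyRpm: 30, keys: 3 },
  { id: "nvidia-nim", perKeyRpm: 40, keys: 3 },
];
const totalRpm = providers.reduce((sum, p) => sum + p.perKeyRpm * p.keys, 0);
console.log(totalRpm); // 360
```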
## Concurrency safety
The killer detail: each upstream Response object is mapped to the tracking ID of the key that produced it via a WeakMap<Response, string>. When the router calls provider.onRateLimit(response), the provider looks up the exact key from that WeakMap. So even with concurrent requests, a 429 on key #0 can never accidentally cool down key #1.
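In code, that mapping looks roughly like the sketch below. The function names (`sendWithKey`, `onRateLimit`’s callback shape) are illustrative assumptions; the `WeakMap<Response, string>` itself is what the docs describe.

```typescript
// Maps each upstream Response to the key that produced it. A WeakMap lets the
// entry be garbage-collected together with the Response object.
const responseKey = new WeakMap<Response, string>();

async function sendWithKey(url: string, key: string): Promise<Response> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${key}` },
  });
  responseKey.set(res, key); // remember exactly which key produced this response
  return res;
}

function onRateLimit(res: Response, coolDown: (key: string) => void): void {
  const key = responseKey.get(res); // exact key, regardless of concurrent requests
  if (res.status === 429 && key !== undefined) {
    coolDown(key); // only the offending key is cooled down
  }
}
```

Because the lookup goes through the response object itself, two in-flight requests on different keys can never confuse each other’s 429s.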
## Per-key observability
The GET /v1/status endpoint exposes per-key state for every provider:
```json
{
  "id": "groq",
  "keyCount": 4,
  "keysAvailable": 3,
  "keys": [
    { "index": 0, "rateLimited": false, "requestsInWindow": 12, "maxRequests": 28 },
    { "index": 1, "rateLimited": false, "requestsInWindow": 11, "maxRequests": 28 },
    { "index": 2, "rateLimited": true, "requestsInWindow": 28, "maxRequests": 28, "retryAfterMs": 42000 },
    { "index": 3, "rateLimited": false, "requestsInWindow": 12, "maxRequests": 28 }
  ]
}
```

The dashboard surfaces this as a `keysAvailable / keyCount` badge on each provider card.
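A consumer of this endpoint can type the payload and derive the badge value itself. The interfaces below mirror the JSON fields shown above; the helper function is a hypothetical example, not part of FreeLLM’s API.

```typescript
interface KeyStatus {
  index: number;
  rateLimited: boolean;
  requestsInWindow: number;
  maxRequests: number;
  retryAfterMs?: number; // present only while the key is cooling down
}

interface ProviderStatus {
  id: string;
  keyCount: number;
  keysAvailable: number;
  keys: KeyStatus[];
}

/** Recompute the dashboard's keysAvailable badge from the raw key list. */
function availableKeys(p: ProviderStatus): number {
  return p.keys.filter((k) => !k.rateLimited).length;
}
```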
## Ethical considerations
Using multiple free-tier accounts may violate some providers’ terms of service. Check each provider’s ToS before stacking keys. FreeLLM is a tool. You’re responsible for how you use it. For most personal/dev use cases, multiple personal accounts under different emails are fine.