You shouldn't need a credit card
to call an LLM.
One OpenAI-compatible endpoint. Six free LLM providers. When one rate-limits, the next one answers. Stack 3 keys per provider and you get ~360 free requests per minute. All $0.
Everything you need to run LLMs for free
Routing, observability, and recovery built specifically for free-tier providers. Not a generic gateway with a free tier tacked on as an afterthought.
Drop-in OpenAI SDK
Change one line. Your base URL. Your existing OpenAI SDK code keeps working against six providers.
Automatic failover
Groq rate-limited? Your request silently routes to Gemini, then Mistral, then Cerebras. You stop seeing 429s.
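The failover loop is simple to picture. A minimal sketch, with illustrative provider names and error handling (not FreeLLM's actual internals): try providers in order and move on whenever one fails or rate-limits.

```typescript
// Hedged sketch of the failover idea: try each provider in priority order and
// fall through to the next on any error (e.g. an HTTP 429 rate limit).
// The Provider shape and names here are illustrative assumptions.
type Provider = { name: string; call: (prompt: string) => Promise<string> };

async function completeWithFailover(providers: Provider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await p.call(prompt); // first healthy provider wins
    } catch (err) {
      lastError = err;             // rate-limited or down: try the next one
    }
  }
  throw lastError;                 // every provider exhausted
}
```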
Multi-key rotation
Set GROQ_API_KEY=k1,k2,k3 and stack multiple keys per provider. Each key gets its own rate-limit budget.
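In code, rotation over a comma-separated key list can be as small as a round-robin cursor. A sketch of the idea (illustrative only, not FreeLLM's exact implementation):

```typescript
// Hedged sketch: split "k1,k2,k3" into a key pool and hand out keys
// round-robin, so each key spends its own rate-limit budget.
class KeyRotator {
  private keys: string[];
  private index = 0;

  constructor(envValue: string) {
    // "k1, k2,k3" → ["k1", "k2", "k3"]
    this.keys = envValue.split(",").map((k) => k.trim()).filter(Boolean);
  }

  // Each call returns the next key, cycling through the pool.
  next(): string {
    const key = this.keys[this.index];
    this.index = (this.index + 1) % this.keys.length;
    return key;
  }
}
```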
Token tracking
Rolling 24-hour token counts per provider. Always know how much of your free budget is left.
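A rolling window like this is typically a timestamped event list pruned on read. A minimal sketch under that assumption (names illustrative):

```typescript
// Hedged sketch: rolling 24-hour token counter for one provider.
// Events older than the window are pruned whenever the total is read.
class RollingTokenCounter {
  private events: { at: number; tokens: number }[] = [];

  constructor(private windowMs = 24 * 60 * 60 * 1000) {}

  record(tokens: number, now = Date.now()): void {
    this.events.push({ at: now, tokens });
  }

  // Tokens used within the trailing window.
  totalInWindow(now = Date.now()): number {
    this.events = this.events.filter((e) => now - e.at <= this.windowMs);
    return this.events.reduce((sum, e) => sum + e.tokens, 0);
  }
}
```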
Circuit breakers
Per-provider health tracking with three states. Failures stay contained, recovery is automatic.
Three meta-models
Pick a strategy, not a provider. free-fast for speed, free-smart for reasoning, free for max uptime.
Real-time dashboard
Built-in dashboard shows provider health, live request log, latency, and token usage at a glance.
Truly $0
Every provider runs on its free tier. No markup, no subscription, no surprise bills. Self-host in 2 minutes.
Response caching
Identical prompts return in ~23ms with zero provider quota burn. 9× faster than the cold path. SHA-256 keyed, LRU eviction, configurable TTL.
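The three cache properties named above compose naturally: hash the request body for the key, store an expiry per entry, evict the least recently used entry when full. A sketch under those assumptions (not FreeLLM's exact code):

```typescript
import { createHash } from "crypto";

// Hedged sketch: SHA-256-keyed response cache with a TTL and LRU eviction.
// A JS Map preserves insertion order, so the first key is the LRU candidate.
class ResponseCache {
  private store = new Map<string, { body: string; expiresAt: number }>();

  constructor(private maxEntries = 1000, private ttlMs = 60_000) {}

  // Identical request bodies hash to the same cache key.
  private keyFor(requestBody: string): string {
    return createHash("sha256").update(requestBody).digest("hex");
  }

  get(requestBody: string, now = Date.now()): string | undefined {
    const key = this.keyFor(requestBody);
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (now > hit.expiresAt) {
      this.store.delete(key);     // expired: caller falls through to a provider
      return undefined;
    }
    // Re-insert to mark this entry as most recently used.
    this.store.delete(key);
    this.store.set(key, hit);
    return hit.body;
  }

  set(requestBody: string, body: string, now = Date.now()): void {
    if (this.store.size >= this.maxEntries) {
      // Evict the least recently used entry (first in insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(this.keyFor(requestBody), { body, expiresAt: now + this.ttlMs });
  }
}
```

A cache hit never touches a provider, which is why it burns zero quota.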
Stitched into one endpoint
Sign up for whichever providers you want. Paste the keys into one .env file. FreeLLM handles the routing, the rate limits, and the failover.
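For illustration, that .env might look like the sketch below. Only the GROQ_API_KEY comma-separated format is shown elsewhere on this page; the other variable names are assumptions, not confirmed FreeLLM config keys.

```
# Hedged sketch of a .env file. Variable names other than GROQ_API_KEY
# are illustrative assumptions; use as many or as few providers as you like.
GROQ_API_KEY=key1,key2,key3
GEMINI_API_KEY=key1,key2
MISTRAL_API_KEY=key1
```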
Groq
~30 req/min per key
Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, Qwen3 32B
Gemini
~15 req/min per key
Gemini 2.5 Flash, 2.5 Pro, 2.0 Flash, 2.0 Flash Lite
Mistral
~5 req/min per key
Mistral Small, Medium, Nemo
Cerebras
~30 req/min per key
Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B
NVIDIA NIM
~40 req/min per key
Llama 3.3 70B, Llama 3.1 405B, Nemotron 70B, DeepSeek R1
Ollama
Unlimited (local)
Any local model on your hardware
With 3 keys per provider: ~360 req/min. All $0.
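The headline number is just the hosted per-key limits above summed and multiplied by three keys each (Ollama is local and unlimited, so it's excluded):

```typescript
// Sanity-check of the ~360 req/min figure from the per-key limits listed above.
const perKeyLimits = { groq: 30, gemini: 15, mistral: 5, cerebras: 30, nvidiaNim: 40 };
const keysPerProvider = 3;
const total = Object.values(perKeyLimits).reduce((a, b) => a + b, 0) * keysPerProvider;
// (30 + 15 + 5 + 30 + 40) × 3 = 360
```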
Change one line. Keep your code.
Any OpenAI-compatible SDK works. Swap your base URL. That's it.
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="free-fast",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
print("Provider used:", response.x_freellm_provider)
```

TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "free-smart",
  messages: [{ role: "user", content: "Explain recursion." }],
});

console.log(response.choices[0].message.content);
```

curl

```shell
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The free-tier-first LLM gateway
Other gateways assume you pay per token. FreeLLM assumes you don't. That's why multi-key rotation and zero-markup pricing exist here and nowhere else.
| Feature | FreeLLM | LiteLLM | OpenRouter | Portkey |
|---|---|---|---|---|
| Truly $0 (no markup, no subscription) | ✓ | Self-host | — | — |
| Multi-key rotation per provider | ✓ | — | — | — |
| OpenAI-compatible | ✓ | ✓ | ✓ | ✓ |
| Automatic failover | ✓ | ✓ | ✓ | ✓ |
| Built-in real-time dashboard | ✓ | — | ✓ | ✓ |
| Response caching (zero quota burn) | ✓ | Plugin | — | ✓ |
| Per-provider token tracking | ✓ | ✓ | ✓ | ✓ |
| Circuit breakers | ✓ | Partial | ✓ | ✓ |
| Self-hosted | ✓ | ✓ | — | Both |
| TypeScript codebase (auditable) | ✓ | — | ? | — |
| One-click cloud deploy | ✓ | — | n/a | — |
Stop paying for prototypes.
One-click deploy to Railway or Render. Bring your own free-tier keys. Live in 2 minutes.