FreeLLM
v1.6.0 · JSON schema validation + tool-call routing live

You shouldn't need a credit card
to call an LLM.

One OpenAI-compatible endpoint. Eight free LLM providers. One rate-limits, the next one answers. Stack 3 keys per provider and hit ~450 free requests per minute. All $0.

MIT licensed · No credit card · 32+ models · Self-hosted
FreeLLM dashboard showing live request tracking across 8 LLM providers
8 providers · 32+ models · ~450 req/min free (with 3 keys/provider) · $0 forever

Built for free tiers

Everything you need to run LLMs for free

Routing, failover, caching, and observability built around the constraints of free-tier providers. Every feature solves a real free-tier problem.

Drop-in OpenAI SDK

Change one line. Your base URL. Your existing OpenAI SDK code keeps working against eight providers.

Automatic failover

Groq rate-limited? Your request silently routes to Gemini, then Mistral, then Cerebras. You stop seeing 429s.
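
To make the idea concrete, here is a minimal sketch of ordered failover. It is not FreeLLM's actual routing code: `RateLimited`, `call_with_failover`, and the two stub providers are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a provider call raises on an HTTP 429."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try providers in order; return (name, reply) from the first that answers."""
    last_error: Optional[Exception] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            # This provider is out of quota; move on to the next one.
            last_error = exc
    raise RuntimeError("all providers are rate-limited") from last_error


# Stub providers standing in for real HTTP calls.
def groq(prompt: str) -> str:
    raise RateLimited("429 from groq")


def gemini(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = call_with_failover([("groq", groq), ("gemini", gemini)], "Hello!")
print(name, reply)
```

The caller never sees the 429 from the first provider; it only sees an error if every provider in the list is exhausted.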

Multi-key rotation

Set GROQ_API_KEY=k1,k2,k3 and stack multiple keys per provider. Each key gets its own rate-limit budget.
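
One plausible way to rotate keys is simple round-robin over the comma-separated list. The sketch below assumes that strategy; `parse_keys` and `KeyRotator` are hypothetical helpers, and FreeLLM's real rotation logic may differ (for example, by tracking each key's remaining budget).

```python
import itertools
from typing import Iterator, List


def parse_keys(raw: str) -> List[str]:
    """Split a comma-separated value such as 'k1,k2,k3' into clean keys."""
    return [key.strip() for key in raw.split(",") if key.strip()]


class KeyRotator:
    """Hands out keys in round-robin order so load spreads across them."""

    def __init__(self, keys: List[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle: Iterator[str] = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)


rotator = KeyRotator(parse_keys("k1,k2,k3"))
print([rotator.next_key() for _ in range(4)])
```

With three keys, every third request reuses the same key, so each key only carries a third of the traffic.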

Token tracking

Rolling 24-hour token counts per provider. Always know how much of your free budget is left.
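
A rolling 24-hour count can be kept with a queue of timestamped usage events. This is a sketch under that assumption, not the project's implementation; `RollingTokenCounter` is a hypothetical name, and it expects events to be recorded in time order.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_SECONDS = 24 * 60 * 60  # 24 hours


class RollingTokenCounter:
    """Sums tokens used within the trailing window for one provider."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self._window = window_seconds
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._total = 0

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, tokens))
        self._total += tokens

    def total(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        cutoff = now - self._window
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            _, tokens = self._events.popleft()
            self._total -= tokens
        return self._total


counter = RollingTokenCounter()
counter.record(1200)
counter.record(800)
print(counter.total())
```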

Circuit breakers

Per-provider health tracking with three states. Failures stay contained, recovery is automatic.
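
The page does not name the three states. The sketch below assumes the conventional closed / open / half-open pattern; the class name, the failure threshold, and the cooldown are illustrative choices rather than FreeLLM's settings.

```python
import time
from enum import Enum
from typing import Optional


class State(Enum):
    CLOSED = "closed"        # healthy: requests flow normally
    OPEN = "open"            # unhealthy: requests skip this provider
    HALF_OPEN = "half_open"  # cooldown elapsed: trial requests allowed


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # After the cooldown, let traffic through again on a trial basis.
        if self.state is State.OPEN and now - self._opened_at >= self.cooldown_seconds:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self) -> None:
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._failures += 1
        # A failed trial, or too many failures in a row, opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = now


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
print(breaker.state, breaker.allow_request())
```

For brevity this version lets every request through while half-open; a stricter breaker would admit a single probe at a time.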

Three meta-models

Pick a strategy, not a provider. free-fast for speed, free-smart for reasoning, free for max uptime.

Real-time dashboard

Built-in dashboard shows provider health, live request log, latency, and token usage at a glance.

Truly $0

Every provider runs on its free tier. No markup, no subscription, no surprise bills. Self-host in 2 minutes.

Response caching

Identical prompts return in ~23ms with zero provider quota burn. 9× faster than the cold path. SHA-256 keyed, LRU eviction, configurable TTL.
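
The three ingredients named above (SHA-256 keys, LRU eviction, TTL) fit together roughly as follows. This is an illustrative sketch: `cache_key` and `ResponseCache` are hypothetical names, and which request fields FreeLLM actually hashes is an assumption here.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Dict, List, Optional, Tuple


def cache_key(model: str, messages: List[Dict[str, str]]) -> str:
    """SHA-256 over a canonical JSON form, so identical requests collide."""
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 300.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._store: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now + self.ttl_seconds, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used


cache = ResponseCache()
key = cache_key("free-fast", [{"role": "user", "content": "Hello!"}])
cache.put(key, "Hi there!")
print(cache.get(key))
```

A hit is served from memory without contacting any provider, which is why it costs no quota.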

Eight free tiers

Stitched into one endpoint

Sign up for whichever providers you want. Paste the keys into one .env file. FreeLLM handles the routing, the rate limits, and the failover.

Groq

~30 req/min per key

Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, Qwen3 32B

Gemini

~15 req/min per key

Gemini 2.5 Flash, 2.5 Pro, 2.0 Flash, 2.0 Flash Lite

Mistral

~5 req/min per key

Mistral Small, Medium, Nemo

Cerebras

~30 req/min per key

Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B

NVIDIA NIM

~40 req/min per key

Llama 3.3 70B, Llama 3.1 405B, Nemotron 70B, DeepSeek R1

GitHub Models

~15 req/min per key

GPT-4o-mini, Phi-4, Llama 3.3 70B, Mistral Large

Cloudflare AI

~10 req/min per key

Llama 3.3 70B, Llama 3.1 8B, Mistral 7B

Ollama

Unlimited (local)

Any local model on your hardware

Baseline: ~150 req/min with 1 key each across the seven hosted providers (Ollama runs locally and is not counted).
With 3 keys per provider: ~450 req/min. All $0.
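
As a quick check on those totals, the per-key limits listed above can be summed directly. The figures are the approximate ones from this page; Ollama is left out because it is local and unlimited, and `total_rpm` is just an illustrative helper.

```python
# Approximate free-tier requests per minute per key, as listed above.
PER_KEY_RPM = {
    "Groq": 30,
    "Gemini": 15,
    "Mistral": 5,
    "Cerebras": 30,
    "NVIDIA NIM": 40,
    "GitHub Models": 15,
    "Cloudflare AI": 10,
}


def total_rpm(keys_per_provider: int) -> int:
    """Combined requests per minute when every provider has the same key count."""
    return sum(PER_KEY_RPM.values()) * keys_per_provider


print(total_rpm(1))  # 145, which the page rounds to ~150
print(total_rpm(3))  # 435, which the page rounds to ~450
```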

Drop-in replacement

Change one line. Keep your code.

Any OpenAI-compatible SDK works. Swap your base URL. That's it.

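from openai import OpenAI

# Point the SDK at the local FreeLLM gateway instead of the OpenAI API.
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="unused",  # placeholder; the SDK requires an api_key value
)

# "free-fast" is a meta-model: FreeLLM picks the provider for you.
response = client.chat.completions.create(
    model="free-fast",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
# FreeLLM-specific extra field naming the provider that answered.
print("Provider used:", response.x_freellm_provider)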

vs the alternatives

The free-tier-first LLM gateway

Other gateways assume you pay per token. FreeLLM assumes you don't. That's why multi-key rotation and zero-markup pricing exist here and nowhere else.

Feature FreeLLM LiteLLM OpenRouter Portkey
Truly $0 (no markup, no subscription) Self-host
Multi-key rotation per provider
OpenAI-compatible
Automatic failover
Built-in real-time dashboard
Response caching (zero quota burn) Plugin
Per-provider token tracking
Circuit breakers Partial
Self-hosted Both
TypeScript codebase (auditable) ?
One-click cloud deploy n/a

Worst case: you delete it.

Two minutes to deploy, $0 to run. The decision takes longer than the setup.