FreeLLM
v1.3.0 · Response caching is live

You shouldn't need a credit card
to call an LLM.

One OpenAI-compatible endpoint. Six free LLM providers. When one rate-limits, the next one answers. Stack 3 keys per provider and you get ~360 free requests per minute. All $0.

MIT licensed · No credit card · Self-hosted
[Screenshot: FreeLLM dashboard showing live request tracking across 6 LLM providers]
6 providers · 25+ models · 360 req/min free (with 3 keys per provider) · $0 forever
Built for free tiers

Everything you need to run LLMs for free

Routing, observability, and recovery built specifically for free-tier providers. Not a generic gateway that treats the free tier as an afterthought.

Drop-in OpenAI SDK

Change one line. Your base URL. Your existing OpenAI SDK code keeps working against six providers.

Automatic failover

Groq rate-limited? Your request silently routes to Gemini, then Mistral, then Cerebras. You stop seeing 429s.
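The idea behind failover is an ordered fallback loop. A minimal Python sketch of the concept (the `RateLimited` signal, the `call_with_failover` helper, and the toy providers are illustrative, not FreeLLM's actual internals):

```python
# Try providers in preference order, skip any that rate-limit,
# and return the first successful answer.

class RateLimited(Exception):
    """Raised when a provider answers with HTTP 429."""

def call_with_failover(providers, prompt):
    # providers: list of (name, call_fn) pairs, in preference order
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except RateLimited:
            continue  # silently fall through to the next provider
    raise RuntimeError("all providers rate-limited")

# Simulated providers: "groq" is rate-limited, "gemini" answers.
def groq(prompt):
    raise RateLimited()

def gemini(prompt):
    return f"echo: {prompt}"

name, answer = call_with_failover([("groq", groq), ("gemini", gemini)], "Hello!")
print(name, answer)  # gemini echo: Hello!
```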

Multi-key rotation

Set GROQ_API_KEY=k1,k2,k3 and stack multiple keys per provider. Each key gets its own rate-limit budget.
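The rotation itself can be as simple as a round-robin over the comma-separated key list. A hypothetical sketch (the `make_key_rotator` helper is illustrative):

```python
from itertools import cycle

# Parse an env-style value like GROQ_API_KEY=k1,k2,k3 and hand out
# keys round-robin, so each key burns its own rate-limit budget.
def make_key_rotator(env_value):
    keys = [k.strip() for k in env_value.split(",") if k.strip()]
    pool = cycle(keys)
    return lambda: next(pool)

next_groq_key = make_key_rotator("k1,k2,k3")
print([next_groq_key() for _ in range(4)])  # ['k1', 'k2', 'k3', 'k1']
```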

Token tracking

Rolling 24-hour token counts per provider. Always know how much of your free budget is left.
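A rolling window like that can be kept as timestamped events that age out of a 24-hour horizon. An illustrative sketch, not FreeLLM's implementation:

```python
import time
from collections import deque

# Rolling 24-hour token counter for one provider: record
# (timestamp, tokens) events and evict anything older than the window.
class RollingTokenCounter:
    def __init__(self, window_seconds=24 * 3600):
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens), oldest first
        self.total = 0

    def record(self, tokens, now=None):
        now = time.time() if now is None else now
        self.events.append((now, tokens))
        self.total += tokens
        self._evict(now)

    def used(self, now=None):
        self._evict(time.time() if now is None else now)
        return self.total

    def _evict(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, tokens = self.events.popleft()
            self.total -= tokens

c = RollingTokenCounter()
c.record(1000, now=0)
c.record(500, now=3600)
print(c.used(now=7200))       # 1500: both events inside the window
print(c.used(now=25 * 3600))  # 0: everything older than 24 h evicted
```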

Circuit breakers

Per-provider health tracking with three states. Failures stay contained, recovery is automatic.
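The three states follow the classic closed → open → half-open pattern. A sketch with illustrative thresholds and timings (not FreeLLM's actual values):

```python
import time

# Per-provider circuit breaker: CLOSED (healthy), OPEN (failing,
# requests skipped), HALF_OPEN (one probe allowed after a cooldown).
class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.state == "OPEN" and now - self.opened_at >= self.cooldown:
            self.state = "HALF_OPEN"  # let one probe request through
        return self.state != "OPEN"

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failures += 1
        if self.failures >= self.failure_threshold or self.state == "HALF_OPEN":
            self.state = "OPEN"
            self.opened_at = now

b = CircuitBreaker()
for _ in range(3):
    b.record_failure(now=0)
print(b.state, b.allow(now=10))  # OPEN False: still cooling down
print(b.allow(now=40), b.state)  # True HALF_OPEN: probe allowed
b.record_success()
print(b.state)                   # CLOSED: recovery is automatic
```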

Three meta-models

Pick a strategy, not a provider. free-fast for speed, free-smart for reasoning, free for max uptime.

Real-time dashboard

Built-in dashboard shows provider health, live request log, latency, and token usage at a glance.

Truly $0

Every provider runs on its free tier. No markup, no subscription, no surprise bills. Self-host in 2 minutes.

Response caching

Identical prompts return in ~23ms with zero provider quota burn. 9× faster than the cold path. SHA-256 keyed, LRU eviction, configurable TTL.
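That recipe (SHA-256 key, LRU eviction, configurable TTL) fits in a few lines. A sketch of the idea; the capacity and TTL values below are placeholders, not FreeLLM's defaults:

```python
import hashlib
import json
import time
from collections import OrderedDict

# Cache keyed on a SHA-256 hash of the normalized request; least-
# recently-used entries are evicted, and entries expire by TTL.
class ResponseCache:
    def __init__(self, max_entries=1000, ttl=300.0):
        self.max_entries = max_entries
        self.ttl = ttl
        self.entries = OrderedDict()  # key -> (expires_at, response)

    @staticmethod
    def key_for(model, messages):
        payload = json.dumps({"model": model, "messages": messages},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is None or entry[0] <= now:
            self.entries.pop(key, None)  # miss or expired
            return None
        self.entries.move_to_end(key)    # mark as recently used
        return entry[1]

    def put(self, key, response, now=None):
        now = time.time() if now is None else now
        self.entries[key] = (now + self.ttl, response)
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = ResponseCache(ttl=300.0)
k = ResponseCache.key_for("free-fast", [{"role": "user", "content": "Hello!"}])
cache.put(k, "Hi there!", now=0)
print(cache.get(k, now=100))  # Hi there! (hit, no provider quota burned)
print(cache.get(k, now=400))  # None (expired past the 300 s TTL)
```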

Six free tiers

Stitched into one endpoint

Sign up for whichever providers you want. Paste the keys into one .env file. FreeLLM handles the routing, the rate limits, and the failover.

| Provider | Free rate (per key) | Models |
|---|---|---|
| Groq | ~30 req/min | Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, Qwen3 32B |
| Gemini | ~15 req/min | Gemini 2.5 Flash, 2.5 Pro, 2.0 Flash, 2.0 Flash Lite |
| Mistral | ~5 req/min | Mistral Small, Medium, Nemo |
| Cerebras | ~30 req/min | Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B |
| NVIDIA NIM | ~40 req/min | Llama 3.3 70B, Llama 3.1 405B, Nemotron 70B, DeepSeek R1 |
| Ollama | Unlimited (local) | Any local model on your hardware |

Baseline: ~120 req/min with 1 key each.
With 3 keys per provider: ~360 req/min. All $0.
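The arithmetic behind those numbers, summing the per-key rates listed above (Ollama is local and unlimited, so it sits outside the per-minute quota):

```python
# Quoted free-tier rates per key, per provider
per_key = {"groq": 30, "gemini": 15, "mistral": 5, "cerebras": 30, "nvidia_nim": 40}

baseline = sum(per_key.values())  # one key per provider
print(baseline)      # 120 req/min baseline
print(baseline * 3)  # 360 req/min with three keys per provider
```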
Drop-in replacement

Change one line. Keep your code.

Any OpenAI-compatible SDK works. Swap your base URL. That's it.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="free-fast",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
print("Provider used:", response.x_freellm_provider)
```
vs the alternatives

The free-tier-first LLM gateway

Other gateways assume you pay per token. FreeLLM assumes you don't. That's why multi-key rotation and zero-markup pricing exist here and nowhere else.

| Feature | FreeLLM | LiteLLM · OpenRouter · Portkey |
|---|---|---|
| Truly $0 (no markup, no subscription) | ✓ | Self-host |
| Multi-key rotation per provider | ✓ | |
| OpenAI-compatible | ✓ | |
| Automatic failover | ✓ | |
| Built-in real-time dashboard | ✓ | |
| Response caching (zero quota burn) | ✓ | Plugin |
| Per-provider token tracking | ✓ | |
| Circuit breakers | ✓ | Partial |
| Self-hosted | ✓ | Both |
| TypeScript codebase (auditable) | ✓ | ? |
| One-click cloud deploy | ✓ | n/a |

Stop paying for prototypes.

One-click deploy to Railway or Render. Bring your own free-tier keys. Live in 2 minutes.