FreeLLM
v1.3.0 · Response caching is live

You shouldn't need a credit card
to call an LLM.

One OpenAI-compatible endpoint. Six free LLM providers. When one rate-limits, the next one answers. Stack 3 keys per provider and you get ~360 free requests per minute. All $0.

MIT licensed · No credit card · Self-hosted
[Screenshot: FreeLLM dashboard showing live request tracking across 6 LLM providers]
6 providers · 25+ models · 360 req/min free (with 3 keys per provider) · $0 forever
Built for free tiers

Everything you need to run LLMs for free

Routing, observability, and recovery built specifically for free-tier providers. Not a generic gateway that treats the free tier as an afterthought.

Drop-in OpenAI SDK

Change one line. Your base URL. Your existing OpenAI SDK code keeps working against six providers.

Automatic failover

Groq rate-limited? Your request silently routes to Gemini, then Mistral, then Cerebras. You stop seeing 429s.
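The idea behind failover is an ordered fallback loop. A minimal Python sketch of the concept (the `RateLimited` signal, the `call_with_failover` helper, and the toy providers are illustrative, not FreeLLM's actual internals):

```python
# Try providers in preference order, skip any that rate-limit,
# and return the first successful answer.

class RateLimited(Exception):
    """Raised when a provider answers with HTTP 429."""

def call_with_failover(providers, prompt):
    # providers: list of (name, call_fn) pairs, in preference order
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except RateLimited:
            continue  # silently fall through to the next provider
    raise RuntimeError("all providers rate-limited")

# Simulated providers: "groq" is rate-limited, "gemini" answers.
def groq(prompt):
    raise RateLimited()

def gemini(prompt):
    return f"echo: {prompt}"

name, answer = call_with_failover([("groq", groq), ("gemini", gemini)], "Hello!")
print(name, answer)  # gemini echo: Hello!
```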

Multi-key rotation

Set GROQ_API_KEY=k1,k2,k3 and stack multiple keys per provider. Each key gets its own rate-limit budget.
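The rotation itself can be as simple as a round-robin over the comma-separated key list. A hypothetical sketch (the `make_key_rotator` helper is illustrative):

```python
from itertools import cycle

# Parse an env-style value like GROQ_API_KEY=k1,k2,k3 and hand out
# keys round-robin, so each key burns its own rate-limit budget.
def make_key_rotator(env_value):
    keys = [k.strip() for k in env_value.split(",") if k.strip()]
    pool = cycle(keys)
    return lambda: next(pool)

next_groq_key = make_key_rotator("k1,k2,k3")
print([next_groq_key() for _ in range(4)])  # ['k1', 'k2', 'k3', 'k1']
```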

Token tracking

Rolling 24-hour token counts per provider. Always know how much of your free budget is left.
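A rolling window like that can be kept as timestamped events that age out of a 24-hour horizon. An illustrative sketch, not FreeLLM's implementation:

```python
import time
from collections import deque

# Rolling 24-hour token counter for one provider: record
# (timestamp, tokens) events and evict anything older than the window.
class RollingTokenCounter:
    def __init__(self, window_seconds=24 * 3600):
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens), oldest first
        self.total = 0

    def record(self, tokens, now=None):
        now = time.time() if now is None else now
        self.events.append((now, tokens))
        self.total += tokens
        self._evict(now)

    def used(self, now=None):
        self._evict(time.time() if now is None else now)
        return self.total

    def _evict(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, tokens = self.events.popleft()
            self.total -= tokens

c = RollingTokenCounter()
c.record(1000, now=0)
c.record(500, now=3600)
print(c.used(now=7200))       # 1500: both events inside the window
print(c.used(now=25 * 3600))  # 0: everything older than 24 h evicted
```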

Circuit breakers

Per-provider health tracking with three states. Failures stay contained, recovery is automatic.
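The three states follow the classic closed → open → half-open pattern. A sketch with illustrative thresholds and timings (not FreeLLM's actual values):

```python
import time

# Per-provider circuit breaker: CLOSED (healthy), OPEN (failing,
# requests skipped), HALF_OPEN (one probe allowed after a cooldown).
class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.state == "OPEN" and now - self.opened_at >= self.cooldown:
            self.state = "HALF_OPEN"  # let one probe request through
        return self.state != "OPEN"

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failures += 1
        if self.failures >= self.failure_threshold or self.state == "HALF_OPEN":
            self.state = "OPEN"
            self.opened_at = now

b = CircuitBreaker()
for _ in range(3):
    b.record_failure(now=0)
print(b.state, b.allow(now=10))  # OPEN False: still cooling down
print(b.allow(now=40), b.state)  # True HALF_OPEN: probe allowed
b.record_success()
print(b.state)                   # CLOSED: recovery is automatic
```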

Three meta-models

Pick a strategy, not a provider. free-fast for speed, free-smart for reasoning, free for max uptime.

Real-time dashboard

Built-in dashboard shows provider health, live request log, latency, and token usage at a glance.

Truly $0

Every provider runs on its free tier. No markup, no subscription, no surprise bills. Self-host in 2 minutes.

Response caching

Identical prompts return in ~23ms with zero provider quota burn. 9× faster than the cold path. SHA-256 keyed, LRU eviction, configurable TTL.
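That recipe (SHA-256 key, LRU eviction, configurable TTL) fits in a few lines. A sketch of the idea; the capacity and TTL values below are placeholders, not FreeLLM's defaults:

```python
import hashlib
import json
import time
from collections import OrderedDict

# Cache keyed on a SHA-256 hash of the normalized request; least-
# recently-used entries are evicted, and entries expire by TTL.
class ResponseCache:
    def __init__(self, max_entries=1000, ttl=300.0):
        self.max_entries = max_entries
        self.ttl = ttl
        self.entries = OrderedDict()  # key -> (expires_at, response)

    @staticmethod
    def key_for(model, messages):
        payload = json.dumps({"model": model, "messages": messages},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is None or entry[0] <= now:
            self.entries.pop(key, None)  # miss or expired
            return None
        self.entries.move_to_end(key)    # mark as recently used
        return entry[1]

    def put(self, key, response, now=None):
        now = time.time() if now is None else now
        self.entries[key] = (now + self.ttl, response)
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = ResponseCache(ttl=300.0)
k = ResponseCache.key_for("free-fast", [{"role": "user", "content": "Hello!"}])
cache.put(k, "Hi there!", now=0)
print(cache.get(k, now=100))  # Hi there! (hit, no provider quota burned)
print(cache.get(k, now=400))  # None (expired past the 300 s TTL)
```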

Six free tiers

Stitched into one endpoint

Sign up for whichever providers you want. Paste the keys into one .env file. FreeLLM handles the routing, the rate limits, and the failover.

| Provider | Free rate (per key) | Models |
|---|---|---|
| Groq | ~30 req/min | Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, Qwen3 32B |
| Gemini | ~15 req/min | Gemini 2.5 Flash, 2.5 Pro, 2.0 Flash, 2.0 Flash Lite |
| Mistral | ~5 req/min | Mistral Small, Medium, Nemo |
| Cerebras | ~30 req/min | Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B |
| NVIDIA NIM | ~40 req/min | Llama 3.3 70B, Llama 3.1 405B, Nemotron 70B, DeepSeek R1 |
| Ollama | Unlimited (local) | Any local model on your hardware |

Baseline: ~120 req/min with 1 key each.
With 3 keys per provider: ~360 req/min. All $0.
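The arithmetic behind those numbers, summing the per-key rates listed above (Ollama is local and unlimited, so it sits outside the per-minute quota):

```python
# Quoted free-tier rates per key, per provider
per_key = {"groq": 30, "gemini": 15, "mistral": 5, "cerebras": 30, "nvidia_nim": 40}

baseline = sum(per_key.values())  # one key per provider
print(baseline)      # 120 req/min baseline
print(baseline * 3)  # 360 req/min with three keys per provider
```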
Drop-in replacement

Change one line. Keep your code.

Any OpenAI-compatible SDK works. Swap your base URL. That's it.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="free-fast",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
print("Provider used:", response.x_freellm_provider)
```
vs the alternatives

The free-tier-first LLM gateway

Other gateways assume you pay per token. FreeLLM assumes you don't. That's why multi-key rotation and zero-markup pricing exist here and nowhere else.

| Feature | FreeLLM | LiteLLM · OpenRouter · Portkey |
|---|---|---|
| Truly $0 (no markup, no subscription) | ✓ | Self-host |
| Multi-key rotation per provider | ✓ | |
| OpenAI-compatible | ✓ | |
| Automatic failover | ✓ | |
| Built-in real-time dashboard | ✓ | |
| Response caching (zero quota burn) | ✓ | Plugin |
| Per-provider token tracking | ✓ | |
| Circuit breakers | ✓ | Partial |
| Self-hosted | ✓ | Both |
| TypeScript codebase (auditable) | ✓ | ? |
| One-click cloud deploy | ✓ | n/a |

Stop paying for prototypes.

One-click deploy to Railway or Render. Bring your own free-tier keys. Live in 2 minutes.