Browser integration
FreeLLM ships a browser token flow that lets you call the gateway from the browser without exposing a master key. Your backend mints a short-lived, stateless HMAC token bound to a specific origin, and the browser uses it as a normal Authorization: Bearer credential until it expires.
The short version
The gateway exposes a POST /v1/tokens/issue endpoint that takes a master or virtual key and returns a string of the form flt.<payload>.<signature>. The payload encodes an origin, an optional identifier, and an expiry of at most 15 minutes. The gateway verifies the signature and the browser’s Origin header on every request.
This is for apps that want to talk to an LLM from the browser (chat widgets, playgrounds, client-side tools) without running their own proxy. Your backend keeps the real key. The browser never sees it.
How it works
- Minter. Your backend holds a FreeLLM master or virtual key. When an authenticated user loads the page, your backend calls /v1/tokens/issue with the user’s origin and an identifier that maps to that user.
- Token. The gateway signs a JSON payload with FREELLM_TOKEN_SECRET using HMAC and returns flt.<payload>.<signature>. It is stateless. The gateway does not store it.
- Browser. Your frontend gets the token, calls https://your-gateway/v1/chat/completions with Authorization: Bearer flt.xxx.yyy, and the browser automatically sets the Origin header.
- Verify. The gateway decodes the token, checks the HMAC, confirms the expiry, and compares the browser’s Origin header to the origin embedded in the token. If any check fails the request is rejected with 401. If they pass, the gateway applies the identifier’s per-user rate-limit bucket and proxies to the upstream provider.
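The sign-and-verify steps above can be sketched in a few lines of Python. This is only an illustration of how a stateless HMAC token of the form flt.<payload>.<signature> can work. The payload field names (origin, identifier, exp), the SHA-256 digest, and the base64url encoding are assumptions made for the sketch, not the gateway's documented wire format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

PREFIX = "flt"  # prefix from the docs; field names and encoding below are assumed


def _b64url(data: bytes) -> str:
    # URL-safe base64 without padding, so the token never contains "." or "=".
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: str) -> str:
    return _b64url(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def mint(secret: bytes, origin: str, identifier: Optional[str],
         ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a signed token. Nothing is stored server-side."""
    now = time.time() if now is None else now
    payload = {"origin": origin, "exp": int(now) + ttl_seconds}
    if identifier is not None:
        payload["identifier"] = identifier
    body = _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{PREFIX}.{body}.{_sign(secret, body)}"


def verify(secret: bytes, token: str, request_origin: str,
           now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if every check passes, else None (a 401 in the gateway)."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3 or parts[0] != PREFIX:
        return None
    _, body, sig = parts
    # Constant-time comparison; the payload is only parsed after the HMAC checks out.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign(secret, body).encode("utf-8")):
        return None
    payload = json.loads(_b64url_decode(body))
    if now >= payload["exp"]:
        return None  # expired
    if request_origin != payload["origin"]:
        return None  # Origin header does not match the bound origin
    return payload
```

Because verification needs only the secret and the token itself, any gateway instance that shares the secret can validate a token without a database lookup, which is what makes the flow stateless.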
Minting a token from your backend
The contract is the same regardless of language. POST to /v1/tokens/issue with a bearer key, pass origin, and optionally identifier and ttlSeconds.
Node.js (serverless function, no dependencies)
```js
// POST /api/freellm-token
// Returns { token, expiresAt } to the logged-in user.
export default async function handler(req, res) {
  const session = await getSession(req); // your auth
  if (!session) return res.status(401).json({ error: "not signed in" });

  const response = await fetch("https://your-gateway.example.com/v1/tokens/issue", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.FREELLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      origin: "https://yoursite.com",
      identifier: `user:${session.userId}`,
      ttlSeconds: 900,
    }),
  });

  if (!response.ok) {
    const body = await response.text();
    return res.status(502).json({ error: "token mint failed", detail: body });
  }

  const { token, expiresAt } = await response.json();
  res.status(200).json({ token, expiresAt });
}
```

Python (Flask)
```python
import os
import requests
from flask import Flask, jsonify, request, abort

app = Flask(__name__)


@app.post("/api/freellm-token")
def freellm_token():
    session = get_session(request)  # your auth
    if not session:
        abort(401)

    r = requests.post(
        "https://your-gateway.example.com/v1/tokens/issue",
        headers={
            "Authorization": f"Bearer {os.environ['FREELLM_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "origin": "https://yoursite.com",
            "identifier": f"user:{session['user_id']}",
            "ttlSeconds": 900,
        },
        timeout=10,
    )

    if r.status_code != 200:
        return jsonify({"error": "token mint failed", "detail": r.text}), 502

    data = r.json()
    return jsonify({"token": data["token"], "expiresAt": data["expiresAt"]})
```

Both responses look like:
```json
{
  "token": "flt.eyJ2Ijox...aBcD",
  "expiresAt": "2026-04-09T07:15:00.000Z",
  "identifier": "user:42",
  "origin": "https://yoursite.com"
}
```

Using the token in the browser
Here is a minimal page that loads the official openai SDK from esm.sh, fetches a token from your backend, and re-mints when the token expires. The SDK refuses to run in the browser without dangerouslyAllowBrowser: true, so that flag is required.
```html
<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>FreeLLM browser demo</title>
  </head>
  <body>
    <button id="ask">Ask</button>
    <pre id="out"></pre>

    <script type="module">
      import OpenAI from "https://esm.sh/openai@4";

      const GATEWAY = "https://your-gateway.example.com/v1";
      let token = null;
      let expiresAt = 0;

      async function getToken() {
        const now = Date.now();
        // Reuse the cached token until 30 seconds before it expires.
        if (token && now < expiresAt - 30_000) return token;

        const r = await fetch("/api/freellm-token", { credentials: "include" });
        if (!r.ok) throw new Error("failed to mint token");
        const data = await r.json();

        token = data.token;
        expiresAt = new Date(data.expiresAt).getTime();
        return token;
      }

      function makeClient(bearer) {
        return new OpenAI({
          apiKey: bearer,
          baseURL: GATEWAY,
          dangerouslyAllowBrowser: true,
        });
      }

      document.getElementById("ask").addEventListener("click", async () => {
        const out = document.getElementById("out");
        out.textContent = "...";
        try {
          let client = makeClient(await getToken());
          let res;
          try {
            res = await client.chat.completions.create({
              model: "free-fast",
              messages: [{ role: "user", content: "Say hi in one word." }],
            });
          } catch (err) {
            // Token may have expired mid-flight. Mint a fresh one and retry once.
            if (err?.status === 401) {
              token = null;
              client = makeClient(await getToken());
              res = await client.chat.completions.create({
                model: "free-fast",
                messages: [{ role: "user", content: "Say hi in one word." }],
              });
            } else {
              throw err;
            }
          }
          out.textContent = res.choices[0].message.content;
        } catch (err) {
          out.textContent = String(err);
        }
      });
    </script>
  </body>
</html>
```

The browser sets the Origin header automatically. You do not need to touch it.
Security model
- Max token lifetime is 15 minutes. ttlSeconds defaults to 900 and is capped at 900. The gateway will not mint longer-lived tokens regardless of what you pass.
- Origin binding. Every token embeds exactly one origin. The gateway compares it against the browser’s Origin header on every request. A mismatch is a 401.
- Per-identifier rate limiting. If you set identifier, the gateway infers it from the token on each request and applies the same per-user bucket it would for a normal X-FreeLLM-Identifier header. You do not set the identifier header in the browser.
- No cross-origin replay, within the honor system of the browser. A token leaked from https://yoursite.com cannot be replayed from https://evil.example.com through a standards-compliant browser, because the browser will stamp the real Origin. A non-browser client can forge any Origin header it wants. Treat the origin check as a browser-side safety belt, not a server-side authentication factor. The 15 minute TTL and the identifier bucket are the real damage limit.
- Secret rotation invalidates outstanding tokens. Rotating FREELLM_TOKEN_SECRET invalidates every unexpired token immediately, because the HMAC will no longer verify. Use this as your emergency kill switch.
- If FREELLM_TOKEN_SECRET is unset or shorter than 32 bytes, the gateway refuses to mint tokens on /v1/tokens/issue and refuses to accept any flt.* bearer token. Browser integration is off until you configure the secret.
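The two numeric limits in this list (the 900 second cap and the 32 byte minimum) can be expressed as small guard functions. The sketch below is a hypothetical illustration of those rules, not the gateway's code. The function names are invented for the example, and how the gateway treats zero or negative ttlSeconds values is not covered by this page, so the sketch leaves that case out.

```python
from typing import Optional

MAX_TTL_SECONDS = 900   # 15 minutes, the documented cap and default
MIN_SECRET_BYTES = 32   # documented minimum length for FREELLM_TOKEN_SECRET


def effective_ttl(requested: Optional[int]) -> int:
    """Apply the documented default and cap to a requested ttlSeconds."""
    if requested is None:
        return MAX_TTL_SECONDS
    # Larger values are silently reduced to the cap.
    # Validation of non-positive values is out of scope for this sketch.
    return min(requested, MAX_TTL_SECONDS)


def browser_tokens_enabled(secret: Optional[str]) -> bool:
    """True only when a secret is set and is at least 32 bytes long."""
    if secret is None:
        return False
    # Length is measured in bytes, not characters.
    return len(secret.encode("utf-8")) >= MIN_SECRET_BYTES
```

A deployment check could call browser_tokens_enabled(os.environ.get("FREELLM_TOKEN_SECRET")) at startup and fail fast, instead of discovering at request time that minting is disabled.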
Common pitfalls
- The origin must match exactly. Protocol, host, and port all have to line up. https://yoursite.com and https://www.yoursite.com are different origins. http://localhost:3000 and http://localhost:5173 are different origins. Mint separate tokens for each.
- dangerouslyAllowBrowser: true is required with the OpenAI SDK. Without it the SDK throws at construction time. The name is the SDK authors’ warning about shipping raw OpenAI keys to the browser. With FreeLLM browser tokens the risk surface is already bounded, but the flag is still required.
- Do not let the browser pick its own origin. Your backend is the source of truth for which origin a token is bound to. Hardcode it or look it up from a server-side allowlist. Never forward a user-controlled string into the origin field of the mint request.
- CORS still has to allow the origin. Browser tokens do not bypass CORS. The gateway’s ALLOWED_ORIGINS environment variable must include every origin you mint tokens for. See the Configuration reference for the exact format.
- A leaked token is bounded but not harmless. The blast radius is at most 15 minutes, one origin, and the rate-limit budget of one identifier. Set identifier buckets tight enough that a leak is annoying, not catastrophic.
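For backends that serve more than one origin, the exact-match and allowlist advice above might look like the following sketch. The allowlist contents and the helper names origin_of and origin_for_mint are hypothetical. The point is that the value sent in the mint request is accepted only if it exactly equals a server-side entry.

```python
from typing import FrozenSet, Optional
from urllib.parse import urlsplit

# Hypothetical server-side allowlist. Each entry is scheme://host[:port], no path.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({
    "https://yoursite.com",
    "https://www.yoursite.com",
    "http://localhost:3000",
})


def origin_of(url: str) -> str:
    """Reduce a full URL to scheme://host[:port], dropping path and query."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def origin_for_mint(request_origin: Optional[str],
                    allowlist: FrozenSet[str] = ALLOWED_ORIGINS) -> Optional[str]:
    """Return the origin to bind the token to, or None if it is not allowlisted.

    The comparison is an exact string match: no prefix, suffix, or wildcard
    matching, so a lookalike host cannot slip through.
    """
    if request_origin is not None and request_origin in allowlist:
        return request_origin
    return None
```

When origin_for_mint returns None, the backend should refuse to mint rather than fall back to a default origin.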