Browser integration

FreeLLM ships a browser token flow that lets you call the gateway from the browser without shipping a master key. Your backend mints a short-lived, stateless HMAC token bound to a specific origin, and the browser uses it as a normal Authorization: Bearer credential until it expires.

The short version

The gateway exposes a POST /v1/tokens/issue endpoint that takes a master or virtual key and returns a string of the form flt.<payload>.<signature>. The payload encodes an origin, an optional identifier, and an expiry of at most 15 minutes. The gateway verifies the signature and the browser’s Origin header on every request.

This is for apps that want to talk to an LLM from the browser (chat widgets, playgrounds, client-side tools) without running their own proxy. Your backend keeps the real key. The browser never sees it.

How it works

  1. Minter. Your backend holds a FreeLLM master or virtual key. When an authenticated user loads the page, your backend calls /v1/tokens/issue with the origin your frontend is served from and an identifier that maps to that user.
  2. Token. The gateway signs a JSON payload with FREELLM_TOKEN_SECRET using HMAC and returns flt.<payload>.<signature>. It is stateless. The gateway does not store it.
  3. Browser. Your frontend gets the token, calls https://your-gateway/v1/chat/completions with Authorization: Bearer flt.xxx.yyy, and the browser automatically sets the Origin header.
  4. Verify. The gateway decodes the token, checks the HMAC, confirms the expiry, and compares the browser’s Origin header to the origin embedded in the token. If any check fails, the request is rejected with 401. If all checks pass, the gateway applies the identifier’s per-user rate-limit bucket and proxies to the upstream provider.
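
The sign and verify steps above can be sketched in a few lines of Python. This is an illustration, not the gateway's actual code. The text only says "HMAC" and "a JSON payload", so the use of SHA-256, base64url encoding, and the payload field names origin, identifier, and exp are all assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

PREFIX = "flt"


def _b64url(data: bytes) -> str:
    # base64url without padding (an assumed encoding, not confirmed by the docs).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: str) -> str:
    # HMAC-SHA256 is an assumption; the docs only say "HMAC".
    return _b64url(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def issue_token(secret: bytes, origin: str, identifier: Optional[str],
                ttl_seconds: int, now: float) -> str:
    # Field names here are hypothetical.
    payload = {"origin": origin, "exp": int(now) + ttl_seconds}
    if identifier is not None:
        payload["identifier"] = identifier
    body = _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{PREFIX}.{body}.{_sign(secret, body)}"


def verify_token(secret: bytes, token: str, request_origin: str,
                 now: float) -> Optional[dict]:
    """Return the payload if every check passes, else None (the 401 case)."""
    parts = token.split(".")
    if len(parts) != 3 or parts[0] != PREFIX:
        return None
    _, body, signature = parts
    expected = _sign(secret, body)
    # Constant-time comparison so the check does not leak signature bytes.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    payload = json.loads(_b64url_decode(body))
    if now >= payload["exp"]:
        return None
    if request_origin != payload["origin"]:
        return None
    return payload
```

Nothing is stored between the two calls. Everything the verifier needs is in the token and the shared secret, which is why the token can be stateless.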

Minting a token from your backend

The contract is the same regardless of language. POST to /v1/tokens/issue with a bearer key, pass origin, and optionally identifier and ttlSeconds.

Node.js (serverless function, no dependencies)

// POST /api/freellm-token
// Returns { token, expiresAt } to the logged-in user.
export default async function handler(req, res) {
  const session = await getSession(req); // your auth
  if (!session) return res.status(401).json({ error: "not signed in" });

  const response = await fetch("https://your-gateway.example.com/v1/tokens/issue", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.FREELLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      origin: "https://yoursite.com",
      identifier: `user:${session.userId}`,
      ttlSeconds: 900,
    }),
  });

  if (!response.ok) {
    const body = await response.text();
    return res.status(502).json({ error: "token mint failed", detail: body });
  }

  const { token, expiresAt } = await response.json();
  res.status(200).json({ token, expiresAt });
}

Python (Flask)

import os

import requests
from flask import Flask, jsonify, request, abort

app = Flask(__name__)


@app.post("/api/freellm-token")
def freellm_token():
    session = get_session(request)  # your auth
    if not session:
        abort(401)

    r = requests.post(
        "https://your-gateway.example.com/v1/tokens/issue",
        headers={
            "Authorization": f"Bearer {os.environ['FREELLM_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "origin": "https://yoursite.com",
            "identifier": f"user:{session['user_id']}",
            "ttlSeconds": 900,
        },
        timeout=10,
    )
    if r.status_code != 200:
        return jsonify({"error": "token mint failed", "detail": r.text}), 502

    data = r.json()
    return jsonify({"token": data["token"], "expiresAt": data["expiresAt"]})

In both cases the gateway’s response looks like this. The handlers above forward only token and expiresAt to the browser:

{
"token": "flt.eyJ2Ijox...aBcD",
"expiresAt": "2026-04-09T07:15:00.000Z",
"identifier": "user:42",
"origin": "https://yoursite.com"
}
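
If a non-browser part of your stack caches tokens, it needs to decide when to re-mint. The following is a small sketch of that check, using the expiresAt format shown above and the same 30-second safety margin as the browser example on this page. The function names are made up for illustration.

```python
from datetime import datetime, timezone


def parse_expires_at(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def needs_refresh(expires_at: str, now: datetime, skew_seconds: int = 30) -> bool:
    """True once the token is within skew_seconds of expiring (or already expired)."""
    remaining = (parse_expires_at(expires_at) - now).total_seconds()
    return remaining <= skew_seconds
```

Call it with a timezone-aware now, for example datetime.now(timezone.utc). Mixing naive and aware datetimes raises a TypeError.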

Using the token in the browser

Here is a minimal page that loads the official openai SDK from esm.sh, fetches a token from your backend, and re-mints when the token expires. The SDK refuses to run in the browser without dangerouslyAllowBrowser: true, so that flag is required.

<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>FreeLLM browser demo</title>
  </head>
  <body>
    <button id="ask">Ask</button>
    <pre id="out"></pre>
    <script type="module">
      import OpenAI from "https://esm.sh/openai@4";

      const GATEWAY = "https://your-gateway.example.com/v1";
      let token = null;
      let expiresAt = 0;

      async function getToken() {
        const now = Date.now();
        if (token && now < expiresAt - 30_000) return token;
        const r = await fetch("/api/freellm-token", { credentials: "include" });
        if (!r.ok) throw new Error("failed to mint token");
        const data = await r.json();
        token = data.token;
        expiresAt = new Date(data.expiresAt).getTime();
        return token;
      }

      function makeClient(bearer) {
        return new OpenAI({
          apiKey: bearer,
          baseURL: GATEWAY,
          dangerouslyAllowBrowser: true,
        });
      }

      document.getElementById("ask").addEventListener("click", async () => {
        const out = document.getElementById("out");
        out.textContent = "...";
        try {
          let client = makeClient(await getToken());
          let res;
          try {
            res = await client.chat.completions.create({
              model: "free-fast",
              messages: [{ role: "user", content: "Say hi in one word." }],
            });
          } catch (err) {
            // Token may have expired mid-flight. Mint a fresh one and retry once.
            if (err?.status === 401) {
              token = null;
              client = makeClient(await getToken());
              res = await client.chat.completions.create({
                model: "free-fast",
                messages: [{ role: "user", content: "Say hi in one word." }],
              });
            } else {
              throw err;
            }
          }
          out.textContent = res.choices[0].message.content;
        } catch (err) {
          out.textContent = String(err);
        }
      });
    </script>
  </body>
</html>

The browser sets the Origin header automatically. You do not need to touch it.

Security model

  • Max token lifetime is 15 minutes. ttlSeconds defaults to 900 and is capped at 900. The gateway will not mint longer-lived tokens regardless of what you pass.
  • Origin binding. Every token embeds exactly one origin. The gateway compares it against the browser’s Origin header on every request. A mismatch is a 401.
  • Per-identifier rate limiting. If you set identifier, the gateway reads it from the token on each request and applies the same per-user bucket it would for a normal X-FreeLLM-Identifier header. You do not set the identifier header in the browser.
  • No cross-origin replay, within the honor system of the browser. A token leaked from https://yoursite.com cannot be replayed from https://evil.example.com through a standards-compliant browser, because the browser will stamp the real Origin. A non-browser client can forge any Origin header it wants. Treat the origin check as a browser-side safety belt, not a server-side authentication factor. The 15 minute TTL and the identifier bucket are the real damage limit.
  • Secret rotation invalidates outstanding tokens. Rotating FREELLM_TOKEN_SECRET invalidates every unexpired token immediately, because the HMAC will no longer verify. Use this as your emergency kill switch.
  • If FREELLM_TOKEN_SECRET is unset or shorter than 32 bytes, the gateway refuses to mint tokens on /v1/tokens/issue and refuses to accept any flt.* bearer token. Browser integration is off until you configure the secret.
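
Two of the limits above are easy to mirror in your own backend or deploy checks. The sketch below is an illustration only. It assumes the gateway clamps an oversized ttlSeconds to 900 rather than rejecting it, and that the secret's length is counted in UTF-8 bytes. The text does not state either detail, and the function names are hypothetical.

```python
from typing import Optional

MAX_TTL_SECONDS = 900   # 15 minutes, the documented cap and default
MIN_SECRET_BYTES = 32   # documented minimum length for FREELLM_TOKEN_SECRET


def effective_ttl(requested: Optional[int] = None) -> int:
    """TTL a mint request would end up with, assuming clamping (not rejection)."""
    if requested is None:
        return MAX_TTL_SECONDS
    return min(requested, MAX_TTL_SECONDS)


def secret_is_usable(secret: Optional[str]) -> bool:
    """False when the secret is unset or too short, i.e. browser integration is off."""
    # Counting UTF-8 bytes is an assumption about how the gateway measures length.
    return secret is not None and len(secret.encode("utf-8")) >= MIN_SECRET_BYTES
```

Running secret_is_usable(os.environ.get("FREELLM_TOKEN_SECRET")) in a startup check catches a missing or short secret before users hit 401s.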

Common pitfalls

  • The origin must match exactly. Protocol, host, and port all have to line up. https://yoursite.com and https://www.yoursite.com are different origins. http://localhost:3000 and http://localhost:5173 are different origins. Mint separate tokens for each.
  • dangerouslyAllowBrowser: true is required with the OpenAI SDK. Without it the SDK throws at construction time. The name is the SDK authors’ warning about shipping raw OpenAI keys to the browser. With FreeLLM browser tokens the risk surface is already bounded, but the flag is still required.
  • Do not let the browser pick its own origin. Your backend is the source of truth for which origin a token is bound to. Hardcode it or look it up from a server-side allowlist. Never forward a user-controlled string into the origin field of the mint request.
  • CORS still has to allow the origin. Browser tokens do not bypass CORS. The gateway’s ALLOWED_ORIGINS environment variable must include every origin you mint tokens for. See the Configuration reference for the exact format.
  • A leaked token is bounded but not harmless. The blast radius is at most 15 minutes, one origin, and the rate-limit budget of one identifier. Set identifier buckets tight enough that a leak is annoying, not catastrophic.
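
The exact-match and allowlist points above can be combined in one small helper on your backend. This is a sketch under stated assumptions: ALLOWED_ORIGINS here is a hypothetical in-code set, not the gateway's environment variable, and the function names are made up. The helper reduces a URL to scheme, host, and port, then returns an origin only if it is on the server-side list.

```python
from typing import FrozenSet, Optional
from urllib.parse import urlsplit

# Hypothetical server-side allowlist; keep it in sync with what the gateway allows.
ALLOWED_ORIGINS = frozenset({
    "https://yoursite.com",
    "https://www.yoursite.com",
    "http://localhost:3000",
})

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port], dropping default ports.

    IPv6 literals are not handled in this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    port = parts.port
    if port is None or port == _DEFAULT_PORTS[parts.scheme]:
        return f"{parts.scheme}://{parts.hostname}"
    return f"{parts.scheme}://{parts.hostname}:{port}"


def origin_for_mint(page_url: str,
                    allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> Optional[str]:
    """Return an allowlisted origin to put in the mint request, or None to refuse."""
    origin = origin_of(page_url)
    return origin if origin in allowed else None
```

Because the returned value always comes from the allowlist, a client-supplied string never reaches the origin field of the mint request unchecked.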