Deployment
Production checklist — sizing, scaling, idle eviction, observability, secrets.
This page is the punchlist for shipping agent-webkit-server to production. None of it is exotic; all of it matters.
Sizing one host
Each session owns:
- One `ClaudeSDKClient` Python process and one Claude Code CLI subprocess (~50–150 MB resident).
- One `EventLog` ring buffer (~100 KB at 1000 events).
- Zero or more SSE subscriber tasks.
CPU is bursty (active turns) but cheap (waiting on the model). The hard cap is memory. As a rule of thumb:
- 4 vCPU, 8 GB: 30–50 concurrent active sessions comfortably.
- 8 vCPU, 16 GB: 80–120 concurrent active sessions.
These are starting points. Profile your workload — sessions with heavy MCP servers attached use more memory.
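Sanity check on the arithmetic: 50 sessions × ~150 MB resident is roughly 7.5 GB, so the 8 GB box is memory-bound well before CPU is.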
Idle eviction
The default `idle_timeout_s` is 300 (5 minutes). A session is evicted when:
- 5 minutes have passed since the last input, and
- it has zero current subscribers.
A live SSE stream counts as a subscriber, so a tab in the foreground keeps the session warm even if the user is AFK.
Tune based on your UX:
```python
SessionRegistry(sdk_factory, idle_timeout_s=900.0)  # 15 minutes
```

Longer timeouts use more memory; shorter ones cause more "session expired" UX. Start at 300 and raise it if users complain about losing context.
Concurrency limits
Don't accept unlimited session creation. A reasonable backstop:
```python
from fastapi.responses import JSONResponse

MAX_SESSIONS_PER_HOST = 100

@app.middleware("http")
async def cap_sessions(request, call_next):
    if request.url.path == "/sessions" and request.method == "POST":
        if registry.live_count() >= MAX_SESSIONS_PER_HOST:
            # Return the response directly rather than raising HTTPException,
            # which isn't reliably handled from inside middleware.
            return JSONResponse({"detail": "Server at capacity"}, status_code=503)
    return await call_next(request)
```

Pair this with autoscaling on your platform. The signal isn't CPU — it's session count.
Reverse proxy
SSE-friendly settings (Nginx example):
```nginx
location /sessions/ {
    proxy_pass http://upstream;
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_request_buffering off;
    proxy_read_timeout 600s;
    proxy_set_header Last-Event-ID $http_last_event_id;
}
```

For ALB/NLB: bump the idle timeout to 600s+. The 15s `:keepalive` frame keeps connections warm, but only if the proxy doesn't strip comment frames (most don't).
Auth in production
`AuthConfig.from_env()` with a static token is fine for service-to-service traffic. For browser clients, terminate JWT/OIDC at the edge and pass through a stable token, or write a FastAPI dependency that validates the user's JWT and attaches the user to `request.state`.
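A minimal sketch of that dependency, assuming PyJWT and a shared HS256 secret (`JWT_SECRET` is a stand-in; swap in your IdP's JWKS verification as needed):

```python
import jwt  # PyJWT
from fastapi import Depends, HTTPException, Request
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

async def current_user(
    request: Request,
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> dict:
    try:
        # JWT_SECRET is a placeholder for your real key material.
        claims = jwt.decode(creds.credentials, JWT_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(401, "Invalid or expired token")
    request.state.user = claims  # downstream handlers read request.state.user
    return claims
```

Attach it to session routes with `Depends(current_user)`.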
Don't disable auth in production. `AuthConfig(disabled=True)` is dev only.
Observability
Useful spans / labels:
- `session_id` (in path)
- `correlation_id` (for permission/question RPC)
- `event_seq` (the SSE `id`, for resume diagnostics)
- `protocol_version` (always `1.0`, but pin it for forward-compat)
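With OpenTelemetry, for example, you might stamp these onto a per-turn span (a sketch; the span name is hypothetical and `session_id`/`correlation_id` come from your handler context):

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-webkit-server")

# Attribute names match the labels above.
with tracer.start_as_current_span("session.turn") as span:
    span.set_attribute("session_id", session_id)
    span.set_attribute("correlation_id", correlation_id)
    span.set_attribute("protocol_version", "1.0")
```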
Useful metrics:
- `agent_webkit.sessions.live` (gauge)
- `agent_webkit.sessions.created` (counter)
- `agent_webkit.sessions.evicted` (counter, by reason: `idle`, `closed`, `error`)
- `agent_webkit.permissions.first_reply_winner` (counter, by `subscriber_id`) — surfaces multi-tab races.
- `agent_webkit.resume.412` (counter) — buffer-too-small signal.
Frequent `agent_webkit.resume.412` means your ring buffer is undersized for your typical disconnect duration.
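If you're on Prometheus, a sketch of the equivalent instruments (Prometheus naming swaps dots for underscores; wiring them into your session lifecycle hooks is up to you):

```python
from prometheus_client import Counter, Gauge

SESSIONS_LIVE = Gauge("agent_webkit_sessions_live", "Live sessions on this host")
SESSIONS_CREATED = Counter("agent_webkit_sessions_created", "Sessions created")
SESSIONS_EVICTED = Counter(
    "agent_webkit_sessions_evicted", "Sessions evicted", ["reason"]
)
RESUME_412 = Counter(
    "agent_webkit_resume_412", "Resume attempts rejected: buffer too small"
)

# e.g. on idle eviction:
SESSIONS_EVICTED.labels(reason="idle").inc()
SESSIONS_LIVE.dec()
```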
Multi-host
Out of the box: pin sessions to hosts via sticky routing. Use the Postgres adapter for failover.
Don't try to share sessions across hosts via Redis pub/sub or similar — there's no clean way to share a ClaudeSDKClient subprocess. The session is anchored to the host that owns the subprocess.
Secrets
The Claude Agent SDK needs credentials. The standard knobs:
- `CLAUDE_CODE_OAUTH_TOKEN` — generated by `claude setup-token`.
- `ANTHROPIC_API_KEY` — direct API auth.
Inject them via your platform's secret manager; don't bake them into the image.
agent-webkit-server itself only needs `AGENT_WEBKIT_AUTH_TOKEN` (the bearer token clients present). Generate one per environment.
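Any high-entropy value works as the token; one way to mint one per environment:

```python
import secrets

# 32 random bytes, URL-safe. Store the output in your secret manager
# as AGENT_WEBKIT_AUTH_TOKEN.
print(secrets.token_urlsafe(32))
```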
Health checks
Add your own:
@app.get("/healthz")
def healthz():
return {"ok": True, "live_sessions": registry.live_count()}A liveness probe should hit /healthz. A readiness probe should additionally check Postgres connectivity if you're using the adapter.
Deploys
`ClaudeSDKClient` subprocesses don't survive process restart. To deploy without losing in-flight sessions:
- Roll new instances behind sticky routing.
- Drain old instances by stopping new session creation but keeping streams alive until they idle out.
- Force-kill anything left after the drain window (e.g. 10 minutes).
A blunt restart is also fine for many workloads — users see "session expired", refresh, and re-send. Pick the trade based on UX expectations.
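One hypothetical shape for the drain step above, reusing the capacity middleware's pattern (`registry.close_all()` is an assumed helper; substitute whatever teardown your registry exposes):

```python
import asyncio

from fastapi.responses import JSONResponse

draining = asyncio.Event()

@app.middleware("http")
async def refuse_new_sessions_while_draining(request, call_next):
    # Existing sessions and their SSE streams keep flowing; only
    # new session creation is refused.
    if draining.is_set() and request.url.path == "/sessions" and request.method == "POST":
        return JSONResponse({"detail": "Draining"}, status_code=503)
    return await call_next(request)

async def drain(window_s: float = 600.0):
    draining.set()                 # stop accepting new sessions
    await asyncio.sleep(window_s)  # let live sessions idle out
    await registry.close_all()     # assumed helper: force-kill leftovers
```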
Where to next
- Postgres sessions — for failover.
- Resume & reconnect — for client-side resilience.
- FAQ — for things that bite you in production.