Deployment
Production checklist — sizing, scaling, idle eviction, observability, secrets.
This page is the punchlist for shipping agent-webkit-server to production. None of it is exotic; all of it matters.
Sizing one host
Each session owns:
- One `ClaudeSDKClient` Python process and one Claude Code CLI subprocess (~50–150 MB resident).
- One `EventLog` ring buffer (~100 KB at 1000 events).
- Zero or more SSE subscriber tasks.
CPU is bursty (active turns) but cheap (waiting on the model). The hard cap is memory. As a rule of thumb:
- 4 vCPU, 8 GB: 30–50 concurrent active sessions comfortably.
- 8 vCPU, 16 GB: 80–120 concurrent active sessions.
These are starting points. Profile your workload — sessions with heavy MCP servers attached use more memory.
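Sanity check on the arithmetic: 50 sessions × ~150 MB resident is roughly 7.5 GB, so the 8 GB box is memory-bound well before CPU is.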
Idle eviction
The default `idle_timeout_s` is 300 (5 minutes). A session is evicted when:
- 5 minutes have passed since the last input, and
- it has zero current subscribers.
A live SSE stream counts as a subscriber, so a tab in the foreground keeps the session warm even if the user is AFK.
Tune based on your UX:
```python
SessionRegistry(sdk_factory, idle_timeout_s=900.0)  # 15 minutes
```

Longer timeouts use more memory; shorter ones cause more "session expired" UX. Start at 300 and raise it if users complain about losing context.
Concurrency limits
Don't accept unlimited session creation. A reasonable backstop:
```python
from fastapi.responses import JSONResponse

MAX_SESSIONS_PER_HOST = 100

@app.middleware("http")
async def cap_sessions(request, call_next):
    if request.url.path == "/sessions" and request.method == "POST":
        if registry.live_count() >= MAX_SESSIONS_PER_HOST:
            # Return the response directly rather than raising HTTPException,
            # which isn't reliably handled from inside middleware.
            return JSONResponse({"detail": "Server at capacity"}, status_code=503)
    return await call_next(request)
```

Pair this with autoscaling on your platform. The signal isn't CPU — it's session count.
Reverse proxy
SSE-friendly settings (Nginx example):
```nginx
location /sessions/ {
    proxy_pass http://upstream;
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_request_buffering off;
    proxy_read_timeout 600s;
    proxy_set_header Last-Event-ID $http_last_event_id;
}
```

For ALB/NLB: bump the idle timeout to 600s+. The 15s `:keepalive` frame keeps connections warm, but only if the proxy doesn't strip comment frames (most don't).
Auth in production
`AuthConfig.from_env()` with a static token is fine for service-to-service traffic. For browser clients, terminate JWT/OIDC at the edge and pass through a stable token, or write a FastAPI dependency that validates the user's JWT and attaches the user to `request.state`.
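A minimal sketch of that dependency, assuming PyJWT and a shared HS256 secret (`JWT_SECRET` is a stand-in; swap in your IdP's JWKS verification as needed):

```python
import jwt  # PyJWT
from fastapi import Depends, HTTPException, Request
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

async def current_user(
    request: Request,
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> dict:
    try:
        # JWT_SECRET is a placeholder for your real key material.
        claims = jwt.decode(creds.credentials, JWT_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(401, "Invalid or expired token")
    request.state.user = claims  # downstream handlers read request.state.user
    return claims
```

Attach it to session routes with `Depends(current_user)`.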
Don't disable auth in production. `AuthConfig(disabled=True)` is dev only.
Observability
Useful spans / labels:
- `session_id` (in path)
- `correlation_id` (for permission/question RPC)
- `event_seq` (the SSE `id`, for resume diagnostics)
- `protocol_version` (always `1.0`, but pin it for forward-compat)
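With OpenTelemetry, for example, you might stamp these onto a per-turn span (a sketch; the span name is hypothetical and `session_id`/`correlation_id` come from your handler context):

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-webkit-server")

# Attribute names match the labels above.
with tracer.start_as_current_span("session.turn") as span:
    span.set_attribute("session_id", session_id)
    span.set_attribute("correlation_id", correlation_id)
    span.set_attribute("protocol_version", "1.0")
```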
Useful metrics:
- `agent_webkit.sessions.live` (gauge)
- `agent_webkit.sessions.created` (counter)
- `agent_webkit.sessions.evicted` (counter, by reason: `idle`, `closed`, `error`)
- `agent_webkit.permissions.first_reply_winner` (counter, by `subscriber_id`) — surfaces multi-tab races.
- `agent_webkit.resume.412` (counter) — buffer-too-small signal.
Frequent `agent_webkit.resume.412` means your ring buffer is undersized for your typical disconnect duration.
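If you're on Prometheus, a sketch of the equivalent instruments (Prometheus naming swaps dots for underscores; wiring them into your session lifecycle hooks is up to you):

```python
from prometheus_client import Counter, Gauge

SESSIONS_LIVE = Gauge("agent_webkit_sessions_live", "Live sessions on this host")
SESSIONS_CREATED = Counter("agent_webkit_sessions_created", "Sessions created")
SESSIONS_EVICTED = Counter(
    "agent_webkit_sessions_evicted", "Sessions evicted", ["reason"]
)
RESUME_412 = Counter(
    "agent_webkit_resume_412", "Resume attempts rejected: buffer too small"
)

# e.g. on idle eviction:
SESSIONS_EVICTED.labels(reason="idle").inc()
SESSIONS_LIVE.dec()
```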
Multi-host
Out of the box: pin sessions to hosts via sticky routing. Use the Postgres adapter for failover.
Don't try to share sessions across hosts via Redis pub/sub or similar — there's no clean way to share a ClaudeSDKClient subprocess. The session is anchored to the host that owns the subprocess.
Secrets
The Claude Agent SDK needs credentials. The standard knobs:
- `CLAUDE_CODE_OAUTH_TOKEN` — generated by `claude setup-token`.
- `ANTHROPIC_API_KEY` — direct API auth.
Inject them via your platform's secret manager; don't bake them into the image.
agent-webkit-server itself only needs `AGENT_WEBKIT_AUTH_TOKEN` (the bearer token clients present). Generate one per environment.
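Any high-entropy value works as the token; one way to mint one per environment:

```python
import secrets

# 32 random bytes, URL-safe. Store the output in your secret manager
# as AGENT_WEBKIT_AUTH_TOKEN.
print(secrets.token_urlsafe(32))
```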
Health checks
Add your own:
@app.get("/healthz")
def healthz():
return {"ok": True, "live_sessions": registry.live_count()}A liveness probe should hit /healthz. A readiness probe should additionally check Postgres connectivity if you're using the adapter.
Deploys
`ClaudeSDKClient` subprocesses don't survive process restart. To deploy without losing in-flight sessions:
- Roll new instances behind sticky routing.
- Drain old instances by stopping new session creation but keeping streams alive until they idle out.
- Force-kill anything left after the drain window (e.g. 10 minutes).
A blunt restart is also fine for many workloads — users see "session expired", refresh, and re-send. Pick the trade based on UX expectations.
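One hypothetical shape for the drain step above, reusing the capacity middleware's pattern (`registry.close_all()` is an assumed helper; substitute whatever teardown your registry exposes):

```python
import asyncio

from fastapi.responses import JSONResponse

draining = asyncio.Event()

@app.middleware("http")
async def refuse_new_sessions_while_draining(request, call_next):
    # Existing sessions and their SSE streams keep flowing; only
    # new session creation is refused.
    if draining.is_set() and request.url.path == "/sessions" and request.method == "POST":
        return JSONResponse({"detail": "Draining"}, status_code=503)
    return await call_next(request)

async def drain(window_s: float = 600.0):
    draining.set()                 # stop accepting new sessions
    await asyncio.sleep(window_s)  # let live sessions idle out
    await registry.close_all()     # assumed helper: force-kill leftovers
```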
Where to next
- Postgres sessions — for failover.
- Resume & reconnect — for client-side resilience.
- FAQ — for things that bite you in production.