Concepts

The mental model — sessions, fan-out, permission RPC, AskUserQuestion, resume.

agent-webkit is small, but the wire protocol has a few load-bearing ideas. This page walks through them at the level of "what's actually happening on the wire" — useful before reading any of the guides.

Sessions are long-lived

Each call to createAgentClient(...) (or useAgentSession()) hits POST /sessions, which spawns a Claude Agent SDK ClaudeSDKClient. That client owns a Claude Code CLI subprocess. One client per session, not per request — connect() is expensive, and re-paying that cost on every turn would be brutal.

A session lives until:

The client calls DELETE /sessions/{id}, or
The server idle-evicts it (default: 5 minutes since last input or subscriber).

While alive, the session has:

A single inbound queue (user messages, interrupts, permission/question replies).
One in-flight ClaudeSDKClient.query() at a time.
A bounded ring buffer of past SSE events for resume (default: 1000).
Zero or more subscribers tailing the stream.
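As a rough mental picture, the per-session state listed above could be held in a record like the one below. This is a minimal sketch, not the server's actual data model: the class name, field names, and the exact idle rule (no subscribers and no activity for the timeout window) are assumptions made for illustration. Only the two defaults come from this page.

```python
from __future__ import annotations

import asyncio
import time
from collections import deque
from dataclasses import dataclass, field

IDLE_TIMEOUT_S = 5 * 60   # default idle-eviction window mentioned on this page
RING_BUFFER_SIZE = 1000   # default resume buffer size mentioned on this page


@dataclass
class SessionState:
    """Hypothetical per-session record; names are illustrative only."""

    session_id: str
    # Single inbound queue: user messages, interrupts, permission/question replies.
    inbound: asyncio.Queue = field(default_factory=asyncio.Queue)
    # Bounded ring buffer of past SSE events; the oldest entries fall off first.
    events: deque = field(default_factory=lambda: deque(maxlen=RING_BUFFER_SIZE))
    # At most one query() is in flight at a time.
    query_in_flight: bool = False
    # Number of subscribers currently tailing the stream.
    subscribers: int = 0
    last_activity: float = field(default_factory=time.monotonic)

    def touch(self) -> None:
        """Record activity (new input or a subscriber attaching)."""
        self.last_activity = time.monotonic()

    def is_idle(self, now: float) -> bool:
        """Assumed idle rule: nobody is attached and the window has elapsed."""
        return self.subscribers == 0 and now - self.last_activity >= IDLE_TIMEOUT_S
```

A `deque` with `maxlen` gives the bounded-buffer behavior for free: appending to a full buffer silently drops the oldest event.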

Two ID schemes coexist

This trips people up, so it's worth being explicit:

| ID | Source | Used for |
| --- | --- | --- |
| `id` (top-level SSE id) | server | `Last-Event-ID` resume only |
| `message_id` | SDK | reconciling `message_delta` → `message_complete` |
| `tool_use_id` | SDK | matching `tool_use` to `tool_result` |
| `correlation_id` | server / SDK | matching `permission_request` to `permission_response` |

The top-level id is a server-assigned, monotonically increasing sequence number (seq), and it is the only value SSE resume uses. The other IDs identify content, not events: they survive resume, and they are how the L2 client knows that two events are about the same message.
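To make the split concrete, here is a small sketch of how a client might fold events into messages using `message_id` while ignoring the top-level `id` entirely. The event shape (`type`, `data`, and a `text` field on deltas) is an assumption for illustration; the real payloads are defined in the wire protocol reference.

```python
from __future__ import annotations

from typing import Any


def reconcile(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Fold message_delta / message_complete events into per-message state.

    Grouping is keyed on the content ID (message_id), never on the SSE id,
    so interleaved messages are still assembled correctly.
    """
    messages: dict[str, dict[str, Any]] = {}
    for ev in events:
        data = ev["data"]
        mid = data.get("message_id")
        if mid is None:
            continue  # not a message-scoped event
        msg = messages.setdefault(mid, {"text": "", "complete": False})
        if ev["type"] == "message_delta":
            msg["text"] += data["text"]  # hypothetical delta field
        elif ev["type"] == "message_complete":
            msg["complete"] = True
    return messages


# Two messages interleaved on the stream; the top-level ids are just 1..4.
stream = [
    {"id": 1, "type": "message_delta", "data": {"message_id": "m1", "text": "Hel"}},
    {"id": 2, "type": "message_delta", "data": {"message_id": "m2", "text": "Hi"}},
    {"id": 3, "type": "message_delta", "data": {"message_id": "m1", "text": "lo"}},
    {"id": 4, "type": "message_complete", "data": {"message_id": "m1"}},
]
state = reconcile(stream)
```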

Multi-subscriber fan-out

Any number of clients can GET /sessions/{id}/stream simultaneously. Each gets its own cursor into the ring buffer. Reasons this matters:

A user opens the app in two tabs — both should see the same agent stream.
A reconnect after a Wi-Fi blip is just a new subscriber with Last-Event-ID: <last_seq>.
A debug tool can attach without disturbing the user's session.

Fan-out is independent at the cursor level — one slow subscriber can't backpressure another.
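One way to get cursor-level independence is to let every subscriber read from a shared log with its own position, waking on a shared condition. The sketch below assumes that design; `EventLog`, `publish`, and `subscribe` are hypothetical names, and the log is left unbounded for brevity.

```python
import asyncio


class EventLog:
    """Hypothetical shared log; each subscriber keeps a private cursor."""

    def __init__(self) -> None:
        self.events = []  # list of (seq, payload); seq starts at 1
        self._changed = asyncio.Condition()

    async def publish(self, payload: str) -> None:
        async with self._changed:
            self.events.append((len(self.events) + 1, payload))
            self._changed.notify_all()  # wake every waiting subscriber

    async def subscribe(self, after_seq: int = 0):
        cursor = after_seq  # private to this subscriber
        while True:
            async with self._changed:
                await self._changed.wait_for(lambda: len(self.events) > cursor)
                batch = self.events[cursor:]  # everything with seq > cursor
            # The lock is released before yielding, so a slow consumer
            # never holds up publishers or other subscribers.
            for seq, payload in batch:
                cursor = seq
                yield seq, payload


async def demo():
    log = EventLog()
    fast, slow = [], []

    async def consume(out, delay):
        async for seq, _payload in log.subscribe():
            out.append(seq)
            await asyncio.sleep(delay)
            if len(out) == 3:
                break

    tasks = [
        asyncio.create_task(consume(fast, 0)),
        asyncio.create_task(consume(slow, 0.01)),
    ]
    for payload in ("a", "b", "c"):
        await log.publish(payload)
    await asyncio.gather(*tasks)
    return fast, slow
```

Both consumers end up with the same three events; the slow one simply gets there later.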

Permission RPC, on the wire

When the SDK calls can_use_tool(...) mid-query():

sequenceDiagram
    participant SDK as Claude SDK
    participant Server
    participant Client

    SDK->>Server: can_use_tool(tool, input, ctx)
    Server->>Client: SSE: permission_request<br/>{correlation_id, tool, input}
    Note over Server: server holds an<br/>asyncio.Future
    Client->>Server: POST /input<br/>{type: permission_response,<br/>correlation_id, behavior: "allow"}
    Server->>SDK: PermissionResultAllow / Deny
    SDK->>Server: continue query()

If two subscribers race to reply (multi-tab), first reply wins. The loser's POST gets 409 Conflict. The L2 React hook surfaces this as a state machine: requested → responding → resolved | conflict.
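The server side of this exchange can be sketched as a small broker that parks one `asyncio.Future` per `correlation_id`. This is an illustration under assumptions, not the actual implementation: `PermissionBroker` and its methods are hypothetical, and the 200 and 404 codes are placeholders. Only the 409 for the losing reply comes from this page.

```python
import asyncio


class PermissionBroker:
    """Hypothetical holder for pending permission requests."""

    def __init__(self) -> None:
        self._pending = {}  # correlation_id -> asyncio.Future

    def open(self, correlation_id: str) -> asyncio.Future:
        """Called when permission_request is emitted; the caller awaits the result."""
        fut = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = fut
        return fut

    def reply(self, correlation_id: str, behavior: str) -> int:
        """Handle a permission_response and return an HTTP-style status."""
        fut = self._pending.get(correlation_id)
        if fut is None:
            return 404  # assumed status for an unknown correlation_id
        if fut.done():
            return 409  # another subscriber already answered: first reply wins
        fut.set_result(behavior)
        return 200  # assumed success status


async def demo():
    broker = PermissionBroker()
    fut = broker.open("c1")
    first = broker.reply("c1", "allow")   # tab A
    second = broker.reply("c1", "deny")   # tab B loses the race
    return first, second, await fut
```

Because `reply` runs synchronously on the event loop, the `done()` check and `set_result` cannot interleave with another reply, which is what makes "first reply wins" safe without extra locking.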

AskUserQuestion as a first-class event

AskUserQuestion is an SDK tool, so it technically arrives as a tool_use block. But it's the only tool whose semantics are "stop and ask the user something". Forcing every UI to special-case a particular tool name would be ugly. So agent-webkit pulls it up to its own event:

SSE: ask_user_question { correlation_id, questions }
POST /input { type: question_response, correlation_id, answers }

Same race semantics as permissions.
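The promotion step can be pictured as a small translation applied to each content block before it reaches the stream. The sketch below is a guess at the shape, not the real code: the block layout (`type`, `name`, `input.questions`), the `to_wire_event` helper, and the use of a fresh UUID as `correlation_id` are all assumptions.

```python
import uuid
from typing import Any, Dict


def to_wire_event(block: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one SDK content block into a wire event (hypothetical helper)."""
    if block.get("type") == "tool_use" and block.get("name") == "AskUserQuestion":
        # Promote the tool call so UIs never have to match on the tool name.
        return {
            "type": "ask_user_question",
            "data": {
                "correlation_id": uuid.uuid4().hex,  # assumed ID scheme
                "questions": block["input"]["questions"],
            },
        }
    # Every other block passes through under its own type.
    return {"type": block["type"], "data": block}


promoted = to_wire_event({
    "type": "tool_use",
    "id": "tu_1",
    "name": "AskUserQuestion",
    "input": {"questions": [{"question": "Which branch?"}]},
})
passthrough = to_wire_event(
    {"type": "tool_use", "id": "tu_2", "name": "Bash", "input": {}}
)
```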

Resume after disconnect

The SSE stream is a pure derived view of the session's event log. The server holds the last N events (default 1000) in a ring buffer. When a client reconnects:

GET /sessions/{id}/stream
Last-Event-ID: 247

…the server replays everything with seq > 247, then transitions to live tailing. If 247 has been evicted from the ring buffer (the buffer is full and has rolled past), the server returns 412 Precondition Failed — the client must reset its view.

Sizing the ring buffer is a tradeoff: bigger buffers tolerate longer disconnects but use more memory. See Resume & reconnect.
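The replay decision described above fits in a few lines. This sketch assumes a buffer of `(seq, payload)` pairs; `RingBuffer` and `replay_after` are hypothetical names, and the 200 status for a successful replay is a placeholder. The 412 for an evicted `Last-Event-ID` is the behavior this page describes.

```python
from collections import deque


class RingBuffer:
    """Hypothetical bounded event log keyed by a monotonic seq."""

    def __init__(self, capacity: int = 1000) -> None:
        self._events = deque(maxlen=capacity)  # (seq, payload) pairs
        self._next_seq = 1

    def append(self, payload: str) -> int:
        seq = self._next_seq
        self._events.append((seq, payload))  # drops the oldest entry when full
        self._next_seq += 1
        return seq

    def replay_after(self, last_event_id: int):
        """Return (status, events) for a reconnect carrying Last-Event-ID."""
        if self._events and last_event_id < self._events[0][0]:
            # The client's position has rolled out of the buffer, so the
            # server cannot prove nothing was missed: the view must be reset.
            return 412, []
        return 200, [e for e in self._events if e[0] > last_event_id]


buf = RingBuffer(capacity=3)
for name in ("e1", "e2", "e3", "e4", "e5"):
    buf.append(name)  # the buffer now holds seq 3, 4, 5
```

With a capacity of 3 and five events appended, a client that last saw seq 4 gets seq 5 replayed, while a client that last saw seq 1 gets a 412.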

Interrupts don't drain

POST /input { type: "interrupt" } calls client.interrupt(). Important: this does not drain the SDK's receive loop. The server must finish draining receive_messages() before the next query() starts. The L2 hook handles this for you — interrupts go into a cancelling state and resolve to idle once the trailing events arrive.
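The ordering constraint can be modeled as a three-state machine in which an interrupt only moves the turn to cancelling, and the next query is refused until the trailing events have been consumed. The sketch below is illustrative: `Turn`, `TurnState`, and the `turn_complete` event name are assumptions, not names from the wire protocol.

```python
from enum import Enum


class TurnState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    CANCELLING = "cancelling"


class Turn:
    """Hypothetical tracker for the single in-flight query."""

    def __init__(self) -> None:
        self.state = TurnState.IDLE

    def start_query(self) -> None:
        if self.state is not TurnState.IDLE:
            # Still draining the previous turn; starting now would mix streams.
            raise RuntimeError(f"cannot start query while {self.state.value}")
        self.state = TurnState.RUNNING

    def interrupt(self) -> None:
        # Interrupting does not end the turn; it only requests the stop.
        if self.state is TurnState.RUNNING:
            self.state = TurnState.CANCELLING

    def on_event(self, event_type: str) -> None:
        # Trailing events keep arriving after an interrupt. Only the
        # (assumed) terminal event returns the turn to idle.
        if event_type == "turn_complete":
            self.state = TurnState.IDLE
```

A caller that tries `start_query()` right after `interrupt()` gets an error rather than a second overlapping query, which mirrors the rule that draining must finish first.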

Where to read next

Wire protocol reference — the full event/message catalog.
Architecture — why it's shaped this way.
Permissions guide — production patterns for the permission UI.
