Build a session block · gonkablocks guides

What a session is

A session block is an HTTP app — a Python (or Node, or Go) web server, packaged as a container — that the platform starts on demand for a specific user, proxies their browser traffic through, and stops once they've been idle long enough.

From the user's perspective it's just a URL on blocks.gonka.gg that renders an interactive thing. From the developer's perspective it's a normal web app, but with $OPENAI_BASE_URL and $OPENAI_API_KEY already injected and a known idle deadline you can rely on.

user clicks Start
       │
       ▼
 platform starts container
       │   (cold start: pull + boot)
       ▼
 container binds 0.0.0.0:<expose_port>
       │
       ▼
 user's browser opens /sessions/<id>
       │
       ▼
 every request → session-proxy → 127.0.0.1:<hostPort>/<path>
       │
       ▼
 (when user idle for idle_ttl_seconds) → reaper stops the container

Sessions are per user. If two people open the same session block, each gets their own container. Containers don't share filesystem, memory, or LLM context — that's the whole point.

When to pick session

The user interaction is multi-turn — chat, IDE, notebook.
You want per-user state: chat history, scratch files, an in-memory vector store keyed to this user's session.
The block has a UI: HTML/CSS/JS, or something like Streamlit / Gradio rendered as HTML.
Examples: chatbots with custom prompts, doc Q&A over uploaded PDFs, quick interactive notebooks for analysts, repl-style coding sandboxes.

When NOT to pick session

The endpoint is shared by everyone — pick a service. Sessions waste a fresh container on every viewer; services run one container for the whole world.
You're doing one shot of work and returning a result — that's a job. Don't pay 2-5 s of cold start to do something a job can do in 50 ms.
You need to expose a public REST API for other systems to call (CI, webhooks, CRM integrations) — also a service. Sessions are user-scoped.

What you'll build

A tiny one-page chat called e2e-session: HTML + fetch() calling a single /say POST endpoint that replies via Qwen3-235B. ~80 lines of Python total. The whole loop — deploy → start session → chat through the proxy → stop — runs in under a minute.

manifest.yaml

name: e2e-session
version: 0.1.0
type: session
description: Tiny one-page chat that the platform spins up per-user, on demand.
category: misc

runtime:
  build: dockerfile
  entrypoint: uvicorn app:app --host 0.0.0.0 --port 8080
  expose_port: 8080
  env:
    MODEL: qwen3-235b

resources:
  cpu: 1
  memory_mb: 512
  timeout_seconds: 1800
  network: deny
  idle_ttl_seconds: 600

pricing:
  type: per_run
  base_price_cents: 0
  rate_cents_per_minute: 0
  inference_pass_through: true
  inference_markup_pct: 0

Field-by-field, what's new vs. a job

type: session — required. Tells the platform to keep the container alive across requests and proxy HTTP through to it.
runtime.entrypoint — must start a long-running HTTP server, not exit. If your entrypoint terminates (because, say, you wrote python app.py where app.py only defines app = FastAPI() and never starts a server), the container exits with code 0 and the session is marked failed. This is the bug I actually hit on first try. Use uvicorn …, gunicorn …, or node server.js — a command that blocks.
runtime.expose_port — the TCP port your server binds inside the container. Must match what your server actually listens on. The platform maps it to a random ephemeral host port and proxies to it.
resources.timeout_seconds — wall-clock maximum even if the user is active. Use 30 minutes to 2 hours (1800-7200) for long-form chats; for short interactions a tighter cap is fine.
resources.idle_ttl_seconds — how long the container may go without a proxied request before it is reaped. Every request resets the timer. 300-900 seconds is a sensible range: longer wastes resources, and much shorter cuts off a user who is reading a long message.
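
The port and TTL rules above are easy to break when the same number is typed in two places. A minimal pre-deploy check might look like the sketch below. It assumes the manifest has already been parsed into a dict; the helper name check_manifest is hypothetical and not part of the gonkablocks CLI.

```python
import re

def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in a parsed session manifest."""
    problems = []
    runtime = manifest.get("runtime", {})
    resources = manifest.get("resources", {})

    # The server's --port flag should match runtime.expose_port.
    match = re.search(r"--port[ =](\d+)", runtime.get("entrypoint", ""))
    if match and int(match.group(1)) != runtime.get("expose_port"):
        problems.append("entrypoint port differs from expose_port")

    # An idle TTL longer than the wall-clock cap can never fire.
    idle = resources.get("idle_ttl_seconds")
    cap = resources.get("timeout_seconds")
    if idle is not None and cap is not None and idle > cap:
        problems.append("idle_ttl_seconds exceeds timeout_seconds")
    return problems

manifest = {
    "runtime": {
        "entrypoint": "uvicorn app:app --host 0.0.0.0 --port 8080",
        "expose_port": 8080,
    },
    "resources": {"timeout_seconds": 1800, "idle_ttl_seconds": 600},
}
print(check_manifest(manifest))  # [] for the manifest in this guide
```

The check only understands a --port flag on the command line; a server that reads its port from an environment variable would need a different test.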

Dockerfile

FROM python:3.12-slim
WORKDIR /workspace
RUN pip install --no-cache-dir 'openai==1.55.0' 'httpx<0.28' \
                              'fastapi==0.115.5' 'uvicorn[standard]==0.32.1'
COPY . /workspace
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Same shape as a job's Dockerfile, plus FastAPI + uvicorn. For Node sessions, RUN npm ci followed by CMD ["node", "server.js"] does the equivalent.

app.py

"""One-page chat for e2e-session.

GET /        -> renders a chat HTML page (no JS framework, just a form).
POST /say    -> JSON {message: "..."}, returns {reply: "..."}.

The platform spins up one container per user on demand; once they stop
hitting it for idle_ttl_seconds, it's reaped.
"""
from __future__ import annotations
import os

from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from openai import OpenAI

MODEL = os.environ.get("MODEL", "qwen3-235b")
_client: OpenAI | None = None

def oai() -> OpenAI:
    global _client
    if _client is None:
        _client = OpenAI(
            base_url=os.environ["OPENAI_BASE_URL"],
            api_key=os.environ["OPENAI_API_KEY"],
        )
    return _client

class SayReq(BaseModel):
    message: str

app = FastAPI()

PAGE = """<!doctype html>
<html><head><meta charset="utf-8"><title>e2e-session</title></head>
<body style="font-family:system-ui;padding:32px;max-width:640px;margin:auto">
<h1>e2e-session</h1>
<div id="log"></div>
<form id="f"><input id="m" autofocus><button>send</button></form>
<script>
const log = document.getElementById('log');
const f = document.getElementById('f');
const m = document.getElementById('m');
let history = [];
function render(){ log.innerText = history.map(([w,t]) => '[' + w + '] ' + t).join('\\n'); }
f.addEventListener('submit', async (e) => {
  e.preventDefault();
  const text = m.value.trim(); if (!text) return;
  history.push(['me', text]); m.value = ''; history.push(['ai','...']); render();
  const r = await fetch('say', {method:'POST', headers:{'content-type':'application/json'}, body: JSON.stringify({message: text})});
  const j = await r.json();
  history[history.length-1] = ['ai', j.reply || j.error || '(no reply)'];
  render();
});
</script></body></html>
"""

@app.get("/", response_class=HTMLResponse)
def home():
    return HTMLResponse(PAGE)

@app.post("/say")
def say(body: SayReq) -> dict:
    msg = (body.message or "").strip()
    if not msg:
        return {"error": "empty"}
    resp = oai().chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Friendly, concise. Reply in <= 2 short sentences."},
            {"role": "user", "content": msg},
        ],
        temperature=0.7,
    )
    return {"reply": (resp.choices[0].message.content or "").strip()}

Note: the form's fetch('say') is a relative URL — no leading slash. The platform's proxy injects a <base href="/api/sessions/<id>/proxy/"> into your HTML so relative URLs resolve under the proxy prefix automatically. Use relative URLs for fetch / script src / css href / etc. Absolute paths like /say would skip the prefix and 404.
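
To see why the leading slash matters, the resolution can be reproduced with Python's urljoin, which follows the same relative-reference rules a browser applies against a <base href>. The session id abc123 below is a made-up placeholder.

```python
from urllib.parse import urljoin

# Hypothetical session id; the base mirrors the <base href> the proxy injects.
BASE = "https://blocks.gonka.gg/api/sessions/abc123/proxy/"

relative = urljoin(BASE, "say")   # stays under the proxy prefix
absolute = urljoin(BASE, "/say")  # jumps to the site root and misses the proxy

print(relative)  # https://blocks.gonka.gg/api/sessions/abc123/proxy/say
print(absolute)  # https://blocks.gonka.gg/say
```

The trailing slash on the base matters too: without it, the last path segment would be replaced rather than extended.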

Deploy the block

$ gonkablocks deploy
packaging /private/tmp/gbk-e2e/e2e-session → e2e-session@0.1.0…
uploading 2 files…
building…
 • building
 • ready
publishing…
✓ published. View at https://blocks.gonka.gg/blocks/admin/e2e-session

At this point the block is published — but no container is running. Each user starts their own.

Start a session

The user-facing way

Open the block's page and click Start session. The platform creates a BlockSession row, boots a container, and redirects the user to /sessions/<sessionId> — the page hosts your HTML inside an iframe-shaped chrome with a stop button and idle countdown.

From code (REST)

# 1. resolve the block id
curl -H "Authorization: Bearer $GONKA_API_KEY" \
  "https://blocks.gonka.gg/api/blocks/resolve?author=admin&slug=e2e-session"
# {"blockId":"cmooh3fhq00484a1dvugvkywn"}

# 2. start the session
curl -X POST https://blocks.gonka.gg/api/sessions \
  -H "Authorization: Bearer $GONKA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "blockId": "cmooh3fhq00484a1dvugvkywn",
    "title": "my chat",
    "spendCapCents": 50,
    "idleTtlSecs": 600
  }'
# {"id":"cmooh3si6004c4a1d5iunm8ny","url":"/sessions/cmooh3si6004c4a1d5iunm8ny"}
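
The same start call can be made from Python. The sketch below uses only the standard library and mirrors the curl example; the function name start_session_request is hypothetical, and the request is built but not sent so the snippet runs without network access.

```python
import json
import urllib.request

API = "https://blocks.gonka.gg"

def start_session_request(api_key: str, block_id: str, title: str,
                          spend_cap_cents: int = 50,
                          idle_ttl_secs: int = 600) -> urllib.request.Request:
    """Build (but do not send) the POST /api/sessions request."""
    body = json.dumps({
        "blockId": block_id,
        "title": title,
        "spendCapCents": spend_cap_cents,
        "idleTtlSecs": idle_ttl_secs,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API}/api/sessions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = start_session_request("YOUR_API_KEY", "cmooh3fhq00484a1dvugvkywn", "my chat")

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     session = json.load(resp)  # per the curl output above: keys "id" and "url"
```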

Talking to it (proxy paths)

Once the container is running, every HTTP request to /api/sessions/<id>/proxy/<path> is forwarded to your container at 127.0.0.1:<hostPort>/<path>. The same proxy serves the public /sessions/<id> URL.

# render the chat HTML
curl -H "Authorization: Bearer $GONKA_API_KEY" -L \
  "https://blocks.gonka.gg/api/sessions/$SID/proxy/"
# <!doctype html><html><head><base href="/api/sessions/.../proxy/">...

# call the JSON endpoint
curl -X POST -H "Authorization: Bearer $GONKA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://blocks.gonka.gg/api/sessions/$SID/proxy/say" \
  -d '{"message":"hi! who are you in one sentence?"}'
# {"reply":"I'm Qwen, a helpful AI assistant created by Alibaba Cloud."}
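
The path mapping described above can be pictured as a prefix strip. This is an illustration of the rule only, not the platform's actual proxy code; the session id and host port are invented values.

```python
PROXY_PREFIX = "/api/sessions/{sid}/proxy/"

def upstream_url(public_path: str, sid: str, host_port: int) -> str:
    """Map a public proxy path to the container URL it is forwarded to."""
    prefix = PROXY_PREFIX.format(sid=sid)
    if not public_path.startswith(prefix):
        raise ValueError("path is not under this session's proxy prefix")
    # Everything after the prefix is passed through as the container path.
    return f"http://127.0.0.1:{host_port}/{public_path[len(prefix):]}"

print(upstream_url("/api/sessions/abc123/proxy/say", "abc123", 49152))
# http://127.0.0.1:49152/say
print(upstream_url("/api/sessions/abc123/proxy/", "abc123", 49152))
# http://127.0.0.1:49152/
```

This is why the app's routes are declared as / and /say: the container never sees the proxy prefix.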

Things the proxy does for you, transparently:

Streams responses (good for SSE / chunked transfer; chat tokens flow through cleanly).
Rewrites HTML to inject a <base href> so relative paths work.
Adds x-gonkablocks-session-id and x-gonkablocks-user-id headers — your code can read these to know which user is talking to it.
Bumps lastSeenAt on every request — keeps the idle reaper at bay.

Things the proxy does not support yet:

WebSocket upgrades. The Next.js route handler can't see the raw socket. SSE works fine (it's just a long HTTP response); for true WS, you'd need to wire it through the Node upgrade listener — not available today.

Lifecycle & idle reaping

A session container has three lifetime caps, and the earliest one wins:

resources.timeout_seconds — absolute wall-clock cap. The container stops at this point even if the user is actively chatting. Set it to the longest plausible session length.
resources.idle_ttl_seconds — time since the last proxy hit. Reset on every request. Default 600 s; tune if your users have long read times.
Spend cap — when inferenceSpentCents crosses the cap, the platform stops the container. Same mechanic as for jobs.

When the container is reaped, its BlockSession row is marked stopped with a reason. The user sees a friendly "session ended" page; clicking Start again creates a fresh container.
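
The "earliest cap wins" rule can be written down as a small decision function. The sketch below is a model for reasoning about your own settings, not the reaper's real implementation: the field names, the reason strings, and the choice of >= for each comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SessionClock:
    started_at: float       # seconds on any consistent clock
    last_seen_at: float     # time of the most recent proxied request
    timeout_seconds: int
    idle_ttl_seconds: int
    spent_cents: float = 0.0
    spend_cap_cents: Optional[float] = None

def stop_reason(s: SessionClock, now: float) -> Optional[str]:
    """Return why the session should stop at `now`, or None if it may keep running."""
    if s.spend_cap_cents is not None and s.spent_cents >= s.spend_cap_cents:
        return "spend_cap"
    if now - s.started_at >= s.timeout_seconds:
        return "timeout"
    if now - s.last_seen_at >= s.idle_ttl_seconds:
        return "idle"
    return None

active = SessionClock(started_at=0, last_seen_at=1500,
                      timeout_seconds=1800, idle_ttl_seconds=600)
print(stop_reason(active, now=1700))  # None: recent traffic, under the wall-clock cap
print(stop_reason(active, now=1800))  # timeout: the cap fires despite recent traffic
```

With the manifest in this guide (timeout 1800, idle TTL 600), a user who keeps chatting is still cut off at the 30 minute mark.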

Stop a session

Stop button in the UI, or:

curl -X DELETE https://blocks.gonka.gg/api/sessions/<sessionId> \
  -H "Authorization: Bearer $GONKA_API_KEY"
# {"ok":true}

DELETE is a graceful stop — the platform sends SIGTERM and waits ~10s before SIGKILL. If your app needs to flush state, hook the signal:

import signal, sys
def handle_term(signum, frame):
    print("==> stopping cleanly", flush=True)
    # flush state, close DB connections, etc.
    sys.exit(0)
signal.signal(signal.SIGTERM, handle_term)

The DELETE call itself takes ~10 s if your code doesn't handle SIGTERM — that's the docker stop grace period waiting for your process. Hooking it makes the stop instant.

Auth & user headers

Inside the container you can rely on these headers being present on every proxied request:

x-gonkablocks-session-id — the BlockSession id. Useful if you want to call back out to the platform's API to attach files, billing metadata, etc.
x-gonkablocks-user-id — the cuid of the logged-in user. Use this to scope state (e.g. as a primary key in a per-user dict).
x-gonkablocks-kind — always session here; handy if you reuse the same code across session and service.
x-forwarded-host, x-forwarded-proto — set to the public-facing host so frameworks rendering absolute URLs (Flask, FastAPI, Starlette) get them right.

These headers are trustworthy only because the platform's proxy is the sole route to the container: direct access to 127.0.0.1:<hostPort> from outside the platform host is closed. Inside the container you can therefore treat them as authoritative. If the same code also runs somewhere a non-platform caller can reach it, don't trust them there.
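
Scoping state by the user header could look like the sketch below. It works on any mapping of header names to values, so it can be tried without a web framework; in a FastAPI handler the request.headers object could be passed in. The names _state and state_for are hypothetical.

```python
from collections import defaultdict
from typing import Mapping

# In-memory state, lost when the container is reaped.
_state: dict = defaultdict(lambda: {"history": []})

def state_for(headers: Mapping[str, str]) -> dict:
    """Return the state bucket for the user named in the proxy headers."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    user_id = lowered.get("x-gonkablocks-user-id")
    if not user_id:
        raise KeyError("x-gonkablocks-user-id header missing")
    return _state[user_id]

hdrs = {"X-Gonkablocks-User-Id": "user_1", "X-Gonkablocks-Session-Id": "sess_1"}
state_for(hdrs)["history"].append(("me", "hi"))
print(state_for(hdrs)["history"])  # [('me', 'hi')]
```

Raising on a missing header is a deliberate choice here: a request without it did not come through the proxy.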

Pitfalls I actually hit

Entrypoint that doesn't start the server. If runtime.entrypoint runs a Python file that defines app = FastAPI() but never starts a server, the container exits with code 0 and the session is marked failed with the cryptic container exited (code=0) message. Fix: set entrypoint to uvicorn app:app --host 0.0.0.0 --port 8080.
Absolute paths in the HTML. If your form posts to /say (leading slash), it skips the proxy prefix and 404s. Use relative paths or read document.baseURI.
Stdlib http.server drops POST bodies. The proxy strips Content-Length and forwards the body chunked; Python's stdlib HTTP server doesn't decode chunked encoding. Use FastAPI / Flask / Starlette / aiohttp / Express / Fastify — they all handle this transparently.
Long uploads timing out. The proxy sets a generous timeout but very long uploads (large file attachments) can still hit it. Stream large files through your own object storage and have the session fetch the path, rather than uploading a 50 MB body through the proxy.
WebSockets aren't available. SSE works. For real-time tokens, stream a normal HTTP response: stream=True on the OpenAI client and yield from a FastAPI StreamingResponse.
Container state is per-session, not per-block. A user closing their browser and reopening it the next day gets a new container with empty state. If you need persistence, write to the platform's key-value store (POST /api/kv) or to your own backend.
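
For the chunked-body pitfall above, it can help to see what the framing looks like on the wire and what a handler would have to do to read it. The decoder below is a rough sketch for illustration; using a real framework remains the simpler fix. In a BaseHTTPRequestHandler subclass, the stream argument would be self.rfile.

```python
import io

def read_chunked(stream) -> bytes:
    """Decode an HTTP/1.1 chunked body from a binary file-like object."""
    body = bytearray()
    while True:
        # Each chunk starts with its size in hex, optionally followed by ";extensions".
        size_line = stream.readline().split(b";", 1)[0].strip()
        size = int(size_line.decode("ascii"), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.readline()  # consume the CRLF that ends the chunk data
    # Skip optional trailer headers up to the blank line that ends the body.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

wire = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_chunked(wire))  # b'hello world'
```

A handler that instead reads Content-Length bytes sees no length header and reads nothing, which is how the body appears to be dropped.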

Next steps

Same code, but shared by everyone (one container for the whole world)? Service blocks.
Embed the chat in your own website? Iframe embed, in the external-access guide.
Stream tokens as they arrive? Set stream=True on the OpenAI call and return a StreamingResponse from FastAPI. SSE flows through the proxy unchanged.
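
For the streaming option, the only new piece of logic is the SSE framing. A minimal sketch follows; the generator name sse_frames is hypothetical, and the input is any iterable of text chunks so it can be run on its own.

```python
from typing import Iterable, Iterator

def sse_frames(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty deltas
        # A frame is one "data:" line per line of payload, ended by a blank line.
        lines = "".join(f"data: {line}\n" for line in chunk.split("\n"))
        yield lines + "\n"

print(list(sse_frames(["Hel", "lo", "", "a\nb"])))
# ['data: Hel\n\n', 'data: lo\n\n', 'data: a\ndata: b\n\n']
```

In the app from this guide, the chunks would be the delta.content values from the stream=True response, and the generator would be returned as StreamingResponse(sse_frames(...), media_type="text/event-stream"). The browser side can then read it with EventSource or a streaming fetch, using a relative URL as before.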