The magic, in 30 seconds
Every tool that talks to OpenAI speaks the same protocol: OPENAI_BASE_URL + OPENAI_API_KEY. gonkablocks owns those two environment variables. Whether you're running a local script, a one-shot cloud job, or a long-lived chat session, the platform mints a fresh per-run key, points your code at its metering proxy, and bills the tokens against your account.
Result: any code that does `openai.chat.completions.create(...)` — or uses Anthropic, LangChain, llama-index, the Vercel AI SDK, `curl https://api.openai.com/...`, or anything else — works unchanged. No SDK to install, no metering code to write.
```
# 1. one-shot: log in, mint a long-lived key
gonkablocks connect

# 2. wrap any local subprocess as an observed cloud Run
gonkablocks exec -- python my_agent.py

# 3. ship the same script as a real block (auto-detects + builds)
gonkablocks deploy

# 4. now invoke it from anywhere
gonkablocks run admin/my-agent topic="AI safety"
```
Steps 1–2 work the moment you have an account: no manifest, no Dockerfile, just gonkablocks exec -- <cmd>. Steps 3–4 turn the same code into a published, callable, shareable block.
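Under `gonkablocks exec`, code that already speaks the OpenAI protocol needs no changes at all: it just reads the two variables. As an illustration, here is a minimal stdlib-only sketch of what any client effectively does (the endpoint path and payload shape are the standard OpenAI chat-completions ones; the model name is a placeholder, not anything platform-specific):

```python
import json
import os
import urllib.request

def chat_request(messages, model="gpt-4o-mini"):
    """Build a standard chat-completions request against whatever
    endpoint OPENAI_BASE_URL points at: the platform's metering proxy
    under `gonkablocks exec`, or api.openai.com when set manually."""
    base = os.environ["OPENAI_BASE_URL"].rstrip("/")
    key = os.environ["OPENAI_API_KEY"]
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=body,
        headers={
            "authorization": f"Bearer {key}",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Swap the base URL and the same request flows through the metering proxy instead of the public API; that is the entire trick.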
Block anatomy
A block is two files in a directory:
- `Dockerfile` — how to build the image. The platform supports any base image and any language.
- `manifest.yaml` — declares what the block is, what inputs it takes, what outputs it produces, and how it should be sandboxed.
Optionally, anything else — Python sources, Node modules, assets, configs. The CLI tars up the directory (excluding node_modules, .git, and .venv) and uploads it to the platform, which builds the image with the Docker daemon and stores it under a versioned tag like gonkablocks/<author>/<slug>:<version>.
Five block types
One image can be invoked in any of five ways. Pick the type that matches the block's lifecycle:
- **job** — Runs to completion, returns `outputs.json`. Default for scripts and agents that finish.
- **worker** — Same code as a job, but the platform invokes it on a cron schedule you set.
- **session** — Long-running container with a port exposed. The user gets an iframe + shell. One container per user.
- **service** — Long-running container with a public URL. One container shared across all callers.
- **workflow** — No image — a graph that calls other published blocks and pipes their outputs together.
What blocks see at runtime
When the platform spawns your container, these environment variables are injected automatically:
```
OPENAI_BASE_URL     = https://<platform>/v1
OPENAI_API_KEY      = sk-run-<uuid>   # scoped to THIS run
ANTHROPIC_BASE_URL  = https://<platform>/v1
ANTHROPIC_API_KEY   = sk-run-<uuid>
GONKA_INFERENCE_URL = https://<platform>/v1
GONKA_INFERENCE_KEY = sk-run-<uuid>
GONKA_OUTPUTS_DIR   = /out            # write outputs here
GONKA_RUN_ID        = <runId>         # for logging / correlation
INPUT_<KEY>         = <value>         # one per declared input
```
The scoped key is bound to the current Run (or BlockSession). When the run ends, the key dies. The metering proxy enforces a per-run spend cap, so a runaway loop costs you cents, not dollars.
For sessions and services, the same env vars are present, but the scoped key lives as long as the container does, and spend accumulates against the session's cap.
Pack & ship with the CLI
The CLI auto-detects your project (Python, Node, Go, Rust, or existing Dockerfile) and scaffolds a Dockerfile + manifest.yaml if there isn't one. Then it uploads, builds, and publishes — all in one shot.
```
# from the cli/ folder, once
cd cli && npm install && npm run build && npm link

# from any project directory
gonkablocks connect   # one-time login + key mint
gonkablocks deploy    # auto-detect + scaffold + build + publish
# → http://<platform>/blocks/<you>/<slug>
```
For Python projects with `requirements.txt` or `pyproject.toml`, the scaffolder also sniffs your argparse, click, and `os.environ["INPUT_*"]` usages and seeds the manifest with corresponding `inputs:` entries — so you don't have to write them by hand.
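The idea behind that sniffing can be sketched in a few lines. This is a hypothetical approximation, not the real scaffolder code; `sniff_inputs` and the regex are illustrative only:

```python
import re

# Match os.environ["INPUT_FOO"] and os.environ.get("INPUT_FOO", ...)
INPUT_RE = re.compile(r'os\.environ(?:\.get)?\(?\[?["\']INPUT_([A-Z0-9_]+)["\']')

def sniff_inputs(source: str) -> list[str]:
    """Return the lowercased input names referenced in a Python source file."""
    return sorted({name.lower() for name in INPUT_RE.findall(source)})
```

Each discovered name becomes a stub entry under `inputs:` that you can then refine with types, defaults, and descriptions.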
Manifest reference
manifest.yaml is parsed with the schema in src/lib/manifest.ts. A complete example:
```yaml
name: my-agent                 # lowercase + dashes only, ≤60 chars
version: 0.1.0                 # semver
type: job                      # job | worker | session | service | workflow
description: One-line summary shown in the catalog.
category: misc

inputs:
  topic:
    type: string               # string | integer | number | boolean | secret | file
    required: true
    description: The thing to research.
  depth:
    type: integer
    required: false
    default: 3
    min: 1
    max: 10

outputs:
  text:
    type: string
    description: Final research summary.
  report:
    type: file                 # surfaces as a download in the run viewer
    mime: application/pdf

runtime:
  build: dockerfile            # we build from your Dockerfile
  entrypoint: python main.py
  env:                         # static env vars (added on top of the magic ones)
    LOG_LEVEL: info
  outputs_dir: /out            # where to write outputs.json (default /out)
  expose_port: 8080            # session/service only

resources:
  cpu: 1                       # 0.1 – 8 cores
  memory_mb: 1024              # 128 – 16384
  timeout_seconds: 600         # 10 – 7200
  network: allow               # allow | deny | ["host1.com", "host2.com"]

pricing:
  type: per_run                # per_run | per_minute | subscription
  base_price_cents: 0          # what callers pay you per invocation
  rate_cents_per_minute: 0     # only for per_minute
  inference_pass_through: true # caller pays inference, not you
  inference_markup_pct: 0      # add a markup on inference cost (0–100%)
```

Inputs & outputs
Inputs become environment variables prefixed with INPUT_:
```python
# Python
import os
topic = os.environ["INPUT_TOPIC"]
depth = int(os.environ.get("INPUT_DEPTH", "3"))
```

```javascript
// Node
const topic = process.env.INPUT_TOPIC;
const depth = parseInt(process.env.INPUT_DEPTH ?? "3", 10);
```

Write structured outputs to `$GONKA_OUTPUTS_DIR/outputs.json` (default `/out/outputs.json`). The platform parses it and shows the keys in the run viewer. Anything else in that directory becomes a downloadable artifact.
```python
import json, os

out = {"text": "Result body…", "score": 0.92}
with open(os.path.join(os.environ["GONKA_OUTPUTS_DIR"], "outputs.json"), "w") as f:
    json.dump(out, f)
```

Per-language templates
gonkablocks deploy picks a template based on the files it sees in your directory:
- `Dockerfile` present → used as-is. Highest priority.
- `pyproject.toml` / `requirements.txt` / `setup.py` → Python on `python:3.12-slim`, entry = first of `main.py`, `app.py`, `run.py`.
- `package.json` → Node on `node:20-slim`, entry = `main` or `npm start` if defined.
- `go.mod` → Go on `golang:1.22-alpine`.
- `Cargo.toml` → Rust on `rust:1-slim`.
Build in the browser
If you don't want to use the CLI, the Build page has a full Monaco editor + manifest editor + live build logs. Pick a starter (job, worker, session, service, or workflow), edit, save, build, publish. Forking from any public block does the same — gives you an editable copy with the original's files pre-populated.
Invoking a job
From the CLI, the API, or the web UI:
```
# CLI
gonkablocks run admin/hive-research topic="AI safety in 2026" depth=3
# → live URL printed; CLI tails the run

# REST API
curl -X POST https://<platform>/api/runs \
  -H "authorization: Bearer $GONKA_API_KEY" \
  -H "content-type: application/json" \
  --data '{"blockId":"<id>","inputs":{"topic":"AI safety"}}'

# Web UI
# Block detail page → fill the form → "Run"
```

The platform spawns the container, streams stdout/stderr + inference calls to the live run viewer at /runs/<id>, and (for jobs) parses outputs.json when the container exits.
Sessions (interactive)
Set type: session and runtime.expose_port: 8080 in the manifest. When a user starts a session, the platform spins up a private container and proxies their browser to it through /api/sessions/<id>/proxy/. Each user gets their own isolated container; idle sessions are reaped on resources.timeout_seconds.
The session viewer also gives you a built-in xterm.js shell into the running container — handy for debugging or for blocks where the terminal IS the UI.
```yaml
# manifest.yaml (session)
name: my-chat
version: 0.1.0
type: session
description: Long-running chat UI.
runtime:
  build: dockerfile
  entrypoint: python main.py
  expose_port: 8080
resources:
  cpu: 1
  memory_mb: 1024
  timeout_seconds: 1800   # max idle before reap
  network: allow
```
The platform also supplies these per-request headers to your container, so you can render user-aware UIs:
```
x-gonkablocks-session-id: <id>
x-gonkablocks-kind: session | service
x-gonkablocks-user-id: <userId>    # session only
x-forwarded-host: <public host>
x-forwarded-proto: https
```
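A session app can read those headers with any web framework. Here is a minimal sketch using only Python's stdlib `http.server`; the `serve` helper and its default port are illustrative, and in practice you bind whatever port you declared as `runtime.expose_port`:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The platform injects this header on every proxied session request.
        user = self.headers.get("x-gonkablocks-user-id", "anonymous")
        body = f"hello, {user}".encode()
        self.send_response(200)
        self.send_header("content-type", "text/plain")
        self.send_header("content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080):
    """Bind the port declared as runtime.expose_port and serve forever."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Since the proxy terminates TLS and sets `x-forwarded-proto`, your app never needs its own certificates.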
See admin/zeroclaw-chat for a working FastAPI + chat-UI session that wraps the zeroclaw agent CLI.
Services (HTTP)
Same shape as a session, but type: service. There's one container shared across all callers, and a public URL at /services/<author>/<slug>. Use this for translation APIs, embeddings, vector search, or anything that fits a stateless RPC shape.
```
# Caller-side, treat it like any HTTP API:
curl -X POST https://<platform>/services/admin/translate-service/translate \
  -H "content-type: application/json" \
  --data '{"text":"Hello world","target":"ru"}'
```

Workers (cron)
A worker is an existing block (any image) wrapped with a cron schedule. The platform's scheduler invokes it as a regular run on every tick, with the inputs you configure. Manage them at /workers.
```
# Example cron strings
"0 */1 * * *"   # top of every hour
"0 9 * * 1-5"   # 09:00 weekdays
"*/5 * * * *"   # every 5 minutes
```
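To see mechanically what those strings mean, here is a small illustrative matcher for the 5-field format (minute, hour, day-of-month, month, day-of-week). It is a sketch supporting only `*`, `*/n`, lists, and ranges, not the platform's actual scheduler:

```python
from datetime import datetime

def _field_matches(field: str, value: int) -> bool:
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):          # step: */5 matches 0, 5, 10, …
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:                  # range: 1-5
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:           # exact value
            return True
    return False

def cron_matches(expr: str, dt: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    return (_field_matches(minute, dt.minute)
            and _field_matches(hour, dt.hour)
            and _field_matches(dom, dt.day)
            and _field_matches(month, dt.month)
            and _field_matches(dow, dt.isoweekday() % 7))  # 0 = Sunday
```

On every tick whose timestamp matches, the scheduler invokes the worker as a regular run with the configured inputs.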
Workflows (compose blocks)
Workflows have no Dockerfile of their own. They're a graph (drag-drop in /build) that runs other published blocks in topological order, piping outputs from one into the inputs of the next. The runtime lives in src/server/orchestrator.ts:executeWorkflow.
Useful for things like:
- fetch → chunk → embed → upsert (RAG ingestion in 4 blocks)
- translate → summarize → tweet (post-process pipelines)
- research-agent → critique-agent → revise (multi-stage agents)
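The execution model behind these pipelines can be sketched in a few lines. This is an illustrative approximation, not the code in `src/server/orchestrator.ts`: wire each node's inputs to an upstream node's outputs, then run nodes in topological order:

```python
from graphlib import TopologicalSorter

def execute_workflow(graph, run_block):
    """graph: {node: {input_key: (upstream_node, output_key)}}.
    run_block(node, inputs) -> outputs dict for that node."""
    # Dependencies of each node are the upstream nodes its inputs reference.
    deps = {n: {up for up, _ in wires.values()} for n, wires in graph.items()}
    results = {}
    for node in TopologicalSorter(deps).static_order():
        # Resolve this node's inputs from already-computed upstream outputs.
        inputs = {k: results[up][out] for k, (up, out) in graph[node].items()}
        results[node] = run_block(node, inputs)
    return results
```

In the real platform, `run_block` would spawn a container run for a published block; here it is just a callback.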
The inference proxy
Every container talks to `$OPENAI_BASE_URL` = the platform's `/v1/*` endpoint. The proxy:
- Validates the scoped key against the active run/session and looks up the user.
- Forwards the request to `proxy.gonka.gg/v1` (or your own gateway, see below) using a master key the container never sees.
- Streams the upstream response back as-is — including SSE for streaming chat completions.
- Records token counts and computed cost on the way out, then decrements the user's credits in a single transaction with the run/session update.
- Refuses the request with HTTP 402 if the run is over its spend cap or the user is out of credits.
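The refusal in the last step reduces to a simple guard; a minimal sketch of the assumed semantics (function name and cent-based units are illustrative):

```python
def check_spend(run_spent_cents: int, run_cap_cents: int,
                user_balance_cents: int) -> tuple[bool, int]:
    """Return (allowed, http_status) for an incoming inference call."""
    if run_spent_cents >= run_cap_cents:
        return (False, 402)   # run is over its per-run spend cap
    if user_balance_cents <= 0:
        return (False, 402)   # user is out of credits
    return (True, 200)
```

Because the check runs before the upstream call, a capped run fails fast instead of accruing cost that can never be billed.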
Metering & spend caps
- Per-run cap. Every job and session ships with a spend cap (default $5.00). Once exceeded, every subsequent inference call gets HTTP 402.
- Per-user balance. New users get a signup credit (default $5.00). Calls fail when the balance hits zero unless the block's pricing model passes inference through to the caller.
- Per-call audit. Every inference call is stored in `InferenceCall` with prompt/completion/total tokens, latency, model name, and the computed cost. Visible in the run viewer's Inference drawer and the session viewer's Invocations link.
- Markup. Block authors can take a percentage on top of the underlying inference cost via `pricing.inference_markup_pct`. Useful for productized agents.
BYO gateway
Don't want to use the default Gonka backend? In Settings → Gateway you can point all of your runs at any OpenAI-compatible endpoint (vLLM, llama.cpp, Ollama, OpenRouter, raw OpenAI, your own deployment). The platform still meters tokens for visibility, but the upstream provider bills you directly — no platform credit decrement.
CLI reference
Install once: `cd cli && npm install && npm run build && npm link`. Then:
```
gonkablocks connect [--server URL] [--username U] [-y]
    ✨ log in + mint a long-lived gk-live-… key
    writes ~/.gonkablocks/config.json + ~/.gonka/credentials

gonkablocks env
    print shell exports for the active key
    usage: source <(gonkablocks env)

gonkablocks magic
    guided walkthrough — connect, then run a sample or deploy this dir

gonkablocks exec [--title T] -- <cmd> [args...]
    run a local subprocess as an observed cloud Run.
    injects OPENAI_BASE_URL/KEY (and Anthropic/Gonka mirrors) into env.
    streams stdout/stderr to the run viewer.
    examples:
      gonkablocks exec -- python my_agent.py
      gonkablocks exec --title "build" -- node build.js

gonkablocks proxy [--port 11434]
    local OpenAI-compatible HTTP forwarder on 127.0.0.1.
    point any OpenAI client at it (no env-var change needed in your code).

gonkablocks init [dir]
    scaffold a blank block (Dockerfile + manifest + main.py).

gonkablocks deploy [dir] [--auto] [--name slug]
    auto-detect project + scaffold + upload + build + publish.
    --auto skips confirmation prompts (useful in CI).

gonkablocks run <author>/<slug> [k=v ...]
    invoke a published block; CLI tails the run.

gonkablocks runs
    list your recent runs.

gonkablocks login
    legacy cookie-only login (use `connect` instead in 99% of cases).
```

Troubleshooting
HTTP 402 from the proxy
Your scoped key is invalid, expired, or over budget. Check the run viewer for the spend cap and the user's remaining credits at /settings.
Container build fails with timeout_seconds too low
The minimum is 10 seconds. For session blocks, set it to the maximum tolerable idle time (e.g. 1800 = 30 min).
Session iframe can't reach relative URLs
The session proxy auto-injects a <base href="/api/sessions/<id>/proxy/"> into HTML responses, so fetch("chat") resolves correctly. If your app uses absolute URLs (e.g. fetch("/chat")), switch them to relative.
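The injection itself is straightforward string surgery; an illustrative sketch (not the proxy's actual code), which assumes an unadorned `<head>` tag:

```python
def inject_base(html: str, session_id: str) -> str:
    """Insert a <base href> right after <head> so relative URLs resolve
    through the session proxy path."""
    base = f'<base href="/api/sessions/{session_id}/proxy/">'
    idx = html.find("<head>")
    if idx == -1:
        return base + html  # no <head>: prepend as a crude fallback
    insert_at = idx + len("<head>")
    return html[:insert_at] + base + html[insert_at:]
```

With that tag in place, `fetch("chat")` resolves to `/api/sessions/<id>/proxy/chat`, while an absolute `fetch("/chat")` bypasses the base and misses the proxy entirely, which is why absolute URLs break.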
Long inferences look like "Failed to fetch"
Some upstream LLM calls take 30+ seconds and write nothing in the meantime. With no activity on the connection, an intermediary may abort the request, or the user gives up and refreshes. Stream tokens, or emit a tiny heartbeat byte every few seconds (the chat UI in admin/zeroclaw-chat does this with a null byte that the client filters out).
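The heartbeat pattern can be sketched generically (an illustrative wrapper; the real chat UI's implementation may differ): wrap the slow token source so the response emits a null byte whenever nothing else has been produced for a while.

```python
import queue
import threading

HEARTBEAT = b"\x00"  # the client filters this byte out of the stream

def with_heartbeat(token_iter, interval=3.0):
    """Yield tokens from token_iter, inserting HEARTBEAT during gaps
    longer than `interval` seconds to keep the connection alive."""
    q: queue.Queue = queue.Queue()
    done = object()

    def pump():
        for tok in token_iter:
            q.put(tok)
        q.put(done)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=interval)
        except queue.Empty:
            yield HEARTBEAT  # nothing produced yet: keep the stream warm
            continue
        if item is done:
            return
        yield item
```

Feed the wrapped iterator to your streaming HTTP response and the browser sees bytes every few seconds even while the upstream model is still thinking.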
Container started but the platform reports it failed
The orchestrator overrides WORKDIR to /workspace. If your main.py lives elsewhere (e.g. /app/main.py), use an absolute path in the manifest entrypoint: cd /app && python main.py.
Want a worked example? Read the source of hive-research (job + agent), translate-service (service + FastAPI), and zeroclaw-chat (session + interactive chat UI). Each is <200 lines.