Guide

Package any project as a block.

A block is a Docker image plus a small manifest.yaml. The platform wires it up to authenticated, metered LLM inference and gives you five ways to invoke it: one-shot jobs, scheduled workers, interactive sessions, persistent services, and visual workflows — all from the same image.

The magic, in 30 seconds

Every tool that talks to OpenAI speaks the same protocol: OPENAI_BASE_URL + OPENAI_API_KEY. gonkablocks owns those two environment variables. Whether you're running a local script, a one-shot cloud job, or a long-lived chat session, the platform mints a fresh per-run key, points your code at its metering proxy, and bills the tokens against your account.

Result: any code that does openai.chat.completions.create(...) — or uses Anthropic, LangChain, llama-index, the Vercel AI SDK, curl https://api.openai.com/..., or anything else — works unchanged. No SDK to install, no metering code to write.

# 1. one-shot: log in, mint a long-lived key
gonkablocks connect

# 2. wrap any local subprocess as an observed cloud Run
gonkablocks exec -- python my_agent.py

# 3. ship the same script as a real block (auto-detects + builds)
gonkablocks deploy

# 4. now invoke it from anywhere
gonkablocks run admin/my-agent topic="AI safety"

Steps 1–2 work the moment you have an account: no manifest, no Dockerfile, just gonkablocks exec -- <cmd>. Steps 3–4 turn the same code into a published, callable, shareable block.
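To make "works unchanged" concrete, here is a dependency-free sketch of what any OpenAI-compatible client effectively does. Because the platform owns the two variables, the same code runs locally, under gonkablocks exec, and inside a deployed block (the model name is illustrative):

```python
import json
import os
import urllib.request

def chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions request from the injected env vars.

    Any OpenAI-compatible client does the equivalent of this, which is
    why gonkablocks only needs to own OPENAI_BASE_URL + OPENAI_API_KEY.
    """
    base = os.environ["OPENAI_BASE_URL"].rstrip("/")
    key = os.environ["OPENAI_API_KEY"]
    body = json.dumps({
        "model": "gpt-4o-mini",  # illustrative; use whatever your gateway serves
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=body,
        headers={
            "authorization": f"Bearer {key}",
            "content-type": "application/json",
        },
    )

# urllib.request.urlopen(chat_request("hello")) sends it through the metering proxy
```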

Block anatomy

A block is two files in a directory:

  • Dockerfile — how to build the image. The platform supports any base image and any language.
  • manifest.yaml — declares what the block is, what inputs it takes, what outputs it produces, and how it should be sandboxed.

Optionally, anything else — Python sources, Node modules, assets, configs. The CLI tars up the directory (excluding node_modules, .git, and .venv) and uploads it to the platform, which builds the image with the Docker daemon and stores it under a versioned tag like gonkablocks/<author>/<slug>:<version>.

Five block types

One image can be invoked in any of five ways. Pick the type that matches the block's lifecycle:

  • job (one-shot) — runs to completion, returns outputs.json. Default for scripts and agents that finish.
  • worker (scheduled) — same code as a job, but the platform invokes it on a cron schedule you set.
  • session (interactive) — long-running container with a port exposed. The user gets an iframe + shell. One container per user.
  • service (public HTTP) — long-running container, public URL. One container shared across all callers.
  • workflow (composable) — no image: a graph that calls other published blocks and pipes their outputs together.

What blocks see at runtime

When the platform spawns your container, these environment variables are injected automatically:

OPENAI_BASE_URL    = https://<platform>/v1
OPENAI_API_KEY     = sk-run-<uuid>          # scoped to THIS run
ANTHROPIC_BASE_URL = https://<platform>/v1
ANTHROPIC_API_KEY  = sk-run-<uuid>
GONKA_INFERENCE_URL= https://<platform>/v1
GONKA_INFERENCE_KEY= sk-run-<uuid>
GONKA_OUTPUTS_DIR  = /out                   # write outputs here
GONKA_RUN_ID       = <runId>                # for logging / correlation
INPUT_<KEY>        = <value>                # one per declared input

The scoped key is bound to the current Run (or BlockSession). When the run ends, the key dies. The metering proxy enforces a per-run spend cap, so a runaway loop costs you cents, not dollars.

For sessions and services, the same env vars are present, but the scoped key lives as long as the container does, and spend accumulates against the session's cap.

Pack & ship with the CLI

The CLI auto-detects your project (Python, Node, Go, Rust, or existing Dockerfile) and scaffolds a Dockerfile + manifest.yaml if there isn't one. Then it uploads, builds, and publishes — all in one shot.

# from the cli/ folder, once
cd cli && npm install && npm run build && npm link

# from any project directory
gonkablocks connect          # one-time login + key mint
gonkablocks deploy           # auto-detect + scaffold + build + publish
# → http://<platform>/blocks/<you>/<slug>

For Python projects with requirements.txt or pyproject.toml, the scaffolder also sniffs your argparse, click, and os.environ["INPUT_*"] usages and seeds the manifest with corresponding inputs: entries — so you don't have to write them by hand.
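For example, a hypothetical main.py like the one below would seed topic and depth entries. The mapping shown in the comments is an assumption about what the scaffolder emits, not its exact rules:

```python
import argparse

# Flags like these are what the scaffolder sniffs in a Python project.
parser = argparse.ArgumentParser()
parser.add_argument("--topic", required=True)        # → inputs.topic: required string
parser.add_argument("--depth", type=int, default=3)  # → inputs.depth: integer, default 3

def main(argv=None):
    args = parser.parse_args(argv)
    print(f"researching {args.topic} at depth {args.depth}")
    return args
```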

Manifest reference

manifest.yaml is parsed with the schema in src/lib/manifest.ts. A complete example:

name: my-agent              # lowercase + dashes only, ≤60 chars
version: 0.1.0              # semver
type: job                   # job | worker | session | service | workflow
description: One-line summary shown in the catalog.
category: misc

inputs:
  topic:
    type: string            # string | integer | number | boolean | secret | file
    required: true
    description: The thing to research.
  depth:
    type: integer
    required: false
    default: 3
    min: 1
    max: 10

outputs:
  text:
    type: string
    description: Final research summary.
  report:
    type: file              # surfaces as a download in the run viewer
    mime: application/pdf

runtime:
  build: dockerfile         # we build from your Dockerfile
  entrypoint: python main.py
  env:                      # static env vars (added on top of the magic ones)
    LOG_LEVEL: info
  outputs_dir: /out         # where to write outputs.json (default /out)
  expose_port: 8080         # session/service only

resources:
  cpu: 1                    # 0.1 – 8 cores
  memory_mb: 1024           # 128 – 16384
  timeout_seconds: 600      # 10 – 7200
  network: allow            # allow | deny | ["host1.com", "host2.com"]

pricing:
  type: per_run             # per_run | per_minute | subscription
  base_price_cents: 0       # what callers pay you per invocation
  rate_cents_per_minute: 0  # only for per_minute
  inference_pass_through: true   # caller pays inference, not you
  inference_markup_pct: 0   # add a markup on inference cost (0–100%)

Inputs & outputs

Inputs become environment variables prefixed with INPUT_:

# Python
import os
topic = os.environ["INPUT_TOPIC"]
depth = int(os.environ.get("INPUT_DEPTH", "3"))

# Node
const topic = process.env.INPUT_TOPIC;
const depth = parseInt(process.env.INPUT_DEPTH ?? "3", 10);

Write structured outputs to $GONKA_OUTPUTS_DIR/outputs.json (default /out/outputs.json). The platform parses it and shows the keys in the run viewer. Anything else in that directory becomes a downloadable artifact.

import json, os
out = {"text": "Result body…", "score": 0.92}
with open(os.path.join(os.environ["GONKA_OUTPUTS_DIR"], "outputs.json"), "w") as f:
    json.dump(out, f)

Per-language templates

gonkablocks deploy picks a template based on the files it sees in your directory:

  • Dockerfile present → used as-is. Highest priority.
  • pyproject.toml / requirements.txt / setup.py → Python on python:3.12-slim, entry = first of main.py, app.py, run.py.
  • package.json → Node on node:20-slim, entry = main or npm start if defined.
  • go.mod → Go on golang:1.22-alpine.
  • Cargo.toml → Rust on rust:1-slim.
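As a rough idea of what the Python template produces, the generated Dockerfile looks something like this (a sketch; the scaffolder's actual output may differ):

```dockerfile
# Sketch of the Python template's Dockerfile (illustrative, not verbatim)
FROM python:3.12-slim
WORKDIR /workspace
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# The command comes from manifest runtime.entrypoint, e.g. `python main.py`
```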

Build in the browser

If you don't want to use the CLI, the Build page has a full Monaco editor + manifest editor + live build logs. Pick a starter (job, worker, session, service, or workflow), edit, save, build, publish. Forking from any public block does the same — gives you an editable copy with the original's files pre-populated.

Invoking a job

From the CLI, the API, or the web UI:

# CLI
gonkablocks run admin/hive-research topic="AI safety in 2026" depth=3
# → live URL printed; CLI tails the run

# REST API
curl -X POST https://<platform>/api/runs \
  -H "authorization: Bearer $GONKA_API_KEY" \
  -H "content-type: application/json" \
  --data '{"blockId":"<id>","inputs":{"topic":"AI safety"}}'

# Web UI
# Block detail page → fill the form → "Run"

The platform spawns the container, streams stdout/stderr + inference calls to the live run viewer at /runs/<id>, and (for jobs) parses outputs.json when the container exits.

Sessions (interactive)

Set type: session and runtime.expose_port: 8080 in the manifest. When a user starts a session, the platform spins up a private container and proxies their browser to it through /api/sessions/<id>/proxy/. Each user gets their own isolated container; idle sessions are reaped on resources.timeout_seconds.

The session viewer also gives you a built-in xterm.js shell into the running container — handy for debugging or for blocks where the terminal IS the UI.

# manifest.yaml (session)
name: my-chat
version: 0.1.0
type: session
description: Long-running chat UI.

runtime:
  build: dockerfile
  entrypoint: python main.py
  expose_port: 8080
resources:
  cpu: 1
  memory_mb: 1024
  timeout_seconds: 1800     # max idle before reap
  network: allow

The platform also supplies these per-request headers to your container, so you can render user-aware UIs:

x-gonkablocks-session-id: <id>
x-gonkablocks-kind:       session | service
x-gonkablocks-user-id:    <userId>     # session only
x-forwarded-host:         <public host>
x-forwarded-proto:        https

See admin/zeroclaw-chat for a working FastAPI + chat-UI session that wraps the zeroclaw agent CLI.

Services (HTTP)

Same shape as a session, but type: service. There's one container shared across all callers, and a public URL at /services/<author>/<slug>. Use this for translation APIs, embeddings, vector search, or anything that fits a stateless RPC shape.

# Caller-side, treat it like any HTTP API:
curl -X POST https://<platform>/services/admin/translate-service/translate \
  -H "content-type: application/json" \
  --data '{"text":"Hello world","target":"ru"}'

Workers (cron)

A worker is an existing block (any image) wrapped with a cron schedule. The platform's scheduler invokes it as a regular run on every tick, with the inputs you configure. Manage them at /workers.

# Example cron strings
"0 */1 * * *"     # top of every hour
"0 9 * * 1-5"     # 09:00 weekdays
"*/5 * * * *"     # every 5 minutes

Workflows (compose blocks)

Workflows have no Dockerfile of their own. They're a graph (drag-drop in /build) that runs other published blocks in topological order, piping outputs from one into the inputs of the next. The runtime lives in src/server/orchestrator.ts:executeWorkflow.

Useful for things like:

  • fetch → chunk → embed → upsert (RAG ingestion in 4 blocks)
  • translate → summarize → tweet (post-process pipelines)
  • research-agent → critique-agent → revise (multi-stage agents)

The inference proxy

Every container talks to $OPENAI_BASE_URL = the platform's /v1/* endpoint. The proxy:

  1. Validates the scoped key against the active run/session and looks up the user.
  2. Forwards the request to proxy.gonka.gg/v1 (or your own gateway, see below) using a master key the container never sees.
  3. Streams the upstream response back as-is — including SSE for streaming chat completions.
  4. Records token counts and computed cost on the way out, then decrements the user's credits in a single transaction with the run/session update.
  5. Refuses the request with HTTP 402 if the run is over its spend cap or the user is out of credits.
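On the caller side, the 402 is worth special-casing: unlike a transient 5xx, it will not succeed on retry within the same run. A client-side sketch (the helper name is hypothetical):

```python
import urllib.error

def call_with_cap(send):
    """Run `send()` (any function making an inference HTTP call) and turn
    the proxy's 402 spend-cap rejection into a terminal error, so a retry
    loop doesn't hammer a cap that won't reset mid-run."""
    try:
        return send()
    except urllib.error.HTTPError as e:
        if e.code == 402:
            raise RuntimeError("spend cap exhausted; stop issuing inference calls") from e
        raise
```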

Metering & spend caps

  • Per-run cap. Every job and session ships with a spend cap (default $5.00). Once exceeded, every subsequent inference call gets HTTP 402.
  • Per-user balance. New users get a signup credit (default $5.00). Calls fail when the balance hits zero unless the block's pricing model passes inference through to the caller.
  • Per-call audit. Every inference call is stored in InferenceCall with prompt/completion/total tokens, latency, model name, and the computed cost. Visible in the run viewer's Inference drawer and the session viewer's Invocations link.
  • Markup. Block authors can take a percentage on top of the underlying inference cost via pricing.inference_markup_pct. Useful for productized agents.

BYO gateway

Don't want to use the default Gonka backend? In Settings → Gateway you can point all of your runs at any OpenAI-compatible endpoint (vLLM, llama.cpp, Ollama, OpenRouter, raw OpenAI, your own deployment). The platform still meters tokens for visibility, but the upstream provider bills you directly — no platform credit decrement.

CLI reference

Install once: cd cli && npm install && npm run build && npm link. Then:

gonkablocks connect [--server URL] [--username U] [-y]
        ✨ log in + mint a long-lived gk-live-… key
        writes ~/.gonkablocks/config.json + ~/.gonka/credentials

gonkablocks env
        print shell exports for the active key
        usage:  source <(gonkablocks env)

gonkablocks magic
        guided walkthrough — connect, then run a sample or deploy this dir

gonkablocks exec [--title T] -- <cmd> [args...]
        run a local subprocess as an observed cloud Run.
        injects OPENAI_BASE_URL/KEY (and Anthropic/Gonka mirrors) into env.
        streams stdout/stderr to the run viewer.
        examples:
          gonkablocks exec -- python my_agent.py
          gonkablocks exec --title "build" -- node build.js

gonkablocks proxy [--port 11434]
        local OpenAI-compatible HTTP forwarder on 127.0.0.1.
        point any OpenAI client at it (no env-var change needed in your code).

gonkablocks init [dir]
        scaffold a blank block (Dockerfile + manifest + main.py).

gonkablocks deploy [dir] [--auto] [--name slug]
        auto-detect project + scaffold + upload + build + publish.
        --auto skips confirmation prompts (useful in CI).

gonkablocks run <author>/<slug> [k=v ...]
        invoke a published block; CLI tails the run.

gonkablocks runs
        list your recent runs.

gonkablocks login
        legacy cookie-only login (use `connect` instead in 99% of cases).

Troubleshooting

HTTP 402 from the proxy

Your scoped key is invalid, expired, or over budget. Check the run viewer for the spend cap and the user's remaining credits at /settings.

Container build fails with timeout_seconds too low

The minimum is 10 seconds. For session blocks, set it to the maximum tolerable idle time (e.g. 1800 = 30 min).

Session iframe can't reach relative URLs

The session proxy auto-injects a <base href="/api/sessions/<id>/proxy/"> into HTML responses, so fetch("chat") resolves correctly. If your app uses absolute URLs (e.g. fetch("/chat")), switch them to relative.

Long inferences look like "Failed to fetch"

Some upstream LLM calls take 30+ seconds and write nothing in the meantime. With no bytes moving on the connection, an intermediary may abort it, or the user gives up and refreshes. Stream tokens, or emit a tiny heartbeat byte every few seconds (the chat UI in admin/zeroclaw-chat does this with a null byte that the client filters out).
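A minimal sketch of that heartbeat trick, assuming the response body is streamed from a generator (names are illustrative; the real chat UI has its own implementation):

```python
import queue
import threading

def heartbeat_stream(work, interval=5.0, beat=b"\x00"):
    """Yield a heartbeat byte every `interval` seconds while `work()`
    (a slow, silent LLM call) runs, then yield its result and stop.
    The client filters the heartbeat bytes out of the displayed text."""
    done: queue.Queue = queue.Queue()
    threading.Thread(target=lambda: done.put(work()), daemon=True).start()
    while True:
        try:
            yield done.get(timeout=interval)  # final payload, when ready
            return
        except queue.Empty:
            yield beat  # keeps the connection visibly alive
```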

Container started but the platform reports it failed

The orchestrator overrides WORKDIR to /workspace. If your main.py lives elsewhere (e.g. /app/main.py), adjust the manifest entrypoint accordingly, e.g. cd /app && python main.py or python /app/main.py.


Want a worked example? Read the source of hive-research (job + agent), translate-service (service + FastAPI), and zeroclaw-chat (session + interactive chat UI). Each is <200 lines.