Guide · 01

Create your first block

Take a Python script that does something useful with an LLM, package it as a real block, deploy it, and run it from anywhere. By the end you'll have a published block on Gonkablocks that anyone can run.

What you'll build

A small autonomous research agent — give it a topic, it generates sub-questions, drafts answers using a model, and writes a markdown report. We'll mirror the open-source hive-research seed block so you can compare your code to a known-working reference at any point.

The same recipe works for anything that needs to call an LLM:

  • summarisers, translators, extractors
  • data-cleaning agents (CSV-in, Markdown-out)
  • code reviewers, doc generators
  • multi-step research / planning agents

Before you start

You'll need three things on your machine:

  • Docker running locally (only needed for gonkablocks exec — the deploy path builds remotely on the platform).
  • Python 3.10+ to run the example. The block itself runs in a container the platform builds, so the host version doesn't matter at deploy time.
  • Gonkablocks CLI:
# install the CLI (one-line installer)
curl -fsSL https://blocks.gonka.gg/install.sh | sh

# or with npm
npm install -g @gonkalabs/blocks-cli

# log in once — opens a browser, mints a long-lived API key
gonkablocks connect --server https://blocks.gonka.gg

connect stores the key in ~/.config/gonkablocks/config.json. The CLI uses that same key for deploy, exec, run, and every other command, so there's nothing else to set up.

Pick a block type

Every block is one of five types — the same source code can be invoked in any of them, but you pick one as the "canonical" interface in the manifest:

type      lifecycle               use it for
job       one-shot, exits         scripts, agents, batch processors
worker    scheduled (cron)        daily summaries, periodic ingest
session   long-running, per-user  chats, notebooks, IDE-style tools
service   long-running, shared    HTTP endpoints, MCP servers
workflow  composes other blocks   visual DAGs in the canvas

For our research agent we want type: job — it takes inputs, does its thing, finishes.

Project layout

Make a directory anywhere on your machine:

mkdir my-research && cd my-research
touch manifest.yaml Dockerfile main.py

That's the entire skeleton. Three files. We'll fill them in next.

You can add anything else you need: extra Python sources, config files, prompts, sample data. The CLI tars up the whole directory on deploy, excluding node_modules, .git, and .venv.

Write manifest.yaml

The manifest declares what the block is: inputs (what the user types in), outputs (what the block returns), the runtime, memory and CPU limits, network policy, and pricing. The schema is strict, so invalid manifests fail at deploy time, not at run time.

name: my-research
version: 0.1.0
type: job
description: Multi-round research agent — generates sub-questions, drafts answers, synthesises a markdown report.
category: research

inputs:
  topic:
    type: string
    required: true
    description: The research topic to investigate.
  depth:
    type: integer
    required: false
    default: 3
    description: Number of sub-questions to explore (1-8).
    min: 1
    max: 8

outputs:
  report_path:
    type: string
    description: Path inside /out where the markdown report was written.
  sub_questions:
    type: json
    description: The list of sub-questions actually investigated.

runtime:
  build: dockerfile
  entrypoint: python main.py
  outputs_dir: /out
  env:
    MODEL: qwen3-235b

resources:
  cpu: 1
  memory_mb: 1024
  timeout_seconds: 600
  network: allow

pricing:
  type: per_run
  base_price_cents: 0
  rate_cents_per_minute: 0
  inference_pass_through: true
  inference_markup_pct: 0

Field reference (the bits worth knowing)

  • name — lowercase letters, digits, dashes only. Becomes the slug at /blocks/<you>/<name>. Once published, don't rename — fork instead.
  • version — strict semver x.y.z. Bump it for every deploy or the platform refuses the upload.
  • type — see the table above. We use job.
  • inputs — each entry is { type, required, default, description, enum?, min?, max? }. Types: string, integer, number, boolean, secret, file. The platform surfaces these as a form on the block's public page.
  • outputs — declarative description of what your block produces. Types include json (any JSON-serialisable value) and file (a path inside outputs_dir).
  • runtime.entrypoint — the command run inside the container. It runs as PID 1 — no shell expansion unless you wrap in sh -c.
  • runtime.env — extra env vars baked into every run. Use this for non-secret config (model id, prompt versions). Secrets go through the secrets vault — see below.
  • resources.timeout_seconds — hard kill if the run exceeds this. 10..7200, default 600. Add headroom for slow LLM calls.
  • resources.network — deny (the default; only the inference proxy is reachable), allow (any host), or an allowlist of hosts. Use allow sparingly; the tighter the better for security review.
  • pricing.inference_pass_through — if true (the default), the user is charged exactly what Gonka charges for their tokens, with no markup. Set it to false and set inference_markup_pct if you publish your block as a paid service.
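The rules in the field reference are easy to check before you deploy. The sketch below is a hypothetical pre-flight helper, not part of the CLI: check_manifest and its messages are made up for illustration, and it covers only the constraints stated above (name characters, x.y.z versions, the input type list, and the 10..7200 timeout range). The platform's own schema validation remains the authority.

```python
import re

# Constraints taken from the field reference above.
NAME_RE = re.compile(r"[a-z0-9-]+")        # lowercase letters, digits, dashes
SEMVER_RE = re.compile(r"\d+\.\d+\.\d+")   # strict x.y.z
INPUT_TYPES = {"string", "integer", "number", "boolean", "secret", "file"}


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if not NAME_RE.fullmatch(str(manifest.get("name", ""))):
        problems.append("name: use lowercase letters, digits and dashes only")
    if not SEMVER_RE.fullmatch(str(manifest.get("version", ""))):
        problems.append("version: expected semver x.y.z")
    for key, spec in (manifest.get("inputs") or {}).items():
        if spec.get("type") not in INPUT_TYPES:
            problems.append(f"inputs.{key}.type: unknown type {spec.get('type')!r}")
    # 600 is the documented default when timeout_seconds is omitted.
    timeout = (manifest.get("resources") or {}).get("timeout_seconds", 600)
    if not 10 <= timeout <= 7200:
        problems.append("resources.timeout_seconds: must be within 10..7200")
    return problems
```

Feed it the manifest as a dict (for example, parsed with a YAML library of your choice) and print whatever it returns before running gonkablocks deploy.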

Write the Dockerfile

Anything goes — Python, Node, Go, Rust, an existing image. The platform's build daemon pulls your base, copies the source, and tags it gonkablocks/<you>/<slug>:<version>.

FROM python:3.12-slim
WORKDIR /workspace

# pin the SDK and httpx — newer httpx breaks the openai 1.55 path
RUN pip install --no-cache-dir 'openai==1.55.0' 'httpx<0.28'

COPY . /workspace
CMD ["python", "main.py"]

Tips that save time:

  • Use -slim bases (or even -alpine) — image size is billed and cold starts are faster on smaller layers.
  • Pin every dependency (pip, npm, etc). Floating versions break builds days later.
  • Don't install build tools just to run the block. Use a builder stage if you need a compiler.
  • You don't need to EXPOSE a port for jobs — that's only for sessions/services.

Write the code

Three things every block does: read inputs, call the platform's inference proxy, write outputs.

Reading inputs

Each manifest input becomes an env var INPUT_<KEY_UPPERCASED>. That's it — no SDK to install just for input plumbing.

import os, sys
topic = os.environ.get("INPUT_TOPIC", "").strip()
depth = int(os.environ.get("INPUT_DEPTH", "3"))

if not topic:
    print("ERROR: topic is required", file=sys.stderr)
    sys.exit(1)
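Since every input arrives as a string, numeric inputs deserve a little defensive parsing. The sketch below is one possible approach: input_env_name and read_int_input are hypothetical helpers, and the mapping assumes only the plain uppercasing rule described above. Falling back to the default on a blank or non-numeric value is a design choice made here for illustration; you may prefer to fail loudly instead.

```python
import os


def input_env_name(key: str) -> str:
    """Map a manifest input key to its env var, e.g. 'depth' -> 'INPUT_DEPTH'."""
    return "INPUT_" + key.upper()


def read_int_input(key, default, lo, hi, env=None):
    """Read an integer input, use default when unset, blank or invalid, then clamp to [lo, hi]."""
    env = os.environ if env is None else env
    raw = (env.get(input_env_name(key)) or "").strip()
    try:
        value = int(raw) if raw else default
    except ValueError:
        value = default
    return max(lo, min(hi, value))
```

In the research agent this would replace the bare int(...) call with read_int_input("depth", 3, 1, 8), which keeps depth inside the 1-8 range the manifest declares.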

Calling inference

The platform pre-injects OPENAI_BASE_URL and OPENAI_API_KEY pointed at its metering proxy. Any OpenAI-compatible client works — no SDK from us, just the regular openai package.

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

MODEL = os.environ.get("MODEL", "qwen3-235b")

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.4,
    )
    return resp.choices[0].message.content or ""

Every call is metered — prompt + completion tokens, cost in cents, status, duration — and surfaced as a row in the run viewer. The user sees them in real time. There's nothing you need to do to wire this up; it's all in the proxy.

Available models today: qwen3-235b (alias for the full HuggingFace ID). Soon: Kimi K2.5 and K2.6. Image, video, embeddings, fine-tunes are on the Gonka roadmap — when they land, you swap the model id.

The agent loop (full main.py)

import json, os, sys
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

MODEL    = os.environ.get("MODEL", "qwen3-235b")
OUT_DIR  = os.environ.get("GONKA_OUTPUTS_DIR", "/out")

topic = os.environ.get("INPUT_TOPIC", "").strip()
depth = int(os.environ.get("INPUT_DEPTH", "3"))

if not topic:
    print("ERROR: topic is required", file=sys.stderr)
    sys.exit(1)

print(f"==> Research topic: {topic}")
print(f"==> Depth: {depth}")

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.4,
    )
    return resp.choices[0].message.content or ""

print("==> Step 1: generating sub-questions…")
raw = chat(
    "You are a research planner. Given a topic, list distinct"
    " sub-questions that, taken together, would form a comprehensive"
    " view. Output as a numbered list, one per line, no preamble.",
    f"Topic: {topic}\n\nGenerate exactly {depth} sub-questions.",
)
questions = [
    # strip the leading "1.", "1)" or "1.)" numbering, keep the question text
    line.strip().lstrip("0123456789").lstrip(".) ").strip()
    for line in raw.splitlines()
    if line.strip() and any(c.isdigit() for c in line[:3])
][:depth] or [topic]
for i, q in enumerate(questions, 1):
    print(f"   Q{i}: {q}")

print("==> Step 2: answering sub-questions…")
answers = []
for i, q in enumerate(questions, 1):
    print(f"   answering Q{i}…")
    a = chat(
        "You are a careful research analyst. Answer with concrete"
        " details, examples, and nuance. Use markdown. 200-400 words.",
        f"Topic: {topic}\n\nQuestion: {q}",
    )
    answers.append({"q": q, "a": a})

print("==> Step 3: synthesising report…")
synth = chat(
    "You are an editor. Combine the question/answer pairs into a"
    " single, coherent markdown report on the topic. Add a short"
    " executive summary. Keep all the detail.",
    json.dumps({"topic": topic, "answers": answers}, indent=2),
)

# --- write outputs --------------------------------------------------
os.makedirs(OUT_DIR, exist_ok=True)
report_path = os.path.join(OUT_DIR, "report.md")
with open(report_path, "w") as f:
    f.write(synth)

# outputs.json: how the platform reads structured outputs back
with open(os.path.join(OUT_DIR, "outputs.json"), "w") as f:
    json.dump({
        "report_path": "report.md",            # relative to OUT_DIR
        "sub_questions": [a["q"] for a in answers],
    }, f, indent=2)

print(f"==> Done. Report: {report_path}")
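Step 1 is the part of the script most sensitive to how the model formats its reply, so it can help to pull the list parsing into a function you can test without calling the model. The sketch below is one way to do that, assuming the model answers with lines like "1. ...", "1) ..." or "1.) ..."; parse_numbered_list is a hypothetical name, not something the seed block is known to contain.

```python
import re

# Matches "1. text", "1) text" or "1.) text" and captures the text.
_NUMBERED = re.compile(r"^\s*\d+\s*[.)]+\s*(.+?)\s*$")


def parse_numbered_list(raw: str, limit: int) -> list[str]:
    """Extract up to `limit` items from a model-written numbered list."""
    items = []
    for line in raw.splitlines():
        match = _NUMBERED.match(line)
        if match:  # lines without leading numbering are skipped
            items.append(match.group(1))
    return items[:limit]
```

In main.py you would then write questions = parse_numbered_list(raw, depth) or [topic], keeping the same fallback to the topic itself when nothing parses.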

Outputs & files

The platform reads two things at the end of a successful run:

  1. <outputs_dir>/outputs.json — a JSON object whose keys must match the outputs: map in your manifest. Missing keys appear as null.
  2. any other file in outputs_dir — kept around as run artifacts. The user can download them from the run viewer. Reference them from outputs.json by relative path (like report.md above) and the platform turns them into download links.
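Because a key mismatch silently turns into null, it can be worth routing every write of outputs.json through one small helper. The sketch below is a hypothetical write_outputs function: it takes the output keys you declared in the manifest, rejects keys you did not declare, and fills undeclared values with None so the file always has the shape the platform expects.

```python
import json
import os


def write_outputs(out_dir, declared_keys, values):
    """Write outputs.json with exactly the declared keys; return the keys that had no value."""
    unknown = set(values) - set(declared_keys)
    if unknown:
        raise KeyError(f"not declared in the manifest: {sorted(unknown)}")
    missing = [k for k in declared_keys if k not in values]
    # Keys without a value are written as null, mirroring what the platform would show.
    payload = {k: values.get(k) for k in declared_keys}
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "outputs.json"), "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return missing
```

For the research agent, the call would be write_outputs(OUT_DIR, ["report_path", "sub_questions"], {...}); printing the returned list at the end of the run makes a forgotten output obvious in the logs.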

For a job-type block, write outputs and exit zero. For sessions and services, you write outputs continuously over the lifetime of the container and the platform tails them.

Test it locally

Two options. The fast path is gonkablocks exec — it runs your script as a real cloud Run with the same metering and key-minting, but the code stays on your machine. Perfect for the inner dev loop.

# run main.py the way the cloud will, but locally
INPUT_TOPIC="agent swarms" INPUT_DEPTH=2 \
  gonkablocks exec -- python main.py

You'll see a runs/ link in the output — open it to inspect every inference call, token count, cost, and the streamed stdout exactly as a real run would render. No deploy required.

The slower path is to actually build the Docker image locally:

docker build -t my-research-local .
docker run --rm \
  -e OPENAI_BASE_URL="$OPENAI_BASE_URL" \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e INPUT_TOPIC="agent swarms" \
  -e INPUT_DEPTH=2 \
  -e GONKA_OUTPUTS_DIR=/out \
  -v "$(pwd)/out:/out" \
  my-research-local

This catches Dockerfile issues (missing dependencies, wrong Python version, etc.) before the platform's build daemon does. gonkablocks env prints the env exports for OPENAI_BASE_URL/OPENAI_API_KEY so you can source them.

Deploy

From the project directory:

gonkablocks deploy

What happens:

  1. The CLI validates your manifest.yaml against the schema. Bad fields fail here — fast.
  2. It tars the directory (skipping .git, node_modules, .venv) and uploads it.
  3. The platform builds the Docker image with its build daemon and stores it as a versioned tag. Live build logs stream to your terminal.
  4. When the build is ready, the new version becomes the "current" one for your block.
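If you want to see roughly what step 2 will upload before you deploy, you can approximate the exclusion rule locally. The sketch below is an approximation, not the CLI's actual logic: files_to_upload is a hypothetical helper, and it assumes the three directories are skipped at any depth, which the guide does not spell out.

```python
import os

# Directories the guide says are left out of the upload.
EXCLUDED_DIRS = {".git", "node_modules", ".venv"}


def files_to_upload(root="."):
    """List files under root, relative to root, skipping the excluded directories."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)
```

Running it in the project directory is a quick way to spot stray files (local outputs, datasets, credentials) that you would rather not ship.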

The block is now live at:

https://blocks.gonka.gg/blocks/<your-username>/my-research

Run it from anywhere:

gonkablocks run <your-username>/my-research \
  topic="agent swarms" depth=3

Iterate & version

Every deploy needs a unique version. The format must be semver (x.y.z), but the platform doesn't enforce which part you bump; just don't re-use a version. Bump the patch in manifest.yaml before each deploy:

# manifest.yaml
version: 0.1.1   # was 0.1.0

The previous version stays in the registry — old runs that referenced it will still work. The block's public page always serves the latest ready version unless the user explicitly picks an older one.

Forgot to bump? deploy errors with "version 0.1.0 already published — bump and try again".
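If you deploy often, the bump is easy to script. The sketch below is a hypothetical helper pair, not a CLI feature: bump_patch increments the last component of an x.y.z version, and bump_manifest_text rewrites the first top-level version: line of a manifest while leaving everything else untouched. It assumes the version is written unquoted, as in the manifest above.

```python
import re


def bump_patch(version: str) -> str:
    """Return the next patch version, e.g. '0.1.0' -> '0.1.1'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"not an x.y.z version: {version!r}")
    major, minor, patch = map(int, match.groups())
    return f"{major}.{minor}.{patch + 1}"


def bump_manifest_text(text: str) -> str:
    """Bump the first top-level `version:` line in the manifest text."""
    def replace(match):
        return match.group(1) + bump_patch(match.group(2))

    new_text, count = re.subn(
        r"(?m)^(version:\s*)(\d+\.\d+\.\d+)", replace, text, count=1
    )
    if count == 0:
        raise ValueError("no top-level version: line found")
    return new_text
```

Read manifest.yaml, pass its contents through bump_manifest_text, write the result back, and then run gonkablocks deploy.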

Forking a block

If you forked someone else's public block (the Fork button on its page), your copy lives at @yourusername/<slug>: same manifest, your account, your edits, your billing. The lineage is preserved on the block page.

Secrets

Don't put API keys in runtime.env or in your code. The manifest supports a dedicated secret input type that connects to the per-user secrets vault.

inputs:
  github_token:
    type: secret
    required: true
    description: A GitHub PAT for repo access.

At run time, the platform resolves the secret (auto-matching by name from the user's vault, or letting them paste a literal in the form) and injects it as INPUT_GITHUB_TOKEN — same env-var convention as any other input. The actual value never appears in logs, run events, or the manifest.

Users manage their secrets at /secrets; the input form auto-suggests a saved secret whose name matches the input key (case-insensitive).
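Inside the block, a secret is read like any other input, but your own code can still leak it by printing an error message or a request URL that contains it. The sketch below shows one defensive pattern, assuming the INPUT_GITHUB_TOKEN variable from the example above; require_secret and redact are hypothetical helpers, and the "***" placeholder is arbitrary.

```python
import os


def require_secret(key, env=None):
    """Return the secret injected as INPUT_<KEY>, or fail early if it is absent."""
    env = os.environ if env is None else env
    value = env.get("INPUT_" + key.upper(), "")
    if not value:
        raise RuntimeError(f"secret input {key!r} is not set")
    return value


def redact(text, *secrets):
    """Replace every occurrence of the given secrets before the text is printed."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "***")
    return text
```

A typical use is print(redact(str(err), token)) in an exception handler, so that a failed API call cannot echo the token into stdout.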

Limits & guardrails

  • Spend cap — every run has a per-run cap (default 50¢ for anonymous, slider-controlled for signed-in users). The proxy refuses inference calls beyond the cap and marks the run failed. A runaway loop costs cents, not dollars.
  • Wall-clock timeout — set in the manifest (10s..7200s). Hard-kill on overrun.
  • Memory — 128MB to 16GB. OOM-kill on overrun. Default 2GB is fine for most LLM-only blocks.
  • Network policy — deny by default; only the inference proxy is reachable. Set network: allow for fetching arbitrary URLs (e.g. a web-scraping block).
  • Sandbox — every run is a fresh container, no host filesystem access, no privileged mode. Optional gVisor isolation is configurable per-deployment.
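A hard kill at the wall-clock limit means a run that is 95% done leaves nothing behind. One way to soften that is to keep your own, shorter deadline and write partial outputs when it gets close. The sketch below is a minimal illustration: Deadline is a hypothetical class, and the budget is a number you pick below timeout_seconds yourself, since this guide does not say whether the platform exposes the timeout to the container.

```python
import time


class Deadline:
    """Track a soft time budget using a monotonic clock."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        self._clock = clock
        self._end = clock() + budget_seconds

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._end - self._clock())

    def expired(self):
        return self.remaining() == 0.0
```

In the agent loop you could create Deadline(540) at startup for a 600-second timeout, and break out of the question loop when remaining() drops below the time one more answer usually takes, then go straight to writing the report and outputs.json.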

Common pitfalls

  • openai.APIConnectionError: Connection refused when running locally — usually means you forgot to source gonkablocks env or to launch via gonkablocks exec (which sets OPENAI_BASE_URL / OPENAI_API_KEY for you).
  • httpx errors after openai upgrade — pin 'httpx<0.28' alongside your openai version. The newer httpx breaks the SDK's connection-pool path.
  • 502 "all wallets exhausted" or 504 timeouts — Gonka's validator pool transiently rate-limits. Wrap your inference calls in a 3-attempt exponential backoff. The agent-swarm source is a good copy/paste reference.
  • Manifest validation fails with cryptic Zod errors — check the field names and types match the table above. Common slip-ups: type: integer not type: int; semver 0.1.0 not 0.1.
  • Outputs come back as null — the run finished but didn't write outputs.json, or the keys don't match the manifest. Print the resolved path and the JSON you wrote at the end of your block to debug.
  • Container exits 0 but block reports failed — the orchestrator only marks success if outputs.json exists. Even for blocks with no declared outputs, write an empty {}.
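The 3-attempt exponential backoff mentioned for 502/504 errors can be sketched as a small wrapper. This is an illustrative version, not the agent-swarm implementation: with_backoff and its parameters are hypothetical, and by default it retries on any exception, which you will probably want to narrow. With the openai package, passing retry_on=(openai.APIConnectionError, openai.APITimeoutError, openai.InternalServerError) would be a reasonable starting point.

```python
import time


def with_backoff(fn, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**n and retry, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
```

Usage in the research agent would look like answer = with_backoff(lambda: chat(system, user)). With the defaults, the waits between the three tries are 1 and 2 seconds. The openai client also has its own max_retries setting, so decide which layer should own retries rather than stacking both.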

Next steps

You shipped a block. Now make it useful to other people:

  • Mark it public from the block page settings — anyone can run it (anonymous quota: 10 runs/week per IP, signed-in users have their own quota).
  • Embed it on your own website with a four-line iframe — see the external-access guide.
  • Wire it into a multi-block workflow in the visual builder.
  • Promote it to a long-running service with a public HTTP endpoint, OpenAI-compatible streaming and per-route auth.