What you'll build
A small autonomous research agent — give it a topic, it generates sub-questions, drafts answers using a model, and writes a markdown report. We'll mirror the open-source hive-research seed block so you can compare your code to a known-working reference at any point.
The same recipe works for anything that needs to call an LLM:
- summarisers, translators, extractors
- data-cleaning agents (CSV-in, Markdown-out)
- code reviewers, doc generators
- multi-step research / planning agents
Before you start
You'll need three things on your machine:
- Docker running locally (only needed for gonkablocks exec; the deploy path builds remotely on the platform).
- Python 3.10+ to run the example. The block itself runs in a container the platform builds, so the host version doesn't matter at deploy time.
- Gonkablocks CLI:
# install the CLI (one-line installer)
curl -fsSL https://blocks.gonka.gg/install.sh | sh
# or with npm
npm install -g @gonkalabs/blocks-cli
# log in once: opens a browser, mints a long-lived API key
gonkablocks connect --server https://blocks.gonka.gg
connect stores the key in ~/.config/gonkablocks/config.json. It's the same key the CLI uses for deploy, exec, run, etc — there's nothing else to set up.
Pick a block type
Every block is one of five types — the same source code can be invoked in any of them, but you pick one as the "canonical" interface in the manifest:
| type | lifecycle | use it for |
|---|---|---|
| job | one-shot, exits | scripts, agents, batch processors |
| worker | scheduled (cron) | daily summaries, periodic ingest |
| session | long-running, per-user | chats, notebooks, IDE-style tools |
| service | long-running, shared | HTTP endpoints, MCP servers |
| workflow | composes other blocks | visual DAGs in the canvas |
For our research agent we want type: job — it takes inputs, does its thing, finishes.
Project layout
Make a directory anywhere on your machine:
mkdir my-research && cd my-research
touch manifest.yaml Dockerfile main.py
That's the entire skeleton. Three files. We'll fill them in next.
Optionally: anything else you need — extra Python sources, config files, prompts, sample data. The CLI tars up the whole directory on deploy, excluding node_modules, .git, and .venv.
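To preview roughly what would be uploaded, the exclusion rule above can be sketched in a few lines of Python. This is only an illustration: list_upload_files is a hypothetical helper, and the assumption that the CLI skips exactly these three directory names at any depth is ours, not something the CLI documents here.

```python
import os

# Directory names the tutorial says are skipped on deploy.
EXCLUDED_DIRS = {"node_modules", ".git", ".venv"}

def list_upload_files(root: str) -> list[str]:
    """Return sorted relative paths under root, skipping excluded directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Calling list_upload_files(".") from the project directory gives a quick check that no large or sensitive files are about to ship with the block.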
Write manifest.yaml
The manifest declares what the block is. Inputs (what the user types in), outputs (what the block returns), what runtime, how much memory and CPU, network policy, pricing. Strict schema — invalid manifests fail at deploy time, not at run time.
name: my-research
version: 0.1.0
type: job
description: Multi-round research agent that generates sub-questions, drafts answers, and synthesises a markdown report.
category: research
inputs:
  topic:
    type: string
    required: true
    description: The research topic to investigate.
  depth:
    type: integer
    required: false
    default: 3
    description: Number of sub-questions to explore (1-8).
    min: 1
    max: 8
outputs:
  report_path:
    type: string
    description: Path inside /out where the markdown report was written.
  sub_questions:
    type: json
    description: The list of sub-questions actually investigated.
runtime:
  build: dockerfile
  entrypoint: python main.py
  outputs_dir: /out
  env:
    MODEL: qwen3-235b
resources:
  cpu: 1
  memory_mb: 1024
  timeout_seconds: 600
  network: allow
pricing:
  type: per_run
  base_price_cents: 0
  rate_cents_per_minute: 0
  inference_pass_through: true
  inference_markup_pct: 0
Field reference (the bits worth knowing)
- name: lowercase letters, digits, dashes only. Becomes the slug at /blocks/<you>/<name>. Once published, don't rename; fork instead.
- version: strict semver x.y.z. Bump it for every deploy or the platform refuses the upload.
- type: see the table above. We use job.
- inputs: each entry is { type, required, default, description, enum?, min?, max? }. Types: string, integer, number, boolean, secret, file. The platform surfaces these as a form on the block's public page.
- outputs: declarative description of what your block produces. Types include json (any JSON-serialisable value) and file (a path inside outputs_dir).
- runtime.entrypoint: the command run inside the container. It runs as PID 1, with no shell expansion unless you wrap it in sh -c.
- runtime.env: extra env vars baked into every run. Use this for non-secret config (model id, prompt versions). Secrets go through the secrets vault (see below).
- resources.timeout_seconds: hard kill if the run exceeds this. 10..7200, default 600. Add headroom for slow LLM calls.
- resources.network: deny (the default; only the inference proxy is reachable), allow (any host), or a list of allowed hosts. Use allow sparingly; the tighter the better for security review.
- pricing.inference_pass_through: if true (the default), the user is charged exactly what Gonka charges for their tokens, with no markup. Set it to false and set inference_markup_pct if you publish your block as a paid service.
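A few of these rules are easy to check before you ever run deploy. The sketch below works on a manifest already loaded into a dict; check_manifest is a hypothetical helper, and it covers only the constraints spelled out in this guide, not the platform's full schema.

```python
import re

BLOCK_TYPES = {"job", "worker", "session", "service", "workflow"}

def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    # name: lowercase letters, digits, and dashes only.
    if not re.fullmatch(r"[a-z0-9-]+", str(manifest.get("name", ""))):
        problems.append("name: use lowercase letters, digits, and dashes only")
    # version: strict x.y.z (a YAML float such as 0.1 fails this check).
    if not re.fullmatch(r"\d+\.\d+\.\d+", str(manifest.get("version", ""))):
        problems.append("version: must be semver x.y.z")
    if manifest.get("type") not in BLOCK_TYPES:
        problems.append("type: must be one of " + ", ".join(sorted(BLOCK_TYPES)))
    # timeout_seconds: 10..7200, default 600 when omitted.
    timeout = manifest.get("resources", {}).get("timeout_seconds", 600)
    if not (isinstance(timeout, int) and 10 <= timeout <= 7200):
        problems.append("resources.timeout_seconds: must be an integer in 10..7200")
    return problems
```

The deploy-time validation remains the source of truth; a local check like this only shortens the feedback loop for the most common slip-ups.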
Write the Dockerfile
Anything goes — Python, Node, Go, Rust, an existing image. The platform's build daemon pulls your base, copies the source, and tags it gonkablocks/<you>/<slug>:<version>.
FROM python:3.12-slim
WORKDIR /workspace
# pin the SDK and httpx; newer httpx breaks the openai 1.55 path
RUN pip install --no-cache-dir 'openai==1.55.0' 'httpx<0.28'
COPY . /workspace
CMD ["python", "main.py"]
Tips that save time:
- Use -slim bases (or even -alpine): image size is billed and cold starts are faster on smaller layers.
- Pin every dependency (pip, npm, etc). Floating versions break builds days later.
- Don't install build tools just to run the block. Use a builder stage if you need a compiler.
- You don't need to EXPOSE a port for jobs; that's only for sessions/services.
Write the code
Three things every block does: read inputs, call the platform's inference proxy, write outputs.
Reading inputs
Each manifest input becomes an env var INPUT_<KEY_UPPERCASED>. That's it — no SDK to install just for input plumbing.
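The naming rule can be sketched as a tiny helper, together with a defensive integer reader that falls back to the default and clamps to the declared range. Both function names are hypothetical, and treating an unset or empty variable as "use the default" is an assumption about how optional inputs arrive.

```python
import os

def input_env_name(key: str) -> str:
    """Map a manifest input key to its env var name, e.g. topic -> INPUT_TOPIC."""
    return "INPUT_" + key.upper()

def read_int_input(key: str, default: int, lo: int, hi: int, env=None) -> int:
    """Read an integer input, falling back to default and clamping to lo..hi."""
    env = os.environ if env is None else env
    raw = env.get(input_env_name(key), "").strip()
    # Assumption: an unset or empty variable means the input was omitted.
    value = int(raw) if raw else default  # non-numeric text raises ValueError
    return max(lo, min(hi, value))
```

With this in place, depth = read_int_input("depth", default=3, lo=1, hi=8) would mirror the min/max declared in the manifest. The minimal version used in this tutorial reads the two inputs inline: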
import os, sys

topic = os.environ.get("INPUT_TOPIC", "").strip()
depth = int(os.environ.get("INPUT_DEPTH", "3"))
if not topic:
    print("ERROR: topic is required", file=sys.stderr)
    sys.exit(1)

Calling inference
The platform pre-injects OPENAI_BASE_URL and OPENAI_API_KEY pointed at its metering proxy. Any OpenAI-compatible client works — no SDK from us, just the regular openai package.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)
MODEL = os.environ.get("MODEL", "qwen3-235b")

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.4,
    )
    return resp.choices[0].message.content or ""

Every call is metered (prompt + completion tokens, cost in cents, status, duration) and surfaced as a row in the run viewer. The user sees them in real time. There's nothing you need to do to wire this up; it's all in the proxy.
Available models today: qwen3-235b (alias for the full HuggingFace ID). Soon: Kimi K2.5 and K2.6. Image, video, embeddings, fine-tunes are on the Gonka roadmap — when they land, you swap the model id.
The agent loop (full main.py)
import json, os, sys
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)
MODEL = os.environ.get("MODEL", "qwen3-235b")
OUT_DIR = os.environ.get("GONKA_OUTPUTS_DIR", "/out")

topic = os.environ.get("INPUT_TOPIC", "").strip()
depth = int(os.environ.get("INPUT_DEPTH", "3"))
if not topic:
    print("ERROR: topic is required", file=sys.stderr)
    sys.exit(1)

print(f"==> Research topic: {topic}")
print(f"==> Depth: {depth}")

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.4,
    )
    return resp.choices[0].message.content or ""

print("==> Step 1: generating sub-questions…")
raw = chat(
    "You are a research planner. Given a topic, list distinct"
    " sub-questions that, taken together, would form a comprehensive"
    " view. Output as a numbered list, one per line, no preamble.",
    f"Topic: {topic}\n\nGenerate exactly {depth} sub-questions.",
)
questions = [
    line.split(".", 1)[-1].strip().lstrip(")").strip()
    for line in raw.splitlines()
    if line.strip() and any(c.isdigit() for c in line[:3])
][:depth] or [topic]
for i, q in enumerate(questions, 1):
    print(f"  Q{i}: {q}")

print("==> Step 2: answering sub-questions…")
answers = []
for i, q in enumerate(questions, 1):
    print(f"  answering Q{i}…")
    a = chat(
        "You are a careful research analyst. Answer with concrete"
        " details, examples, and nuance. Use markdown. 200-400 words.",
        f"Topic: {topic}\n\nQuestion: {q}",
    )
    answers.append({"q": q, "a": a})

print("==> Step 3: synthesising report…")
synth = chat(
    "You are an editor. Combine the question/answer pairs into a"
    " single, coherent markdown report on the topic. Add a short"
    " executive summary. Keep all the detail.",
    json.dumps({"topic": topic, "answers": answers}, indent=2),
)

# --- write outputs --------------------------------------------------
os.makedirs(OUT_DIR, exist_ok=True)
report_path = os.path.join(OUT_DIR, "report.md")
with open(report_path, "w") as f:
    f.write(synth)

# outputs.json: how the platform reads structured outputs back
with open(os.path.join(OUT_DIR, "outputs.json"), "w") as f:
    json.dump({
        "report_path": "report.md",  # relative to OUT_DIR
        "sub_questions": [a["q"] for a in answers],
    }, f, indent=2)

print(f"==> Done. Report: {report_path}")

Outputs & files
The platform reads two things at the end of a successful run:
- <outputs_dir>/outputs.json: a JSON object whose keys must match the outputs: map in your manifest. Missing keys appear as null.
- Any other file in outputs_dir: kept around as run artifacts. The user can download them from the run viewer. Reference them from outputs.json by relative path (like report.md above) and the platform turns them into download links.
For a job-type block, write outputs and exit zero. For sessions and services, you write outputs continuously over the lifetime of the container and the platform tails them.
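Because a key mismatch only shows up as null after the run, it can help to funnel every write through one small function. The following is a sketch, not a platform API: write_outputs is a hypothetical helper, and the list of declared keys is passed in by hand rather than read from the manifest.

```python
import json
import os

def write_outputs(out_dir: str, declared: list[str], values: dict) -> str:
    """Write outputs.json with exactly the declared keys and return its path."""
    unknown = sorted(set(values) - set(declared))
    if unknown:
        # A typo in a key would otherwise be silently dropped by the platform.
        raise KeyError(f"keys not declared in manifest outputs: {unknown}")
    os.makedirs(out_dir, exist_ok=True)
    # Fill undeclared-but-missing values with None so nulls are explicit.
    payload = {key: values.get(key) for key in declared}
    path = os.path.join(out_dir, "outputs.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return path
```

For the research block, the call would look like write_outputs(OUT_DIR, ["report_path", "sub_questions"], {...}), with report_path set to the relative name report.md.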
Test it locally
Two options. The fast path is gonkablocks exec — it runs your script as a real cloud Run with the same metering and key-minting, but the code stays on your machine. Perfect for the inner dev loop.
# run main.py the way the cloud will, but locally
INPUT_TOPIC="agent swarms" INPUT_DEPTH=2 \
  gonkablocks exec -- python main.py
You'll see a runs/ link in the output — open it to inspect every inference call, token count, cost, and the streamed stdout exactly as a real run would render. No deploy required.
The slower path is to actually build the Docker image locally:
docker build -t my-research-local .
docker run --rm \
  -e OPENAI_BASE_URL="$OPENAI_BASE_URL" \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e INPUT_TOPIC="agent swarms" \
  -e INPUT_DEPTH=2 \
  -e GONKA_OUTPUTS_DIR=/out \
  -v "$(pwd)/out:/out" \
  my-research-local
This catches Dockerfile issues (missing dependencies, wrong Python version, etc.) before the platform's build daemon does. gonkablocks env prints the env exports for OPENAI_BASE_URL/OPENAI_API_KEY so you can source them.
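When running outside gonkablocks exec, a missing variable tends to surface later as a confusing connection error. A small preflight check, sketched below with a hypothetical missing_env helper, makes the failure explicit up front.

```python
import os

# The two variables this guide says the platform injects for inference.
REQUIRED_ENV = ("OPENAI_BASE_URL", "OPENAI_API_KEY")

def missing_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]
```

At the top of main.py you might call missing_env() and, if the list is non-empty, print the names to stderr and exit non-zero before constructing the client.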
Deploy
From the project directory:
gonkablocks deploy
What happens:
- The CLI validates your manifest.yaml against the schema. Bad fields fail here, fast.
- It tars the directory (skipping .git, node_modules, .venv) and uploads it.
- The platform builds the Docker image with its build daemon and stores it as a versioned tag. Live build logs stream to your terminal.
- When the build is ready, the new version becomes the "current" one for your block.
The block is now live at:
https://blocks.gonka.gg/blocks/<your-username>/my-research
Run it from anywhere:
gonkablocks run <your-username>/my-research \
  topic="agent swarms" depth=3
Iterate & version
Every deploy needs a unique version. The format must be semver (x.y.z), but the platform doesn't enforce which part you bump; just never reuse a version. Bump the patch in manifest.yaml before each deploy:
# manifest.yaml
version: 0.1.1  # was 0.1.0
The previous version stays in the registry — old runs that referenced it will still work. The block's public page always serves the latest ready version unless the user explicitly picks an older one.
Forgot to bump? deploy errors with "version 0.1.0 already published — bump and try again".
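If forgetting the bump becomes a habit, the edit is easy to script. This sketch rewrites the first top-level version line in the manifest text; bump_patch is a hypothetical helper, and it assumes the version is written unquoted as x.y.z at the start of a line.

```python
import re

# Matches an unquoted top-level "version: x.y.z" at the start of a line.
VERSION_LINE = re.compile(r"^(version:\s*)(\d+)\.(\d+)\.(\d+)", re.MULTILINE)

def bump_patch(manifest_text: str) -> str:
    """Return manifest_text with the patch component of version incremented."""
    def repl(match: re.Match) -> str:
        prefix, major, minor = match.group(1), match.group(2), match.group(3)
        patch = int(match.group(4)) + 1
        return f"{prefix}{major}.{minor}.{patch}"

    new_text, count = VERSION_LINE.subn(repl, manifest_text, count=1)
    if count == 0:
        raise ValueError("no top-level 'version: x.y.z' line found")
    return new_text
```

Reading manifest.yaml, passing its text through bump_patch, and writing it back before gonkablocks deploy keeps the rest of the file, including comments, untouched.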
Forking a block
If you started from someone else's public block (the Fork button on its page), it's now @yourusername/<slug> — same manifest, your account, your edits, your billing. The lineage is preserved on the block page.
Secrets
Don't put API keys in runtime.env or in your code. The manifest supports a dedicated secret input type that connects to the per-user secrets vault.
inputs:
  github_token:
    type: secret
    required: true
    description: A GitHub PAT for repo access.

At run time, the platform resolves the secret (auto-matching by name from the user's vault, or letting them paste a literal in the form) and injects it as INPUT_GITHUB_TOKEN, following the same env-var convention as any other input. The actual value never appears in logs, run events, or the manifest.
Users manage their secrets at /secrets; the input form auto-suggests a saved secret whose name matches the input key (case-insensitive).
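Inside the block, a secret is read like any other input. The sketch below adds one precaution of our own: a mask helper so your code never prints the raw value itself. Both read_secret and mask are hypothetical names; whether the platform also scrubs secrets from stdout is not something this guide specifies, so treat your own print statements as your responsibility.

```python
import os

def read_secret(key: str, env=None) -> str:
    """Read a secret input from INPUT_<KEY>, failing loudly if it is missing."""
    env = os.environ if env is None else env
    name = "INPUT_" + key.upper()
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value

def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last few characters with asterisks for safe logging."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

A log line such as print("using token", mask(token)) then confirms which credential was picked up without exposing it.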
Limits & guardrails
- Spend cap — every run has a per-run cap (default 50¢ for anonymous, slider-controlled for signed-in users). The proxy refuses inference calls beyond the cap and marks the run failed. A runaway loop costs cents, not dollars.
- Wall-clock timeout — set in the manifest (10s..7200s). Hard-kill on overrun.
- Memory — 128MB to 16GB. OOM-kill on overrun. Default 2GB is fine for most LLM-only blocks.
- Network policy: deny by default; only the inference proxy is reachable. Set network: allow for fetching arbitrary URLs (e.g. a web-scraping block).
- Sandbox: every run is a fresh container, no host filesystem access, no privileged mode. Optional gVisor isolation is configurable per-deployment.
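For the research agent, the number of inference calls follows directly from main.py: one planning call, one answer per sub-question, and one synthesis call. A rough back-of-the-envelope check against the wall-clock timeout might look like this sketch. The per-call latency is a figure you supply from your own runs, and the 80% headroom factor is an arbitrary assumption, not a platform rule.

```python
def inference_calls(depth: int) -> int:
    """Calls made by the tutorial's main.py: 1 plan + depth answers + 1 synthesis."""
    return 1 + depth + 1

def fits_timeout(depth: int, seconds_per_call: float,
                 timeout_seconds: int = 600, headroom: float = 0.8) -> bool:
    """True if the estimated run time stays within headroom * timeout_seconds."""
    estimated = inference_calls(depth) * seconds_per_call
    return estimated <= timeout_seconds * headroom
```

At the maximum depth of 8 the agent makes 10 calls, so with the default 600-second timeout an average of 45 seconds per call still fits under this rule of thumb, while 60 seconds per call does not.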
Common pitfalls
- openai.APIConnectionError: Connection refused when running locally. This usually means you forgot to source gonkablocks env or to launch via gonkablocks exec (which sets OPENAI_BASE_URL/OPENAI_API_KEY for you).
- httpx errors after an openai upgrade. Pin 'httpx<0.28' alongside your openai version. The newer httpx breaks the SDK's connection-pool path.
- 502 "all wallets exhausted" or 504 timeouts. Gonka's validator pool transiently rate-limits. Wrap your inference calls in a 3-attempt exponential backoff. The agent-swarm source is a good copy/paste reference.
- Manifest validation fails with cryptic Zod errors. Check that the field names and types match the table above. Common slip-ups: type: integer not type: int; semver 0.1.0 not 0.1.
- Outputs come back as null. The run finished but didn't write outputs.json, or the keys don't match the manifest. Print the resolved path and the JSON you wrote at the end of your block to debug.
- Container exits 0 but block reports failed. The orchestrator only marks success if outputs.json exists. Even for blocks with no declared outputs, write an empty {}.
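The 3-attempt exponential backoff mentioned above can be sketched without any extra dependency. This is a generic illustration rather than the agent-swarm implementation: with_retries is a hypothetical helper, the delays and jitter are arbitrary choices, and the default retry_on=(Exception,) is deliberately broad so the example stays self-contained.

```python
import random
import time

def with_retries(call, attempts: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,), sleep=time.sleep):
    """Run call(), retrying on retry_on errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... plus a little jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

In the research agent this would wrap each inference call, for example with_retries(lambda: chat(system, user)). In real code it is probably wiser to narrow retry_on to transient errors such as openai.APIConnectionError and openai.APIStatusError, so that genuine bugs fail immediately.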
Next steps
You shipped a block. Now make it useful to other people:
- Mark it public from the block page settings — anyone can run it (anonymous quota: 10 runs/week per IP, signed-in users have their own quota).
- Embed it on your own website with a four-line iframe — see the external-access guide.
- Wire it into a multi-block workflow in the visual builder.
- Promote it to a long-running service with a public HTTP endpoint, OpenAI-compatible streaming and per-route auth.