Local-first · No cloud · No API keys

Cofiswarm

A thirteen-agent coding swarm that lives entirely on your own machine.

Dispatch a single prompt to specialized agents (architect, programmer, security, reviewer and more), orchestrated across four modes against local inference servers (llama.cpp · MLX · vLLM). Privacy-first, air-gappable, and instant.

Coficube — a cube-shaped coffee mug filled with black coffee
Coficube · the cube that powers the swarm
13 Specialized Agents · 4 Orchestration Modes · 3 Inference Engines · 0 Cloud Calls

Your machine, your model, your code

Everything runs on Apple Silicon (MLX) or NVIDIA (vLLM) — nothing leaves the box.

13 Specialized Roles

Architect, foreman, programmer, security, database, frontend, reviewer, tester, debugger, optimizer, scout, mlx-scout and synthesis — each with a tuned system prompt.

Mix Backends Per Agent

Point any agent at any model. Blend MLX, llama.cpp and vLLM in a single swarm, with per-agent model overrides from the configure panel.
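
A per-agent override of this kind can be pictured as a lookup with a fallback. The sketch below is illustrative only: the `Backend` type, the `backend_for` helper, the ports and the model names are all hypothetical and are not taken from Cofiswarm's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    engine: str    # "mlx", "llama.cpp" or "vllm"
    base_url: str  # hypothetical local endpoint
    model: str     # hypothetical model name


# Hypothetical swarm-wide default used when an agent has no override.
DEFAULT = Backend("llama.cpp", "http://127.0.0.1:8080", "default-model")

# Hypothetical per-agent overrides, blending engines in one swarm.
OVERRIDES = {
    "architect": Backend("mlx", "http://127.0.0.1:8101", "large-model"),
    "security": Backend("vllm", "http://127.0.0.1:8000", "code-model"),
}


def backend_for(agent: str) -> Backend:
    """Return the agent's override if one exists, else the default."""
    return OVERRIDES.get(agent, DEFAULT)
```

With this shape, `backend_for("architect")` resolves to the MLX entry, while an agent without an override, such as `tester`, falls back to the default.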

Multi-Turn Threads

Sessions auto-continue after the first broadcast. The conversation thread panel keeps full turn history, persisted to disk.
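
One simple way to keep turn history on disk is an append-only JSON Lines file, one turn per line. This is a minimal sketch under that assumption; the file layout and the `append_turn` and `load_thread` helpers are hypothetical and may differ from how Cofiswarm actually stores threads.

```python
import json
from pathlib import Path


def append_turn(path: Path, role: str, content: str) -> None:
    """Append one turn to the thread file as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def load_thread(path: Path) -> list[dict]:
    """Read the full turn history back; a missing file is an empty thread."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending rather than rewriting the file means a crash mid-session loses at most the turn being written, and reloading the thread on the next prompt is what lets a session continue.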

pgvector RAG Context

Per-agent codebase retrieval injects relevant context so every agent reasons over your repository, not a stale snapshot.
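
Retrieval of this kind typically means a nearest-neighbour query followed by prompt assembly. The sketch below builds a parameterized query using pgvector's cosine-distance operator (`<=>`) and prepends the retrieved chunks to an agent's system prompt. The `code_chunks` table, its columns, the `agent_role` filter and the helper names are assumptions made for illustration, not Cofiswarm's schema. The query uses `%s` placeholders as accepted by drivers such as psycopg, but no database is contacted here.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_query(agent: str, embedding: list[float], k: int = 5):
    """Return (sql, params) for the k chunks closest to the embedding."""
    sql = (
        "SELECT path, chunk FROM code_chunks "  # hypothetical table
        "WHERE agent_role = %s "                # hypothetical per-agent scope
        "ORDER BY embedding <=> %s::vector "    # pgvector cosine distance
        "LIMIT %s"
    )
    return sql, (agent, to_vector_literal(embedding), k)


def inject_context(system_prompt: str, chunks: list[tuple[str, str]]) -> str:
    """Append retrieved (path, chunk) pairs to the agent's system prompt."""
    context = "\n\n".join(f"# {path}\n{chunk}" for path, chunk in chunks)
    return f"{system_prompt}\n\nRelevant code:\n{context}"
```

Running the query at prompt time, rather than caching results, is what keeps the injected context in step with the current state of the index.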

Live Metrics Dashboard

The monitor popout shows KV-cache pressure, unified-memory gauges and per-port MLX telemetry in real time.

Built for Privacy

No cloud dependency, no API keys, air-gappable. A React UI talking to a C++ coordinator over local ports.

Four ways to orchestrate

Selectable from the UI mode menu — broadcast, chain, reduce or route.

Flat · Broadcast the prompt to every agent in parallel.
Pipeline · Sequential chain; each agent builds on the last.
Cascade · Mixture-of-agents reduced by a synthesizer.
Router · A classifier picks the right agents by live load.
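
The four modes can be sketched as small async functions over a set of agents. This is one possible reading of the descriptions above, not the coordinator's actual logic: agents are modeled as hypothetical async callables from prompt to reply, and the router is assumed to shortlist agents with a classifier and then prefer the least-loaded ones.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]  # hypothetical agent interface


async def flat(agents: dict[str, Agent], prompt: str) -> dict[str, str]:
    """Flat: send the same prompt to every agent concurrently."""
    names = list(agents)
    replies = await asyncio.gather(*(agents[n](prompt) for n in names))
    return dict(zip(names, replies))


async def pipeline(agents: list[Agent], prompt: str) -> str:
    """Pipeline: each agent receives the previous agent's output."""
    text = prompt
    for agent in agents:
        text = await agent(text)
    return text


async def cascade(agents: dict[str, Agent], synthesizer: Agent, prompt: str) -> str:
    """Cascade: gather drafts in parallel, then reduce them with a synthesizer."""
    drafts = await flat(agents, prompt)
    merged = "\n".join(f"[{name}] {reply}" for name, reply in drafts.items())
    return await synthesizer(f"{prompt}\n\n{merged}")


async def router(agents, classify, load, prompt, limit=1):
    """Router (assumed): classifier shortlists agents, lowest load wins."""
    candidates = [n for n in classify(prompt) if n in agents]
    chosen = sorted(candidates, key=lambda n: load.get(n, 0.0))[:limit]
    return await flat({n: agents[n] for n in chosen}, prompt)


def make_stub(name: str) -> Agent:
    """Stand-in agent that wraps the prompt in its own name."""
    async def agent(prompt: str) -> str:
        return f"{name}({prompt})"
    return agent
```

With the stub agents, `asyncio.run(pipeline([make_stub("a"), make_stub("b")], "x"))` returns `"b(a(x))"`, which shows how each stage in the chain builds on the previous one.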