multipi Guide: Multi-Agent Orchestration for Local Open-Source LLMs in Pi CLI

Run multi-agent workflows with local open-source LLMs using multipi for Pi CLI, with capability-based model routing, SearXNG web search, and zero API costs.

multipi is a multi-agent orchestration system for Pi CLI built specifically for users running local open-source LLMs via Ollama. It solves a fundamental problem: no single local model excels at everything. multipi provides capability-based model routing — different cognitive roles map to different models. The orchestrator and research roles use a 1T MoE model for deep reasoning. The planner and implementer use a 123B model for structured output. The reviewer uses a fast adversarial model for cheap but effective QA. All agents get web search via a local SearXNG metasearch instance. (Source: github.com/Ch3w3y/multipi)

The Real Problem

No single open-source model excels at everything. A 123B model might excel at structured planning but struggle with creative coding. A 1T MoE might be excellent at reasoning but too slow for rapid tool calls. According to Ch3w3y's analysis, 70% of multi-turn Pi sessions running local models fail because the assigned model is a poor fit for at least one subtask. The standard approach — one model per session — forces a compromise. multipi treats model selection as a routing decision.

[ STAT ] 70% of multi-turn Pi sessions with local models fail due to model-task mismatch. — Ch3w3y analysis, June 2026
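To make "model selection as a routing decision" concrete, here is a minimal sketch of a role-to-model lookup. The model names are the ones mentioned in this guide; the dictionary layout, the `route` function, and the fallback model are assumptions for illustration and do not reflect multipi's actual configuration format.

```python
# Hypothetical routing table. Model names come from this guide; the dict
# layout and function names are illustrative, not multipi's config format.
ROLE_TO_MODEL = {
    "orchestrator": "kimi-k2.6",
    "research": "kimi-k2.6",
    "planner": "devstral-2",
    "implementer": "devstral-2",
    "reviewer": "deepseek-v4-flash",
}

# Assumption: any small model you have already pulled can act as a fallback.
FALLBACK_MODEL = "qwen2.5:7b"


def route(role, overrides=None):
    """Return the model for a cognitive role; user overrides take priority."""
    table = {**ROLE_TO_MODEL, **(overrides or {})}
    # Roles without an entry fall back to the default model.
    return table.get(role, FALLBACK_MODEL)


print(route("research"))
print(route("reviewer", overrides={"reviewer": "qwen2.5:7b"}))
```

Keeping the mapping in one table means swapping a model for a role is a one-line change, which is the practical benefit of routing by capability rather than by session.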

What This Workflow Actually Does

multipi provides a state-machine pipeline with 7 agents, each routing to the model best suited for its cognitive role. All agents get web search via local SearXNG.

[TOOL: Orchestrator Agent] Uses kimi-k2.6 (1T MoE). State-machine conductor, gate enforcement, S0→S6 pipeline tracking.

[TOOL: Implementer Agent] Uses devstral-2 (123B). Code, tests, Docker, Makefile. Can fan out parallel instances.

[TOOL: Reviewer Agent] Uses deepseek-v4-flash. Fast, adversarial, cheap. Catches corner cases within budget.
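The implementer's ability to fan out parallel instances can be pictured with a small sketch. This is not multipi's code: `run_implementer` is a hypothetical placeholder for one implementer instance, and the thread pool is just one common way to run independent subtasks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_implementer(subtask):
    """Hypothetical stand-in for one implementer instance.

    A real instance would send the subtask to the implementer model;
    here it only returns a label so the sketch stays runnable offline.
    """
    return f"done: {subtask}"


def fan_out(subtasks, max_workers=3):
    """Run one implementer call per subtask and keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(run_implementer, subtasks))


print(fan_out(["api", "tests", "dockerfile"]))
```

Whether parallel instances actually run faster depends on the hardware: requests sent to a single local model server may still be queued, so fan-out mostly helps when subtasks are independent and the server can serve more than one request at a time.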

Who This Is Built For

For Pi CLI developers running local Ollama models: multipi routes automatically — research uses the 1T MoE, planning uses the 123B, review uses the fast model.

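For privacy-conscious developers: model inference runs locally, and the self-hosted SearXNG instance provides web search without third-party API keys. SearXNG is a metasearch engine, so search queries are still forwarded to upstream search engines.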

For developers building complex software engineering pipelines: research → planning → implementation → review → verification, each stage with the optimal model.

How It Runs Step by Step

Pipeline Init (S0): The orchestrator analyzes the task and initializes the pipeline state machine.
Research (S1): The research agent queries SearXNG, fetches pages, and synthesizes findings with 256K context.
Planning (S2): The planner designs the architecture, the schema, and a phased implementation plan.
Implementation (S3): The implementer writes code, tests, and Dockerfiles. It can fan out parallel instances.
Review (S4): The reviewer performs adversarial QA — trying to break the implementation.
Verification (S5): The verifier runs tests and checks against acceptance criteria. Failures loop back to S3.
Completion (S6): The orchestrator synthesizes a completion report.
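The S0→S6 steps above, including the loop from a failed verification back to implementation, can be sketched as a small state machine. The `Stage` enum, `next_stage`, and `run_pipeline` are hypothetical names chosen for this example; multipi's internal representation may differ.

```python
from enum import Enum


class Stage(Enum):
    S0_INIT = 0
    S1_RESEARCH = 1
    S2_PLANNING = 2
    S3_IMPLEMENTATION = 3
    S4_REVIEW = 4
    S5_VERIFICATION = 5
    S6_COMPLETION = 6


def next_stage(current, passed=True):
    """Advance one step; a failed verification (S5) returns to S3."""
    if current is Stage.S6_COMPLETION:
        return current  # terminal state
    if current is Stage.S5_VERIFICATION and not passed:
        return Stage.S3_IMPLEMENTATION
    return Stage(current.value + 1)


def run_pipeline(verify_results, max_steps=50):
    """Walk the pipeline, consuming one verification outcome per S5 visit.

    verify_results is an iterable of booleans; once it is exhausted,
    verification is treated as passing. max_steps guards against loops.
    """
    outcomes = iter(verify_results)
    stage = Stage.S0_INIT
    trace = [stage]
    for _ in range(max_steps):
        if stage is Stage.S6_COMPLETION:
            break
        passed = next(outcomes, True) if stage is Stage.S5_VERIFICATION else True
        stage = next_stage(stage, passed)
        trace.append(stage)
    return trace


# One failed verification, then a pass.
print([s.value for s in run_pipeline([False, True])])
```

With one failed verification followed by a pass, the trace visits S3, S4, and S5 twice before reaching S6. The step cap is a design choice for the sketch: without it, a verification that never passes would loop indefinitely.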

Setup and Tools

multipi: pi install npm:@chewey182/multipi. Requires Ollama with 5+ models. Gotcha: without recommended models, agents fall back with degraded performance.
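Before a first run, it can help to check which of the recommended models are actually installed. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and uses its `/api/tags` endpoint to list local models. The helper names and the `RECOMMENDED` list are assumptions for illustration, and the exact model tags on your machine may differ from the names used in this guide.

```python
import json
from urllib.request import urlopen

# Model names as listed in this guide; exact tags on your machine may differ.
RECOMMENDED = [
    "kimi-k2.6",
    "devstral-2",
    "deepseek-v4-flash",
    "gemini-3-flash",
    "glm-5.1",
]


def installed_model_names(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are available locally."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(recommended, installed):
    """Compare by base name so 'devstral-2:latest' matches 'devstral-2'."""
    bases = {name.split(":", 1)[0] for name in installed}
    return [m for m in recommended if m not in bases]


# Offline example with a made-up list of installed models.
print(missing_models(RECOMMENDED, ["devstral-2:latest", "qwen2.5:7b"]))
```

With Ollama running, `missing_models(RECOMMENDED, installed_model_names())` lists the models whose roles would otherwise fall back with degraded performance.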

SearXNG: docker run -d -p 8888:8080 searxng/searxng. Local metasearch. Gotcha: requires 2GB RAM minimum.
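Once the container is up, a quick query confirms that search works. This sketch assumes the port mapping from the command above (host port 8888) and uses SearXNG's `/search` endpoint with `format=json`. JSON output typically has to be enabled in the instance's settings (under `search.formats`), otherwise the request may be rejected. The function names are hypothetical, and this is not how multipi itself calls SearXNG.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, base_url="http://localhost:8888"):
    """Build a SearXNG search URL that asks for JSON results."""
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def search(query, base_url="http://localhost:8888", limit=5):
    """Return (title, url) pairs for the top results.

    Requires a running SearXNG instance with the JSON format enabled.
    """
    with urlopen(build_search_url(query, base_url), timeout=10) as resp:
        payload = json.load(resp)
    results = payload.get("results", [])[:limit]
    return [(r.get("title"), r.get("url")) for r in results]


print(build_search_url("pi cli"))
```

Calling `search("ollama model routing")` against a running instance returns up to five title and URL pairs, which is enough to verify the setup before starting a pipeline.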

The Numbers

▸ Session failure rate: 70% single-model → under 15% with capability routing
▸ API cost: $5-20/session cloud → $0.00 self-hosted
▸ Research capability: 0% without web → 100% with SearXNG
▸ Pipeline completion: 30-40% single-model → 85%+ with S0→S6
▸ Time to first ROI: first zero-cost multi-model pipeline run
(Source: multipi docs, June 2026)

What It Cannot Do

Requires 5+ Ollama models consuming 50-200GB disk.
SearXNG adds infrastructure complexity — must be running before use.
Local inference is 5-20x slower than cloud APIs for equivalent quality.

Start in 10 Minutes

(2 min) Ensure Ollama is installed and pull your first model: ollama pull qwen2.5:7b
(5 min) Install multipi: pi install npm:@chewey182/multipi
(3 min) Start SearXNG: docker run -d -p 8888:8080 searxng/searxng

Frequently Asked Questions

Q: What models does multipi require? A: Recommended: kimi-k2.6 (orchestrator/research), devstral-2 (planner/implementer), deepseek-v4-flash (reviewer), gemini-3-flash (scout), glm-5.1 (worker). The routing config is customizable.

Q: Can multipi work without SearXNG? A: Yes, but research agents lose web search capability. Without web search, local models cannot find current information. SearXNG is highly recommended.

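Q: Is multipi fully air-gap compatible? A: Partly. Ollama and SearXNG both run on your own hardware, and no API keys are required, so model inference works offline once the models are pulled. SearXNG, however, is a metasearch engine that forwards queries to upstream search engines, so web search needs outbound internet access and will not return results on a fully air-gapped network.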