
Villager, inside out: FastAPI control plane + LLM task graph + MCP tool runner

Jan 6, 2026 · 12 min read

Scope & intent: This is a defender‑focused technical teardown of the Python package villager (latest pre‑release commonly referenced: 0.2.1rc1). The goal is to understand architecture and risk, not to enable abuse. Use only in authorized environments.

This post draws on publicly available package metadata (PyPI), a reverse‑engineered GitHub mirror intended “for analysis,” and MCP documentation to explain how the moving parts fit together.


0) Why Villager is interesting

Villager is positioned as an “experimental technology project” on PyPI, shipped as a Python package with dependencies that look like a typical agent stack (FastAPI, Typer, LangChain, OpenAI client libs, MCP/FastMCP, etc.).

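At a high level it resembles a common “agentic automation” pattern: an API control plane accepts tasks, an LLM-driven task graph plans and judges them, and an MCP tool runner carries out the work.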

This architecture is the important part: agentic orchestration changes your risk model because any tool exposed to the agent becomes part of the agent’s attack surface.


1) The big picture architecture

The layers

  1. Interface layer (FastAPI + CLI)
    A server exposes endpoints like “create task,” “get status,” “get tree,” “stop task,” “get context.”

  2. Scheduler layer (task nodes + branching)
    A task is represented as a node in a task graph/tree, which may branch into subtasks based on LLM output.

  3. Execution layer (MCP client)
    The “hands” are not inside Villager itself; instead it calls MCP servers (e.g., browser automation, a controlled environment runner, etc.).

  4. (Optional) Local tool-call layer
    Many agent frameworks also implement internal tool execution (e.g., “call a function by name with JSON args”). In Villager-like systems, this is where risk spikes if the tool set includes subprocess or eval primitives.

Request lifecycle (conceptual flow)

Client
  |
  |  POST /task   (abstract, description, verification)
  v
FastAPI app
  |
  |  creates TaskNode and runs it in background
  v
TaskNode.execute()
  |
  |-- (branching?) -> create child TaskNodes and execute them
  |
  |-- else -> run_mcp_agent() -> McpClient.execute(prompt)
  |
  |-- judge step -> DONE / TURNING / IMPOSSIBLE (repeat or stop)
  v
Client polls:
  - status
  - task tree
  - context transcript
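
To make the "task tree" that a client polls more concrete, here is a minimal sketch of a recursive node structure serialized to plain data. The `Node` class, its fields, and the status strings other than those named in this post are hypothetical stand-ins, not Villager's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a task node (not Villager's real class)."""
    abstract: str
    status: str = "PENDING"
    children: list["Node"] = field(default_factory=list)


def tree_to_dict(node: Node) -> dict:
    """Recursively convert a node and its subtasks into JSON-ready data."""
    return {
        "abstract": node.abstract,
        "status": node.status,
        "children": [tree_to_dict(child) for child in node.children],
    }


# A parent task that branched into two subtasks.
root = Node("Summarize runbook", "PROCESSING", [
    Node("Collect documents", "DONE"),
    Node("Draft summary"),
])
tree = tree_to_dict(root)
```

A tree endpoint could return something shaped like `tree`, which lets a client see which branch is still running without reading the full transcript.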

2) What MCP is and why it’s central here

MCP (Model Context Protocol) is an open protocol for connecting LLM applications to external tools and data sources through a standardized interface. It defines a host/client/server model and protocol semantics. (Think: a “universal adapter” for tools.)
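
On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying a tool name and its arguments. The sketch below only builds such a request; the tool name `fetch_page` and its argument are made up for illustration, and a real client would also handle initialization, transport, and responses.

```python
import itertools
import json

# Monotonic request IDs, as JSON-RPC expects each request to be identifiable.
_request_ids = itertools.count(1)


def make_tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# "fetch_page" is a hypothetical tool name used only for this example.
msg = make_tool_call("fetch_page", {"url": "https://example.com"})
```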

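Villager uses MCP to separate the orchestrator from the actual tool execution environment. This is a big deal: what the agent can actually do is determined by the MCP servers it is connected to, not by the orchestrator's own code.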


3) The API control plane (FastAPI)

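This is the “front door” pattern seen in many agent frameworks: a small HTTP API for creating tasks and then polling their status, task tree, and context transcript.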

Example client requests (generic)

# create a task (example: innocuous compliance doc generation)
curl -s -X POST "http://127.0.0.1:37695/task"   -H "Content-Type: application/json"   -d '{
    "abstract": "Summarize our incident response runbook",
    "description": "Use only our internal documents. Output Markdown.",
    "verification": "Includes owners, escalation flow, and contact matrix."
  }'

# poll status
curl -s "http://127.0.0.1:37695/get/task/status?task_id=<TASK_ID>"

# fetch graph/tree
curl -s "http://127.0.0.1:37695/task/<TASK_ID>/tree"

# fetch context transcript
curl -s "http://127.0.0.1:37695/task/<TASK_ID>/context"

Engineering note: In prototype designs like this, tasks are often stored in memory (a module‑level dict). In production you’d persist tasks in Redis or a database and generate a unique ID for every request.
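
As a rough illustration of that note, the sketch below shows a prototype-style in-memory store that at least assigns a fresh UUID per task and guards the dict with a lock. `TaskStore` and its method names are hypothetical; a durable version would swap the dict for Redis or a database.

```python
import threading
import uuid


class TaskStore:
    """In-memory task registry keyed by random UUIDs (not durable)."""

    def __init__(self) -> None:
        self._tasks: dict[str, dict] = {}
        self._lock = threading.Lock()

    def create(self, abstract: str, description: str, verification: str) -> str:
        """Register a task and return its newly generated ID."""
        task_id = str(uuid.uuid4())
        with self._lock:
            self._tasks[task_id] = {
                "abstract": abstract,
                "description": description,
                "verification": verification,
                "status": "PENDING",
            }
        return task_id

    def status(self, task_id: str) -> str:
        """Return the status of a task; raises KeyError for unknown IDs."""
        with self._lock:
            return self._tasks[task_id]["status"]
```

Random UUIDs make task IDs hard to guess, but they are not a substitute for authentication on the API itself.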


4) TaskNode: the planner / executor / judge loop

The core idea is a controller loop like:

1) Ask the LLM: Should I break this into subtasks?
2) If yes: create subtasks and run them.
3) If no: run a “do work” step via tools/MCP.
4) Ask the LLM: Is it done?
5) If not: iterate until max retries or failure.

Pseudocode you can recognize in many agent codebases

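class TaskNode:
    def execute(self):
        self.status = "PROCESSING"

        # Planner: ask the LLM whether the task should be split.
        branch = self.llm_should_branch(self.abstract, self.description)
        if branch.need_branching:
            # Branching: run each subtask and collect the results.
            self.children = [TaskNode(t) for t in branch.tasks]
            output = [child.execute() for child in self.children]
        else:
            # Executor: hand the work to the MCP tool runner.
            output = self.run_mcp_agent()

        # Judge: compare the output with the verification criteria.
        verdict = self.llm_judge(output, self.verification)

        if verdict == "DONE":
            self.status = "DONE"
            return output
        elif verdict == "TURNING":
            # Not done yet: iterate until done or max retries.
            return self.retry_loop()
        else:
            self.status = "IMPOSSIBLE"
            return output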

This pattern is what matters, even if variable names differ.
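
For readers who want to step through the loop, here is a small runnable sketch with the LLM and tool calls replaced by plain callables. Everything in it (`run_task`, the `plan`/`work`/`judge` parameters, the retry and depth limits, and treating exhausted retries as "IMPOSSIBLE") is an assumption made for illustration, not Villager's implementation.

```python
from typing import Callable


def run_task(
    task: str,
    plan: Callable[[str], list[str]],    # subtasks, or [] for "no branching"
    work: Callable[[str], str],          # stands in for the MCP/tool step
    judge: Callable[[str, str], str],    # "DONE", "TURNING", or "IMPOSSIBLE"
    max_retries: int = 3,
    depth: int = 0,
    max_depth: int = 2,
) -> tuple[str, str]:
    """Return (status, output), recursing into subtasks when the planner asks."""
    # Cap recursion so a planner that always branches cannot loop forever.
    subtasks = plan(task) if depth < max_depth else []
    output = ""
    for _ in range(max_retries):
        if subtasks:
            results = [
                run_task(s, plan, work, judge, max_retries, depth + 1, max_depth)
                for s in subtasks
            ]
            output = "\n".join(out for _, out in results)
        else:
            output = work(task)
        verdict = judge(task, output)
        if verdict in ("DONE", "IMPOSSIBLE"):
            return verdict, output
        # Any other verdict ("TURNING") means: try again.
    # Retries exhausted; this sketch reports that as IMPOSSIBLE.
    return "IMPOSSIBLE", output


# Stub callables so the loop can run without any model or tool server.
def plan(task: str) -> list[str]:
    return ["part A", "part B"] if task == "root" else []

def work(task: str) -> str:
    return f"result for {task}"

def judge(task: str, output: str) -> str:
    return "DONE"

status, output = run_task("root", plan, work, judge)
```

The two hard limits (`max_retries`, `max_depth`) are the part worth copying: without them, a model that keeps answering "branch" or "TURNING" controls how long the system runs.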


5) MCP client: streaming tool execution

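A common MCP gateway pattern is: POST a prompt to the gateway, read the response as a stream of JSON events (one per line), collect the content fragments, and stop at a done marker.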

Toy MCP streaming consumer (safe example)

import json
import requests

def stream_mcp(base_url: str, payload: dict) -> str:
    """
    Safe-to-read example of how an HTTP streaming MCP-like gateway might be consumed.
    (Exact endpoints and schemas vary by implementation.)
    """
    out = []
    with requests.post(base_url, json=payload, stream=True, timeout=60) as r:
        r.raise_for_status()
        for line in r.iter_lines(decode_unicode=True):
            if not line:
                continue
            evt = json.loads(line)
            if evt.get("content"):
                out.append(evt["content"])
            if evt.get("done"):
                break
    return "".join(out)

Why defenders care: once this is wired to real tool servers, the orchestrator can cause real-world side effects. Your security posture becomes “how safe are the MCP servers, and how tight is authz?”


6) The “in-band tool call” pattern (%%{json}%%)

Some agent frameworks support tool calls by asking the model to emit a JSON blob in the middle of text, e.g.:

%%{"name":"SearchFastMcp","parameters":{"query":"auth"}}%%

Then code extracts the JSON and runs the corresponding function.

Minimal parser + allowlist executor (safe pattern)

import json, re
from typing import Any, Callable

TOOL_RX = re.compile(r"%%\s*(\{.*?\})\s*%%", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    calls = []
    for m in TOOL_RX.finditer(text):
        calls.append(json.loads(m.group(1)))
    return calls

def run_tools(calls: list[dict], registry: dict[str, Callable[..., Any]]):
    results = []
    for c in calls:
        name = c.get("name")
        args = c.get("parameters") or {}
        if name not in registry:
            raise ValueError(f"Tool not allowed: {name}")
        results.append(registry[name](**args))
    return results

Defender takeaway: the parser is not scary; the tool registry is. If the registry includes shell execution, file access, credential access, network scanning, etc., the system must be treated like privileged code.
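
One way to tighten the registry itself is to have every tool declare the parameters it accepts and to reject anything else before dispatch. The sketch below assumes a hypothetical `search_docs` tool and a hand-rolled type map; a real system would more likely use a JSON Schema or a validation library.

```python
from typing import Any, Callable


def search_docs(query: str, limit: int = 5) -> list[str]:
    """Hypothetical harmless tool used only for this example."""
    return [f"doc matching {query!r}"][:limit]


# Each entry pairs a callable with the exact parameter names and types it accepts.
REGISTRY: dict[str, tuple[Callable[..., Any], dict[str, type]]] = {
    "search_docs": (search_docs, {"query": str, "limit": int}),
}


def validated_call(name: str, params: dict) -> Any:
    """Dispatch a tool call only if the name and every argument pass checks."""
    if name not in REGISTRY:
        raise ValueError(f"Tool not allowed: {name}")
    func, schema = REGISTRY[name]
    if not isinstance(params, dict):
        raise TypeError("parameters must be a JSON object")
    unknown = set(params) - set(schema)
    if unknown:
        raise ValueError(f"Unexpected parameters for {name}: {sorted(unknown)}")
    for key, value in params.items():
        if not isinstance(value, schema[key]):
            raise TypeError(f"{name}.{key} must be {schema[key].__name__}")
    return func(**params)
```

Type checks like these do not make a dangerous tool safe; they only shrink the set of inputs a model can push into an already acceptable tool.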


7) The reverse-engineered mirror’s warning: “callbacks” and data egress

The GitHub mirror referenced above was created “for analysis” and explicitly warns about multiple potential egress paths (proxying and webhooks).

Even if you never “use those tools,” defenders should assume those egress paths are present in the code and reachable.

Rule of thumb: if it can talk to the network, it can leak data—unless you lock down egress.
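
As an application-level complement to network egress controls, outbound URLs can be checked against an explicit host allowlist before any request is made. The host names below are placeholders, and this sketch is not a replacement for firewall or proxy rules, since code that bypasses the check is not constrained by it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this would come from configuration.
ALLOWED_HOSTS = {"api.internal.example", "docs.internal.example"}


def egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose exact host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match only: suffix or substring matching is easy to bypass.
    return host in allowed_hosts
```

Exact host matching is deliberate here: a lookalike such as `api.internal.example.evil.test` contains an allowed name but is a different host.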


8) How to analyze Villager-like packages safely (with code)

A) Pin exact artifacts and verify hashes

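1) Download the exact release files without installing them.
2) Verify file hashes against the package index metadata.

Caveat: when pip download fetches an sdist, it can still invoke the package's build backend to read metadata. Restricting it to wheels avoids that; if you need the sdist, fetch the file directly from the index instead.

pip download --no-deps --only-binary=:all: villager==0.2.1rc1
sha256sum villager-0.2.1rc1*.whl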
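
The comparison step can also be scripted. The sketch below hashes a local file and checks it against a release description shaped like the PyPI JSON API response (a `urls` list whose entries carry `filename` and `digests.sha256`); fetching that JSON is left out so the example stays offline, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digests(release_json: dict) -> dict[str, str]:
    """Map filename -> sha256 from release metadata (assumed PyPI JSON shape)."""
    return {
        entry["filename"]: entry["digests"]["sha256"]
        for entry in release_json.get("urls", [])
    }


def verify(path: Path, release_json: dict) -> bool:
    """True only if the file is listed in the metadata and its hash matches."""
    expected = expected_digests(release_json).get(path.name)
    return expected is not None and sha256_of(path) == expected
```

A file that is missing from the metadata fails verification rather than passing by default, which is the behavior you want when an artifact of unknown origin shows up in the download folder.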

B) Static scan for dangerous primitives

# static_scan.py
import pathlib, re

ROOT = pathlib.Path(".")  # point at the extracted sdist folder
PATTERNS = {
    "network": re.compile(r"\b(requests|httpx|urllib3|socket)\b"),
    "process": re.compile(r"\b(subprocess\.run|Popen|os\.system)\b"),
    "eval_exec": re.compile(r"\b(eval|exec)\b"),
    "secrets": re.compile(r"\b(api[_-]?key|token|secret|passwd|password)\b", re.I),
}

hits = {k: [] for k in PATTERNS}
for py in ROOT.rglob("*.py"):
    s = py.read_text(errors="ignore")
    for k, rx in PATTERNS.items():
        if rx.search(s):
            hits[k].append(str(py))

for k, files in hits.items():
    print(f"\n[{k}] {len(files)} files")
    for f in files[:40]:
        print(" ", f)
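
Regex hits include comments and string literals, so a second pass over the syntax tree can help separate real calls from mentions. The sketch below uses the standard `ast` module, which parses source without executing it; the lists of flagged names are a starting point chosen for illustration, and aliased imports (for example `from os import system as s`) will slip past it.

```python
import ast

# Bare function names and (module, attribute) pairs worth a closer look.
DANGEROUS_NAMES = {"eval", "exec", "compile", "__import__"}
DANGEROUS_ATTRS = {
    ("os", "system"), ("os", "popen"),
    ("subprocess", "run"), ("subprocess", "Popen"),
    ("subprocess", "call"), ("subprocess", "check_output"),
}


def find_dangerous_calls(source: str, filename: str = "<unknown>") -> list[tuple[int, str]]:
    """Return sorted (line, call name) pairs for flagged call sites."""
    findings = []
    tree = ast.parse(source, filename=filename)  # parses only, never runs code
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_NAMES:
            findings.append((node.lineno, func.id))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in DANGEROUS_ATTRS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)


sample = (
    "import os\n"
    "note = 'eval is only mentioned here'\n"
    "os.system('id')\n"
    "result = eval('1+1')\n"
)
findings = find_dangerous_calls(sample)  # [(3, 'os.system'), (4, 'eval')]
```

The string on line 2 of `sample` would trip the regex scan but is ignored here, because only call nodes are inspected.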

C) If you must execute: isolate hard


9) Building a safer “Villager-like” system (recommended hardening)

If you’re rebuilding this idea (and you probably should, instead of installing unknown packages):

1) Separate “planner” from “executor” with clear boundaries
2) Never expose “danger primitives” (shell, eval) directly to the model
3) Use structured tool calling (strict JSON schema, validation, allowlists)
4) Implement per-tool authz (who can call what, with what parameters)
5) Add an audit trail (tool called, args, time, outcome; redact secrets)
6) Default-deny egress and require explicit network allowlists
7) Make tasks durable (DB/queue), enforce per-request UUIDs, add auth to API
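
Items 4 and 5 can be combined in one small wrapper: check a per-role policy before dispatch, and write an audit record with secrets redacted whether the call succeeds or not. The roles, tool names, and sensitive-key list below are hypothetical, and the log is a plain list standing in for an append-only store.

```python
import json
import time
from typing import Any, Callable

SENSITIVE_KEYS = {"api_key", "token", "secret", "password"}

# Hypothetical policy: which caller roles may invoke which tools.
POLICY = {
    "analyst": {"search_docs"},
    "admin": {"search_docs", "rotate_key"},
}


def redact(args: dict) -> dict:
    """Mask values whose key names suggest a credential."""
    return {k: ("***" if k.lower() in SENSITIVE_KEYS else v) for k, v in args.items()}


def audited_call(
    role: str,
    name: str,
    args: dict,
    registry: dict[str, Callable[..., Any]],
    log: list[str],
) -> Any:
    """Run a tool if the role is allowed, always appending one audit record."""
    allowed = name in POLICY.get(role, set()) and name in registry
    record = {"time": time.time(), "role": role, "tool": name,
              "args": redact(args), "outcome": "denied"}
    try:
        if not allowed:
            raise PermissionError(f"{role!r} may not call {name!r}")
        record["outcome"] = "error"  # overwritten below if the call succeeds
        result = registry[name](**args)
        record["outcome"] = "ok"
        return result
    finally:
        # Logged even when the call is denied or the tool raises.
        log.append(json.dumps(record, default=str))
```

Writing the record in `finally` means denied and failed calls leave the same trail as successful ones, which is usually what an investigator needs most.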


