Villager, inside out: FastAPI control plane + LLM task graph + MCP tool runner
Scope & intent: This is a defender‑focused technical teardown of the Python package villager (latest pre‑release commonly referenced: 0.2.1rc1). The goal is to understand architecture and risk, not to enable abuse. Use only in authorized environments.
This post draws on publicly available package metadata (PyPI), a reverse‑engineered GitHub mirror intended “for analysis,” and MCP documentation to explain how the moving parts fit together.
0) Why Villager is interesting
Villager is positioned as an “experimental technology project” on PyPI, shipped as a Python package with dependencies that look like a typical agent stack (FastAPI, Typer, LangChain, OpenAI client libs, MCP/FastMCP, etc.).
At a high level it resembles a common “agentic automation” pattern:
- a web API to submit work,
- an LLM planner/judge loop to decide steps and completion,
- and a tool bridge (MCP) to execute actions in external systems.
This architecture is the important part: agentic orchestration changes your risk model because any tool exposed to the agent becomes part of the agent’s attack surface.
1) The big picture architecture
The layers
- Interface layer (FastAPI + CLI)
  A server exposes endpoints like “create task,” “get status,” “get tree,” “stop task,” “get context.”
- Scheduler layer (task nodes + branching)
  A task is represented as a node in a task graph/tree, which may branch into subtasks based on LLM output.
- Execution layer (MCP client)
  The “hands” are not inside Villager itself; instead it calls MCP servers (e.g., browser automation, a controlled environment runner, etc.).
- (Optional) Local tool-call layer
  Many agent frameworks also implement internal tool execution (e.g., “call a function by name with JSON args”). In Villager-like systems, this is where risk spikes if the tool set includes subprocess or eval primitives.
Request lifecycle (conceptual flow)
Client
|
| POST /task (abstract, description, verification)
v
FastAPI app
|
| creates TaskNode and runs it in background
v
TaskNode.execute()
|
|-- (branching?) -> create child TaskNodes and execute them
|
|-- else -> run_mcp_agent() -> McpClient.execute(prompt)
|
|-- judge step -> DONE / TURNING / IMPOSSIBLE (repeat or stop)
v
Client polls:
- status
- task tree
- context transcript
2) What MCP is and why it’s central here
MCP (Model Context Protocol) is an open protocol for connecting LLM applications to external tools and data sources through a standardized interface. It defines a host/client/server model and protocol semantics. (Think: a “universal adapter” for tools.)
Villager uses MCP to separate the orchestrator from the actual tool execution environment. This is a big deal:
- It enables powerful workflows without bundling tools inside the package.
- It also means the orchestrator can become a “universal remote” for anything the MCP servers expose.
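To make the “universal remote” point concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification; the tool name `browser_open` and its argument are hypothetical, and nothing here is taken from Villager's own code.

```python
import itertools
import json

_ids = itertools.count(1)  # each request in a session gets a fresh ID


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and argument, for illustration only.
print(build_tool_call("browser_open", {"url": "https://example.com"}))
```

The message is the same shape whatever the tool does, which is why the set of tools a server exposes, rather than the protocol, defines the blast radius.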
3) The API control plane (FastAPI)
This is the “front door” pattern seen in many agent frameworks:
- submit a task,
- poll its status,
- fetch the task tree/graph,
- fetch the transcript/context,
- stop/interrupt execution.
Example client requests (generic)
# create a task (example: innocuous compliance doc generation)
curl -s -X POST "http://127.0.0.1:37695/task" -H "Content-Type: application/json" -d '{
"abstract": "Summarize our incident response runbook",
"description": "Use only our internal documents. Output Markdown.",
"verification": "Includes owners, escalation flow, and contact matrix."
}'
# poll status
curl -s "http://127.0.0.1:37695/get/task/status?task_id=<TASK_ID>"
# fetch graph/tree
curl -s "http://127.0.0.1:37695/task/<TASK_ID>/tree"
# fetch context transcript
curl -s "http://127.0.0.1:37695/task/<TASK_ID>/context"
Engineering note: In prototype designs like this, tasks are often stored in memory (a module‑level dict). In production you’d persist tasks in Redis or a database and generate a unique ID for each request.
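As a rough illustration of the engineering note above, the sketch below keeps tasks behind a small store interface and assigns every task its own UUID. `TaskRecord` and `TaskStore` are hypothetical names, and the in-memory dict is only a stand-in for Redis or a database.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskRecord:
    abstract: str
    description: str
    verification: str
    status: str = "PENDING"
    # A fresh random ID per task, never reused across requests.
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TaskStore:
    """In-memory stand-in for a durable store (swap for Redis or a DB)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks = {}

    def create(self, abstract: str, description: str, verification: str) -> TaskRecord:
        record = TaskRecord(abstract, description, verification)
        with self._lock:
            self._tasks[record.task_id] = record
        return record

    def get(self, task_id: str) -> Optional[TaskRecord]:
        with self._lock:
            return self._tasks.get(task_id)
```

Hiding the dict behind `create` and `get` means the storage backend can change later without touching the API handlers.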
4) TaskNode: the planner / executor / judge loop
The core idea is a controller loop like:
1) Ask the LLM: Should I break this into subtasks?
2) If yes: create subtasks and run them.
3) If no: run a “do work” step via tools/MCP.
4) Ask the LLM: Is it done?
5) If not: iterate until max retries or failure.
Pseudocode you can recognize in many agent codebases
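class TaskNode:
    def execute(self):
        self.status = "PROCESSING"
        # Planner step: ask the LLM whether to split the task.
        branch = self.llm_should_branch(self.abstract, self.description)
        if branch.need_branching:
            # Branch: run each subtask and collect the outputs for the judge.
            self.children = [TaskNode(t) for t in branch.tasks]
            output = [child.execute() for child in self.children]
        else:
            # Leaf: do the work through the MCP tool bridge.
            output = self.run_mcp_agent()
        # Judge step: compare the output with the verification criteria.
        verdict = self.llm_judge(output, self.verification)
        if verdict == "DONE":
            self.status = "DONE"
            return output
        elif verdict == "TURNING":
            return self.retry_loop()
        else:
            self.status = "IMPOSSIBLE"
            return output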
This pattern is what matters, even if variable names differ.
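For readers who want to run the loop, here is a self-contained toy version with the LLM and MCP calls replaced by plain callables. `ToyTask`, `run_step`, `judge`, and `max_attempts` are illustrative names, not Villager's; the point is only that the retry path needs an explicit bound.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    goal: str
    run_step: Callable[[str, int], str]  # stand-in for the MCP "do work" call
    judge: Callable[[str], str]          # stand-in for the LLM judge
    max_attempts: int = 3
    status: str = "PENDING"
    transcript: list = field(default_factory=list)

    def execute(self) -> str:
        self.status = "PROCESSING"
        output = ""
        for attempt in range(1, self.max_attempts + 1):
            output = self.run_step(self.goal, attempt)
            verdict = self.judge(output)
            self.transcript.append((attempt, verdict))
            if verdict == "DONE":
                self.status = "DONE"
                return output
            if verdict == "IMPOSSIBLE":
                break
            # "TURNING": loop around and try again
        # Either the judge gave up or the attempt budget ran out.
        self.status = "IMPOSSIBLE"
        return output


# Stubs: the judge accepts the output of the second attempt.
task = ToyTask(
    goal="demo",
    run_step=lambda goal, n: f"{goal}: attempt {n}",
    judge=lambda out: "DONE" if out.endswith("2") else "TURNING",
)
print(task.execute(), task.status)
```

Without `max_attempts`, a judge that keeps answering “TURNING” would drive tool calls indefinitely, which matters once those tools have side effects.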
5) MCP client: streaming tool execution
A common MCP gateway pattern is:
- send a request (prompt + config),
- consume a stream (SSE / newline‑delimited JSON),
- keep a transcript of messages and tool results.
Toy MCP streaming consumer (safe example)
import json
import requests

def stream_mcp(base_url: str, payload: dict) -> str:
    """
    Safe-to-read example of how an HTTP streaming MCP-like gateway might be consumed.
    (Exact endpoints and schemas vary by implementation.)
    """
    out = []
    with requests.post(base_url, json=payload, stream=True, timeout=60) as r:
        r.raise_for_status()
        for line in r.iter_lines(decode_unicode=True):
            if not line:
                continue
            evt = json.loads(line)
            if evt.get("content"):
                out.append(evt["content"])
            if evt.get("done"):
                break
    return "".join(out)
Why defenders care: once this is wired to real tool servers, the orchestrator can cause real-world side effects. Your security posture becomes “how safe are the MCP servers, and how tight is authz?”
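The toy consumer above assumes newline-delimited JSON. A gateway that speaks SSE instead prefixes payload lines with `data:` and may interleave `event:`, `id:`, `retry:`, and comment lines. The helper below is a small sketch that accepts both framings; `parse_stream_line` is a hypothetical name, and it assumes each `data:` line carries one complete JSON object, which real servers do not have to guarantee.

```python
import json
from typing import Optional

SSE_IGNORED_FIELDS = {"event", "id", "retry"}


def parse_stream_line(line: str) -> Optional[dict]:
    """Decode one line of an NDJSON or SSE stream; return None if no payload."""
    line = line.strip()
    if not line or line.startswith(":"):  # blank separator or SSE comment
        return None
    name, sep, rest = line.partition(":")
    if sep and name in SSE_IGNORED_FIELDS:
        return None
    if sep and name == "data":            # SSE payload line
        line = rest.strip()
        if not line:
            return None
    return json.loads(line)               # NDJSON lines fall through to here


print(parse_stream_line('data: {"content": "hello"}'))
print(parse_stream_line('{"done": true}'))
```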
6) The “in-band tool call” pattern (%%{json}%%)
Some agent frameworks support tool calls by asking the model to emit a JSON blob in the middle of text, e.g.:
%%{"name":"SearchFastMcp","parameters":{"query":"auth"}}%%
Then code extracts the JSON and runs the corresponding function.
Minimal parser + allowlist executor (safe pattern)
import json, re
from typing import Any, Callable

TOOL_RX = re.compile(r"%%\s*(\{.*?\})\s*%%", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    calls = []
    for m in TOOL_RX.finditer(text):
        calls.append(json.loads(m.group(1)))
    return calls

def run_tools(calls: list[dict], registry: dict[str, Callable[..., Any]]):
    results = []
    for c in calls:
        name = c.get("name")
        args = c.get("parameters") or {}
        if name not in registry:
            raise ValueError(f"Tool not allowed: {name}")
        results.append(registry[name](**args))
    return results
Defender takeaway: the parser is not scary; the tool registry is. If the registry includes shell execution, file access, credential access, network scanning, etc., the system must be treated like privileged code.
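An allowlist on tool names still lets the model choose the arguments, so a natural next step is to validate parameters before dispatch. The sketch below uses a hand-rolled schema for brevity; `SCHEMAS`, `validate_call`, and the `search_docs` tool are hypothetical, and a real system would more likely use JSON Schema or a validation library.

```python
from typing import Any

# Hypothetical per-tool schema: parameter name -> required Python type.
SCHEMAS = {
    "search_docs": {"query": str, "limit": int},
}


def validate_call(call: dict) -> dict:
    """Return the validated arguments, or raise ValueError. Default-deny."""
    name = call.get("name")
    if not isinstance(name, str) or name not in SCHEMAS:
        raise ValueError(f"Tool not allowed: {name!r}")
    schema = SCHEMAS[name]
    args: Any = call.get("parameters") or {}
    if not isinstance(args, dict):
        raise ValueError("parameters must be a JSON object")
    unexpected = sorted(set(args) - set(schema))
    missing = sorted(set(schema) - set(args))
    if unexpected or missing:
        raise ValueError(f"unexpected={unexpected} missing={missing}")
    for key, expected in schema.items():
        # Exact type match, so True is not silently accepted as an int.
        if type(args[key]) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")
    return args


print(validate_call({"name": "search_docs", "parameters": {"query": "auth", "limit": 5}}))
```

Rejecting unexpected keys matters as much as checking types: extra keyword arguments are an easy way to reach options the tool author never meant to expose.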
7) The reverse-engineered mirror’s warning: “callbacks” and data egress
The GitHub mirror referenced in this post was created “for analysis” and explicitly warns about multiple potential egress paths (proxying and webhooks).
Even if you never “use those tools,” defenders should assume:
- imports can have side effects,
- configuration drift happens,
- agents sometimes “discover” and call tools you forgot existed.
Rule of thumb: if it can talk to the network, it can leak data—unless you lock down egress.
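One way to express that rule inside the application is a default-deny check on every outbound URL a tool wants to reach. This is a minimal sketch: `ALLOWED_HOSTS` and `egress_allowed` are hypothetical, and an in-process check like this complements network-level egress controls rather than replacing them.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts tools may contact.
ALLOWED_HOSTS = {"mcp.internal.example", "127.0.0.1"}


def egress_allowed(url: str) -> bool:
    """Default-deny: permit only http(s) URLs whose host is allowlisted."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return False
    if parts.scheme not in {"http", "https"}:
        return False
    # Exact match only: "mcp.internal.example.evil.com" must not pass.
    return host in ALLOWED_HOSTS


print(egress_allowed("http://127.0.0.1:37695/task"))
print(egress_allowed("https://mcp.internal.example.evil.com/hook"))
```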
8) How to analyze Villager-like packages safely (with code)
A) Pin exact artifacts and verify hashes
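1) Download wheels/sdists without executing code. Note that pip may invoke an sdist's build backend to read its metadata, so prefer wheels (--only-binary=:all:) or fetch the files directly from the index.
2) Verify file hashes against the package index metadata.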
pip download --no-deps villager==0.2.1rc1
sha256sum villager-0.2.1rc1*.tar.gz villager-0.2.1rc1*.whl
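The comparison in step 2 can be scripted. The sketch below assumes you have separately fetched and parsed the release JSON from PyPI (https://pypi.org/pypi/<project>/<version>/json), whose `urls` entries list each file's `filename` and `digests.sha256`; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_index(path: Path, release_json: dict) -> bool:
    """Compare a local artifact with the sha256 listed in the release JSON."""
    for entry in release_json.get("urls", []):
        if entry.get("filename") == path.name:
            return entry["digests"]["sha256"] == sha256_of(path)
    return False  # file not listed for this release: treat as unverified
```

Treating an unlisted file as unverified keeps the check default-deny, which is the behavior you want when the artifact came from a mirror rather than the index.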
B) Static scan for dangerous primitives
# static_scan.py
import pathlib, re

ROOT = pathlib.Path(".")  # point at the extracted sdist folder

PATTERNS = {
    "network": re.compile(r"\b(requests|httpx|urllib3|socket)\b"),
    "process": re.compile(r"\b(subprocess\.run|Popen|os\.system)\b"),
    "eval_exec": re.compile(r"\b(eval|exec)\b"),
    "secrets": re.compile(r"\b(api[_-]?key|token|secret|passwd|password)\b", re.I),
}

hits = {k: [] for k in PATTERNS}
for py in ROOT.rglob("*.py"):
    s = py.read_text(errors="ignore")
    for k, rx in PATTERNS.items():
        if rx.search(s):
            hits[k].append(str(py))

for k, files in hits.items():
    print(f"\n[{k}] {len(files)} files")
    for f in files[:40]:
        print(" ", f)
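Regex scans also match comments and strings, so a second pass over the syntax tree can cut the noise. The sketch below uses the standard `ast` module to report only actual call sites; `DANGEROUS_CALLS` is an illustrative list, and the approach does not follow aliases such as `from subprocess import run` or `import os as o`.

```python
import ast

DANGEROUS_CALLS = {"eval", "exec", "os.system", "subprocess.run", "subprocess.Popen"}


def dotted_name(node: ast.AST) -> str:
    """Rebuild names like `subprocess.run` from Name/Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return ""


def find_dangerous_calls(source: str) -> list:
    """Return sorted (line, call name) pairs for calls to listed primitives."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = 'import subprocess\n# eval is mentioned only in a comment\nsubprocess.run(["ls"])\n'
print(find_dangerous_calls(sample))  # [(3, 'subprocess.run')]
```

Because `ast.parse` only parses and never executes the source, this pass is safe to run against untrusted files.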
C) If you must execute: isolate hard
- run in a throwaway VM/container
- no host mounts (~/.ssh, cloud creds, browser profiles)
- run as an unprivileged user
- block outbound network except to a controlled MCP server
- record egress (pcap) for validation
9) Building a safer “Villager-like” system (recommended hardening)
If you’re rebuilding this idea (and you probably should, instead of installing unknown packages):
1) Separate “planner” from “executor” with clear boundaries
2) Never expose “danger primitives” (shell, eval) directly to the model
3) Use structured tool calling (strict JSON schema, validation, allowlists)
4) Implement per-tool authz (who can call what, with what parameters)
5) Add an audit trail (tool called, args, time, outcome; redact secrets)
6) Default-deny egress and require explicit network allowlists
7) Make tasks durable (DB/queue), enforce per-request UUIDs, add auth to API
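As an illustration of item 5, the wrapper below records every tool call with redacted arguments and an outcome. It is a sketch under simple assumptions: `audited`, `redact`, and the `SENSITIVE_KEYS` hints are hypothetical, redaction is by parameter name only, and a real deployment would write to an append-only sink rather than a Python list.

```python
import json
import time
from typing import Any, Callable

SENSITIVE_KEYS = ("key", "token", "secret", "password")  # assumed naming hints


def redact(args: dict) -> dict:
    """Mask values whose parameter name suggests a credential."""
    return {
        k: "***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v
        for k, v in args.items()
    }


def audited(name: str, func: Callable[..., Any], log: list) -> Callable[..., Any]:
    """Wrap a tool so each call appends one JSON audit record to `log`."""
    def wrapper(**kwargs: Any) -> Any:
        record = {"tool": name, "args": redact(kwargs), "ts": time.time()}
        try:
            result = func(**kwargs)
            record["outcome"] = "ok"
            return result
        except Exception as exc:
            record["outcome"] = f"error: {type(exc).__name__}"
            raise
        finally:
            # Runs on success and failure, so no call goes unrecorded.
            log.append(json.dumps(record, default=str))
    return wrapper


audit_log = []
echo = audited("echo", lambda text, api_key: text, audit_log)
echo(text="hi", api_key="s3cr3t")
print(audit_log[0])
```

Wrapping tools at registration time, rather than logging inside each tool, keeps the audit trail outside the code the model can influence.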
10) References
- PyPI: https://pypi.org/project/villager/
- GitHub mirror (“for analysis”): https://github.com/gregcmartin/villager
- MCP (official): https://modelcontextprotocol.io/
- MCP spec (dated): https://modelcontextprotocol.io/specification/2025-11-25
- FastMCP docs: https://gofastmcp.com/
- FastMCP repo: https://github.com/jlowin/fastmcp