WorkFort Alpha Design
Date: 2026-02-17 Status: Approved
Goal
One agent producing code in a VM. After this milestone, WorkFort dogfoods itself — the most needed tools to develop WorkFort further are built next, and the self-improvement loop continues until the full vision is realized.
System Topology
graph TB
systemd["systemd --user"] -->|starts| Nexus["nexusd"]
Nexus -->|spawns & manages| FC1["Firecracker: Portal VM (CID 3)"]
Nexus -->|spawns & manages| FC2["Firecracker: Work VM (CID 4)"]
subgraph Portal["Portal VM"]
AgentRT["Agent Runtime (LLM client)"]
end
subgraph WorkVM["Work VM"]
GA["guest-agent"]
MCP["MCP Server (JSON-RPC 2.0)"]
PTY["PTY Manager"]
end
AgentRT -->|"MCP tool calls"| Nexus
Nexus -->|"vsock route"| GA
GA --- MCP
GA --- PTY
Nexus --- State["SQLite"]
Nexus --- Storage["btrfs subvolumes"]
Nexus --- Net["nexbr0 + nftables"]
Nexus is the vsock router — all inter-VM communication flows through it. Firecracker’s vsock only supports host-guest communication, so Nexus mediates on the host via Unix domain sockets with the CONNECT <port> protocol.
Nexus Daemon
nexusd is a single Rust binary, started by systemd at the user level:
systemctl --user start nexus.service
The binary gets CAP_NET_ADMIN via setcap for tap/bridge/nftables operations. btrfs subvolume operations work unprivileged via ioctls.
Responsibilities
- Firecracker process lifecycle (spawn, monitor, kill)
- vsock routing between VMs
- btrfs workspace management (create/snapshot/destroy subvolumes)
- PTY management per VM, exposed over WebSocket via the ttyd protocol
- State tracking in SQLite
- HTTP API on 127.0.0.1:9600
- Reconciliation loop: reads desired state, compares to running VMs, converges (sketched below)
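A sketch of the reconciliation loop's shape. The types, helper names, and the five-second interval are illustrative; the real desired state comes from SQLite and the observed state from the spawned Firecracker processes.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Illustrative shapes; the real desired state lives in SQLite (see State below).
struct DesiredVm { running: bool }
struct RunningVm { /* pid, cid, ... */ }

/// One pass: start what should run, stop what should not.
async fn reconcile_once(
    desired: &HashMap<String, DesiredVm>,
    running: &mut HashMap<String, RunningVm>,
) {
    for (name, vm) in desired {
        if vm.running && !running.contains_key(name) {
            // spawn a Firecracker process for `name`, record it in `running`
        }
    }
    let stale: Vec<String> = running
        .keys()
        .filter(|name| !desired.get(*name).is_some_and(|d| d.running))
        .cloned()
        .collect();
    for name in stale {
        // kill the Firecracker process, then:
        running.remove(&name);
    }
}

/// nexusd runs this forever; the interval is an assumption of this sketch.
async fn reconcile_loop() {
    loop {
        // load desired + observed state, then call reconcile_once(...).await
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}
```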
Configuration
# /etc/nexus/nexus.yaml
storage:
root: /var/lib/nexus
workspaces: /var/lib/nexus/workspaces
network:
bridge: nexbr0
cidr: 172.16.0.0/24 # user-configurable, must be RFC 1918
api:
listen: 127.0.0.1:9600
firecracker:
binary: /usr/bin/firecracker
kernel: /var/lib/nexus/images/vmlinux
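A sketch of the matching config types. serde_yaml is an assumption here (the stack only lists serde and serde_json), and the field names simply mirror the YAML keys above.

```rust
use serde::Deserialize;
use std::net::SocketAddr;
use std::path::PathBuf;

/// Typed view of /etc/nexus/nexus.yaml; fields mirror the YAML keys above.
#[derive(Debug, Deserialize)]
struct NexusConfig {
    storage: Storage,
    network: Network,
    api: Api,
    firecracker: Firecracker,
}

#[derive(Debug, Deserialize)]
struct Storage { root: PathBuf, workspaces: PathBuf }

#[derive(Debug, Deserialize)]
struct Network { bridge: String, cidr: String }

#[derive(Debug, Deserialize)]
struct Api { listen: SocketAddr }

#[derive(Debug, Deserialize)]
struct Firecracker { binary: PathBuf, kernel: PathBuf }

fn load_config(path: &str) -> Result<NexusConfig, Box<dyn std::error::Error>> {
    Ok(serde_yaml::from_str(&std::fs::read_to_string(path)?)?)
}
```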
State (SQLite)
vms: id, name, role, cid, status, config_json, created_at
workspaces: id, vm_id, subvolume_path, base_image, created_at
routes: id, source_vm_id, target_vm_id, source_port, target_port
An abstraction layer over SQLite allows swapping to Postgres or etcd for clustering.
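A sketch of the schema as rusqlite DDL. The column lists come from above; the SQL types and constraints are assumptions of this sketch.

```rust
use rusqlite::Connection;

/// Create the alpha tables if they do not exist.
fn init_schema(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS vms (
             id INTEGER PRIMARY KEY,
             name TEXT NOT NULL UNIQUE,
             role TEXT NOT NULL,
             cid INTEGER NOT NULL,
             status TEXT NOT NULL,
             config_json TEXT NOT NULL,
             created_at TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS workspaces (
             id INTEGER PRIMARY KEY,
             vm_id INTEGER REFERENCES vms(id),
             subvolume_path TEXT NOT NULL,
             base_image TEXT NOT NULL,
             created_at TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS routes (
             id INTEGER PRIMARY KEY,
             source_vm_id INTEGER REFERENCES vms(id),
             target_vm_id INTEGER REFERENCES vms(id),
             source_port INTEGER NOT NULL,
             target_port INTEGER NOT NULL
         );",
    )
}
```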
Guest Agent
A small Rust binary baked into work VM images. It is the VM’s interface to Nexus.
vsock Port Allocation
The guest-agent listens on well-known vsock ports. Each service gets its own independent connection — no application-layer multiplexing is needed because Firecracker’s vsock natively supports multiple concurrent connections via the CONNECT <port> protocol.
| Port | Purpose | Direction |
|---|---|---|
| 100 | Control channel (image metadata, health) | host → guest |
| 200 | MCP server (JSON-RPC 2.0) | host → guest |
| 300-399 | PTY sessions (one port per terminal attach) | host → guest |
| 500 | MCP client outbound (portal VMs) | guest → host |
Direction matters. Host-to-guest connections use the VM’s UDS with CONNECT <port>\n. Guest-to-host connections trigger Firecracker to connect to <uds_path>_<port> on the host, where Nexus must be listening.
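A minimal sketch of the host-to-guest leg using tokio's UnixStream, following Firecracker's documented CONNECT/OK exchange. The helper name and return type are this sketch's choices, not a fixed interface.

```rust
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::UnixStream;

/// Host-initiated connection to a guest vsock port: dial the VM's vsock UDS,
/// send "CONNECT <port>\n", and wait for Firecracker's "OK <host_port>\n".
async fn dial_guest_port(uds_path: &str, port: u32) -> std::io::Result<BufReader<UnixStream>> {
    let mut stream = BufReader::new(UnixStream::connect(uds_path).await?);

    // Ask Firecracker to splice this connection onto the guest listener at `port`.
    stream.get_mut().write_all(format!("CONNECT {port}\n").as_bytes()).await?;

    // Firecracker acknowledges before any guest bytes flow.
    let mut ack = String::new();
    stream.read_line(&mut ack).await?;
    if !ack.starts_with("OK ") {
        return Err(std::io::Error::other(format!("vsock handshake failed: {ack}")));
    }
    Ok(stream)
}
```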
The MCP routing chain for a portal VM calling a work VM:
graph LR
Agent["Agent Runtime"] -->|"vsock CID 2, port 500"| Portal["Portal VM"]
Portal -->|"uds_path_500"| Nexus
Nexus -->|"CONNECT 200"| Work["Work VM"]
Work --> GA["guest-agent (MCP)"]
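Sketching that chain end to end: Nexus listens on the portal VM's `<uds_path>_500` socket and splices each accepted connection into a CONNECT 200 toward the work VM. It reuses the dial_guest_port helper from the previous sketch; the socket paths are illustrative.

```rust
use tokio::io::copy_bidirectional;
use tokio::net::UnixListener;

/// Forward guest-initiated connections from the portal VM (vsock port 500)
/// to the work VM's MCP server (vsock port 200).
async fn route_portal_to_work(
    portal_uds: &str, // e.g. "/run/nexus/portal.vsock" (illustrative)
    work_uds: &str,   // e.g. "/run/nexus/work.vsock"   (illustrative)
) -> std::io::Result<()> {
    // Firecracker dials "<uds_path>_<port>" on the host for guest-initiated
    // connections, so Nexus listens on "<portal_uds>_500".
    let listener = UnixListener::bind(format!("{portal_uds}_500"))?;
    loop {
        let (mut portal_side, _) = listener.accept().await?;
        // dial_guest_port: the CONNECT/OK helper sketched above.
        let mut work_side = dial_guest_port(work_uds, 200).await?;
        tokio::spawn(async move {
            // Splice bytes both ways until either side closes.
            let _ = copy_bidirectional(&mut portal_side, &mut work_side).await;
        });
    }
}
```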
Connection Pooling
Nexus maintains a connection pool per VM per port. The first vsock connection and initial message are ~50-100x slower than subsequent messages on an established connection (validated through cracker-barrel vsock benchmarking). Connections are established eagerly at boot and kept alive for the VM’s lifetime. Reconnection is automatic on failure.
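A rough shape for the pool, with connections checked out and returned per (VM, port). All names are illustrative, and the cold path reuses the dial_guest_port helper sketched above.

```rust
use std::collections::HashMap;
use tokio::io::BufReader;
use tokio::net::UnixStream;
use tokio::sync::Mutex;

type PoolKey = (String /* vm id */, u32 /* vsock port */);

/// One long-lived vsock connection per (VM, port), opened eagerly at boot
/// and re-dialed transparently if it drops.
struct VsockPool {
    uds_paths: HashMap<String, String>, // vm id -> Firecracker vsock UDS path
    conns: Mutex<HashMap<PoolKey, BufReader<UnixStream>>>,
}

impl VsockPool {
    /// Check a connection out; callers return it with `put` when done.
    async fn get(&self, vm: &str, port: u32) -> std::io::Result<BufReader<UnixStream>> {
        let key = (vm.to_string(), port);
        if let Some(conn) = self.conns.lock().await.remove(&key) {
            return Ok(conn); // warm path: reuse the established connection
        }
        // Cold path: pay the expensive first-connection cost once.
        let uds = self
            .uds_paths
            .get(vm)
            .ok_or_else(|| std::io::Error::other("unknown vm"))?;
        dial_guest_port(uds, port).await // CONNECT/OK helper from the earlier sketch
    }

    async fn put(&self, vm: &str, port: u32, conn: BufReader<UnixStream>) {
        self.conns.lock().await.insert((vm.to_string(), port), conn);
    }
}
```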
MCP Tools (Alpha)
| Tool | Description |
|---|---|
| file_read | Read file contents at path |
| file_write | Write content to path |
| file_delete | Delete file at path |
| run_command | Execute command, return stdout/stderr/exit code |
Long-running commands stream stdout/stderr incrementally over the MCP channel.
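Assuming the standard MCP tools/call envelope over JSON-RPC 2.0, a run_command invocation on the vsock MCP channel (port 200) could be built like this; the "command" argument name is an illustrative choice, not a fixed schema.

```rust
use serde_json::json;

/// A run_command tool call as it would appear on the MCP channel.
/// MCP tool calls ride JSON-RPC 2.0 with the "tools/call" method; the
/// tool's arguments sit under "params.arguments".
fn run_command_request(id: u64, cmd: &str) -> serde_json::Value {
    json!({
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": {
            "name": "run_command",
            "arguments": { "command": cmd } // argument name is an assumption
        }
    })
}
```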
Boot Sequence
- VM kernel boots, systemd starts guest-agent.service
- Guest-agent listens on vsock ports 100, 200, and 300+ (VMADDR_CID_ANY)
- Nexus connects via UDS + CONNECT 100\n (control channel)
- Guest-agent sends image metadata (parsed from /etc/nexus/image.yaml); see the guest-side sketch after this list
- Nexus registers the VM as ready, sets up routes
- Nexus opens additional connections as needed (MCP on port 200, PTY on 300+)
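A guest-side sketch of steps 2 and 4. It assumes tokio-vsock exposes VsockListener::bind(VsockAddr) and re-exports VMADDR_CID_ANY (exact signatures vary between crate versions), and it sends the raw image.yaml contents rather than a parsed form.

```rust
use tokio::io::AsyncWriteExt;
use tokio_vsock::{VsockAddr, VsockListener, VMADDR_CID_ANY};

/// Guest side of the boot handshake: listen on the control port and hand
/// the image metadata to each host connection.
async fn serve_control_channel() -> std::io::Result<()> {
    let mut listener = VsockListener::bind(VsockAddr::new(VMADDR_CID_ANY, 100))?;
    loop {
        let (mut conn, _peer) = listener.accept().await?;
        // Step 4: send image metadata from /etc/nexus/image.yaml.
        let metadata = tokio::fs::read("/etc/nexus/image.yaml").await?;
        conn.write_all(&metadata).await?;
    }
}
```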
Image Metadata Standard
Each VM image declares its access contract:
# /etc/nexus/image.yaml (inside the VM rootfs)
name: workfort/code-agent
version: 0.1.0
access:
terminal: vsock-pty # or: ssh, none
mcp: vsock # tool-calling interface
ports:
http: 8080
metrics: 9090
Nexus reads this at boot and routes accordingly. If terminal: vsock-pty, Nexus can expose a ttyd WebSocket session with access control. If terminal: ssh, Nexus proxies SSH over WebSocket. If terminal: none, no terminal access.
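One possible typed form of that contract, using serde's kebab-case renaming so the variants match the values shown above. This is a sketch, not the shipped schema; serde_yaml is again assumed for parsing.

```rust
use serde::Deserialize;

/// The access.terminal values from image.yaml.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "kebab-case")]
enum TerminalAccess {
    VsockPty, // "vsock-pty": expose a ttyd WebSocket session
    Ssh,      // "ssh": proxy SSH over WebSocket
    None,     // "none": no terminal access
}

#[derive(Debug, Deserialize)]
struct Access {
    terminal: TerminalAccess,
    mcp: String,
}

/// Routing decision Nexus makes at boot.
fn terminal_enabled(access: &Access) -> bool {
    !matches!(access.terminal, TerminalAccess::None)
}
```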
A human developer can attach to any VM with terminal access:
nexusctl attach agent-code-1
The full chain:
graph LR
CLI["Browser / CLI"] -->|"WebSocket (ttyd)"| Nexus
Nexus -->|"vsock port 300+"| GA["guest-agent"]
GA -->|PTY| Shell["/bin/bash"]
Agent Runtimes
The MCP interface is agent-runtime agnostic. Any runtime that can speak MCP (directly or via adapter) works in a portal VM.
OpenClaw Integration
OpenClaw does not natively support MCP as a client. Integration is via an OpenClaw tool plugin:
Portal VM
└─ OpenClaw gateway (ws://127.0.0.1:18789)
└─ workfort-tools plugin
└─ Translates OpenClaw tool calls → MCP JSON-RPC
└─ Sends over vsock → Nexus → Work VM
Each MCP tool on the work VM gets registered as an OpenClaw tool via api.registerTool(). The plugin acts as a thin MCP client.
If OpenClaw ships native MCP client support, the plugin becomes unnecessary.
Portal VM Image Metadata
# portal VM image.yaml
name: workfort/portal-openclaw
version: 0.1.0
runtime: openclaw
access:
terminal: vsock-pty
mcp: vsock-client # this VM consumes MCP, not serves it
ports:
gateway: 18789
Networking
Each VM gets a tap device bridged through nexbr0.
graph TB
Internet -->|NAT| Host["Host (eth0)"]
Host --- Bridge["nexbr0 (172.16.0.0/24)"]
Bridge --- tap0["tap0 → Portal VM (172.16.0.10)"]
Bridge --- tap1["tap1 → Work VM (172.16.0.11)"]
Bridge --- tap2["tap2 → Service VM (172.16.0.12)"]
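A sketch of tap creation and bridge attachment that shells out to iproute2; device and owner names are illustrative, and nexusd could equally drive netlink directly instead.

```rust
use std::process::Command;

/// Create a tap device for a VM and attach it to the nexbr0 bridge.
fn create_tap(tap: &str, bridge: &str, owner: &str) -> std::io::Result<()> {
    for args in [
        vec!["tuntap", "add", "dev", tap, "mode", "tap", "user", owner],
        vec!["link", "set", tap, "master", bridge],
        vec!["link", "set", tap, "up"],
    ] {
        let status = Command::new("ip").args(&args).status()?;
        if !status.success() {
            return Err(std::io::Error::other(format!("ip {args:?} failed")));
        }
    }
    Ok(())
}
```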
Nexus manages
- Bridge creation/teardown
- Tap device lifecycle (one per VM)
- IP assignment from the configured CIDR (stored in SQLite)
- NAT masquerade for outbound internet access
- nftables rules per VM for isolation
Configuration
network:
bridge: nexbr0
cidr: 172.16.0.0/24 # default from 172.16.0.0/12, user-configurable
gateway: 172.16.0.1 # derived from cidr
Nexus validates that the chosen block falls within RFC 1918 space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
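A minimal validation sketch using only std::net; CIDR parsing is simplified and the prefix-length check (that the configured prefix stays inside the enclosing block) is left out.

```rust
use std::net::Ipv4Addr;

/// True if the address sits inside one of the RFC 1918 blocks.
fn is_rfc1918(addr: Ipv4Addr) -> bool {
    let o = addr.octets();
    o[0] == 10                                        // 10.0.0.0/8
        || (o[0] == 172 && (16..=31).contains(&o[1])) // 172.16.0.0/12
        || (o[0] == 192 && o[1] == 168)               // 192.168.0.0/16
}

fn validate_cidr(cidr: &str) -> Result<(), String> {
    let (net, _prefix) = cidr.split_once('/').ok_or("expected a.b.c.d/len")?;
    let addr: Ipv4Addr = net.parse().map_err(|e| format!("{e}"))?;
    if is_rfc1918(addr) {
        Ok(())
    } else {
        Err(format!("{cidr} is not within RFC 1918 space"))
    }
}
```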
Service Discovery
VMs discover services by asking Nexus over the vsock control channel. Data flows over the bridge network.
// request
{"method": "service_lookup", "params": {"name": "git-server", "port": "http"}}
// response
{"result": {"address": "172.16.0.12", "port": 3000}}
Storage
btrfs subvolumes as workspaces with CoW snapshots.
Layout
/var/lib/nexus/
├── workspaces/
│ ├── @base-agent/ ← read-only base subvolume
│ ├── @work-code-1/ ← CoW snapshot of @base-agent
│ └── @portal-openclaw/ ← CoW snapshot of portal base
├── images/
│ └── vmlinux ← kernel
└── state/
└── nexus.db ← SQLite
How It Works
- Base images are btrfs subvolumes marked read-only
- New workspace = btrfs subvolume snapshot (instant, zero disk cost)
- Exposed to Firecracker as a block device via dm/nbd
- Workspace grows only as the agent writes — CoW keeps shared blocks shared
- Destroy = btrfs subvolume delete (see the sketch after this list)
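A sketch of create/destroy that shells out to btrfs-progs. The tech stack targets libbtrfsutil bindings, so treat this as a stand-in with the same semantics.

```rust
use std::path::Path;
use std::process::Command;

/// Run one btrfs-progs subcommand, failing loudly on a non-zero exit.
fn btrfs(args: &[&str]) -> std::io::Result<()> {
    let status = Command::new("btrfs").args(args).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::other(format!("btrfs {args:?} failed")))
    }
}

/// CoW snapshot of the read-only base subvolume: instant, zero extra disk.
fn workspace_create(base: &Path, workspace: &Path) -> std::io::Result<()> {
    btrfs(&["subvolume", "snapshot", base.to_str().unwrap(), workspace.to_str().unwrap()])
}

fn workspace_destroy(workspace: &Path) -> std::io::Result<()> {
    btrfs(&["subvolume", "delete", workspace.to_str().unwrap()])
}
```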
Operations
| Operation | What it does |
|---|---|
| workspace create | Snapshot from a base image |
| workspace snapshot | Checkpoint a running workspace |
| workspace restore | Roll back to a previous snapshot |
| workspace destroy | Delete subvolume |
| workspace list | List all workspaces with disk usage |
API
HTTP REST on 127.0.0.1:9600. WebSocket upgrade for terminal sessions.
GET /v1/vms # list all VMs
POST /v1/vms # create VM
GET /v1/vms/:id # get VM status
DELETE /v1/vms/:id # destroy VM
POST /v1/vms/:id/start # start VM
POST /v1/vms/:id/stop # stop VM
GET /v1/workspaces # list workspaces
POST /v1/workspaces # create workspace
POST /v1/workspaces/:id/snapshot # checkpoint
POST /v1/workspaces/:id/restore # roll back
DELETE /v1/workspaces/:id # destroy
GET /v1/services # list registered services
GET /v1/routes # list vsock routes
GET /v1/vms/:id/terminal # WebSocket upgrade → ttyd session
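A sketch of how the VM routes above could be wired into axum, with handlers stubbed and the workspace/service routes omitted for brevity. It assumes axum 0.7-style `:id` path captures (matching the listing) and axum's ws feature for the terminal upgrade.

```rust
use axum::{
    extract::{ws::WebSocketUpgrade, Path},
    response::IntoResponse,
    routing::{get, post},
    Router,
};

/// The VM portion of the alpha API surface.
fn api_router() -> Router {
    Router::new()
        .route("/v1/vms", get(list_vms).post(create_vm))
        .route("/v1/vms/:id", get(get_vm).delete(destroy_vm))
        .route("/v1/vms/:id/start", post(start_vm))
        .route("/v1/vms/:id/stop", post(stop_vm))
        .route("/v1/vms/:id/terminal", get(terminal_ws))
}

/// WebSocket upgrade for terminal sessions (ttyd protocol over vsock PTY).
async fn terminal_ws(Path(id): Path<String>, ws: WebSocketUpgrade) -> impl IntoResponse {
    ws.on_upgrade(move |_socket| async move {
        // Bridge `_socket` to the guest-agent PTY port (300+) for VM `id` here.
        let _ = id;
    })
}

// Handler stubs; real versions read/write the SQLite state and drive Firecracker.
async fn list_vms() {}
async fn create_vm() {}
async fn get_vm(Path(_id): Path<String>) {}
async fn destroy_vm(Path(_id): Path<String>) {}
async fn start_vm(Path(_id): Path<String>) {}
async fn stop_vm(Path(_id): Path<String>) {}
```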
CLI
nexusctl is a thin HTTP client:
nexusctl vm list
nexusctl vm create --image workfort/code-agent --name agent-1
nexusctl vm start agent-1
nexusctl attach agent-1
nexusctl workspace snapshot agent-1 --name "before-refactor"
Technology Stack
| Component | Crate / Tool | Version |
|---|---|---|
| Async runtime | tokio | 1.x |
| vsock | tokio-vsock | 0.7.x |
| cgroups | cgroups-rs | 0.5.x |
| Networking rules | nftables (JSON API) | 0.6.x |
| btrfs | libbtrfsutil | latest |
| PTY | nix::pty + AsyncFd | via nix |
| Terminal WS | ttyd protocol (DIY) | — |
| State store | rusqlite | latest |
| Serialization | serde + serde_json | 1.x |
| HTTP API | axum | latest |
| PGP signing | rpgp (pgp crate) | 0.19.x |
Repo Structure
WorkFort/
├── codex/ ← mdBook, documentation and plans
├── cracker-barrel/ ← Go, kernel build tool
└── nexus/ ← Rust workspace (to be created)
├── Cargo.toml ← workspace root
├── nexusd/ ← daemon binary
├── guest-agent/ ← MCP server binary for work VMs
└── nexus-lib/ ← shared types, vsock protocol, storage
Package Repository
A signed pacman mirror at packages.workfort.dev:
# /etc/pacman.conf
[workfort]
Server = https://packages.workfort.dev/$arch
SigLevel = Required DatabaseOptional
Installable on the host via pacman. Packages include nexus, guest-agent, and kernel images built by cracker-barrel.