WorkFort Codex
WorkFort is an Arch Linux distribution purpose-built as an office for AI agents. Each agent gets its own Firecracker microVM — a private workspace with full system access — managed by the Nexus daemon.
This codex contains design documents, specifications, and plans for the WorkFort project.
Repositories
| Repo | Language | Purpose |
|---|---|---|
| codex | mdBook | Documentation and design plans |
| cracker-barrel | Go | Firecracker kernel build tool |
| nexus | Rust | VM management daemon + guest agent |
Architecture Overview
graph TB
subgraph Host["Host (Arch Linux, btrfs)"]
Nexus["nexusd"]
SQLite["SQLite"]
Bridge["nexbr0 (172.16.0.0/24)"]
subgraph Portal["Portal VM"]
Agent["Agent Runtime"]
end
subgraph Work["Work VM"]
GA["guest-agent (MCP server)"]
Tools["file R/W/D, run command"]
end
subgraph Services["Service VMs (later)"]
Git["Git Server"]
Tracker["Project Tracker"]
end
end
Nexus -->|vsock| Portal
Nexus -->|vsock| Work
Nexus -->|vsock| Services
Agent -->|MCP via Nexus| GA
GA --> Tools
Portal --- Bridge
Work --- Bridge
Services --- Bridge
Nexus --- SQLite
Core Principles
Scientific Method
This project is driven by the scientific method. Every action originates from a plan or hypothesis.
All work follows this process:
- Hypothesis/Plan: Define what you intend to build and why. State assumptions explicitly.
- Design: Specify the approach and expected outcomes
- Experimentation: Execute through deliberate experiments designed to test assumptions
- Observation: Measure and record results objectively
- Analysis: Evaluate whether results prove or disprove assumptions
- Iteration: Refine hypothesis based on experimental findings
Experimentation
Experimentation is the primary methodology for proving or disproving assumptions.
- Every assumption must be tested through experimentation
- Design experiments that can clearly validate or invalidate hypotheses
- Document expected outcomes before running experiments
- Record actual results, even if they contradict expectations
- Failed experiments are valuable — they disprove assumptions and guide better solutions
- Successful experiments validate assumptions and provide confidence to proceed
Before implementing:
- What assumptions are you making?
- How will you test these assumptions?
- What experiments can prove or disprove them?
After experimenting:
- What did the experiment reveal?
- Which assumptions were validated?
- Which assumptions were invalidated?
- What new questions emerged?
Non-Negotiables
- No code without a clear plan
- No changes without understanding purpose and expected impact
- No assumptions without testing
Security Through Isolation
WorkFort’s security model prioritizes hardware-enforced isolation over software-level controls:
- Work VMs execute agent tool calls in sandboxed environments
- Credentials never enter VMs — all credential operations happen in the host (Nexus)
- Communication via vsock eliminates network attack surface between Nexus and VMs
- Firecracker provides hypervisor-level isolation stronger than containers
- Portal/Work VM separation ensures the agent runtime cannot interfere with its own execution environment
The security boundary is the hypervisor, not kernel namespaces or cgroups.
Provider Independence
WorkFort’s architecture maintains independence from specific AI providers:
- MCP (JSON-RPC 2.0) is the tool-calling interface — not provider-specific formats
- Agent runtimes are pluggable — any runtime that speaks MCP works in a portal VM
- Same guest-agent binary works with any AI provider
- No provider SDKs in Work VMs — keeps VMs simple and focused
This separation ensures Work VMs remain generic execution environments while portal VMs and agent runtimes handle provider-specific concerns.
Centralized Control
All operations flow through Nexus for observability and policy enforcement:
- Token usage tracking across all AI conversations
- Cost attribution per user, project, or VM
- Rate limiting and quota management
- Security policies enforced at a single point
- Audit trail for all VM and credential operations
Nexus is the vsock router — all inter-VM communication flows through it. Distributed access to AI APIs would require instrumenting every VM and would prevent these centralized control features.
Clear Boundaries
The system is organized into distinct components with well-defined responsibilities:
- nexusctl: Thin CLI client, uses HTTP to communicate with Nexus
- nexusd: Control plane, orchestration, vsock routing, state persistence
- guest-agent: Execution interface within Work VMs (MCP server)
- nexus-lib: Shared types, vsock protocol, storage abstractions
This separation:
- Reduces cognitive load during development
- Minimizes AI assistant context (work on CLI without loading daemon code)
- Enables focused testing and debugging
- Allows independent evolution of components
Drives
Overview
Drives are persistent block devices that provide storage for VMs. They serve as both the bootable root filesystem and data storage mechanism within WorkFort’s architecture.
On the host, drives are backed by btrfs subvolumes — enabling instant CoW snapshots, zero-cost cloning, and efficient storage sharing. Firecracker VMs see them as standard block devices, exposed via dm/nbd.
Responsibilities
Bootable Root Filesystem
Drives created from base subvolumes provide the root filesystem for VMs:
- Work VMs boot from drives containing the execution environment and guest-agent
- Portal VMs boot from drives containing the agent runtime
- Service VMs boot from drives containing their application stack
Data Storage
Drives provide persistent storage independent of VM lifecycle:
- Data persists after VM shutdown
- Can be reused across multiple VM sessions
- Support both read-write and read-only modes
Data Movement Between VMs
Drives enable sequential data transfer between VMs:
- VM completes work and shuts down
- Drive detaches from terminated VM
- Drive attaches to new VM at boot
- New VM accesses data written by previous VM
This sequential access pattern is imposed by Firecracker’s security model — concurrent host/guest access is not supported.
Design
btrfs-Backed Storage
Unlike traditional ext4 image files, WorkFort’s drives are backed by btrfs subvolumes:
| Operation | Traditional | WorkFort (btrfs) |
|---|---|---|
| Create workspace | Copy full image (slow, full disk cost) | btrfs subvolume snapshot (instant, zero disk cost) |
| Storage sharing | OverlayFS layers | CoW — shared blocks stay shared |
| Cleanup | Delete image file | btrfs subvolume delete |
| Checkpoint | Copy image or OverlayFS snapshot | btrfs subvolume snapshot (instant) |
| Rollback | Restore from backup | Switch to previous snapshot |
Drive Types
Drives are distinguished by purpose:
- Boot drives: Created from read-only master image snapshots, contain bootable root filesystem with init system, tools, and (for work VMs) guest-agent
- Data drives: Created empty or populated with project data, used for workspace storage and transfer between VMs
Both are btrfs subvolumes exposed to Firecracker as block devices via dm/nbd.
Access Patterns
Drives follow a sequential access model:
1. Host prepares drive → Snapshot base subvolume or create empty
2. VM boots with drive → Drive attached before VM starts
3. VM operates on drive → Read/write within VM
4. VM shuts down → Drive detaches
5. Host or next VM uses → Snapshot, inspect, or attach to new VM
Constraint: No concurrent access. Host and guest cannot access the same drive simultaneously. This is a Firecracker security design decision, not a limitation being addressed.
Multiple Drives Per VM
VMs support multiple drive attachments:
- One bootable drive (required for VM boot)
- Additional data drives (workspace, shared datasets, outputs)
Example: Work VM with boot drive + workspace drive containing project source code.
Persistence Model
Drives are persistent resources:
- Survive VM termination
- Reusable across multiple VM sessions
- Managed independently of VM lifecycle
- Can accumulate data across multiple VM executions
Host Layout
/var/lib/nexus/
├── workspaces/
│ ├── @base-agent/ ← read-only master image
│ ├── @work-code-1/ ← CoW snapshot of @base-agent
│ └── @portal-openclaw/ ← CoW snapshot of portal master
├── images/
│ └── vmlinux ← kernel
└── state/
└── nexus.db ← SQLite
Relationship to Other Components
Drives connect multiple architecture components:
- Master images → Snapshot into Drives
- Drives → Exposed via dm/nbd → Attached to VMs at boot
- VMs → Managed by Nexus
- guest-agent (in Work VMs) → Operates on files within mounted Drives
Data Model
Overview
The data model defines how Nexus persists and manages state using SQLite. This includes VM configurations, workspace metadata, networking, and operational state.
An abstraction layer over SQLite allows swapping to Postgres or etcd for clustering.
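To make the backend swap concrete, here is a minimal sketch of what such a storage abstraction could look like. The trait, struct, and method names are illustrative, not the actual nexus-lib API.

```rust
// Hypothetical storage-abstraction sketch; names are illustrative only.
use std::error::Error;

/// Minimal view of a VM record as persisted by the state backend.
pub struct VmRecord {
    pub id: String,
    pub name: Option<String>,
    pub role: String,  // "portal" | "work" | "service"
    pub state: String, // "created" | "running" | ...
    pub cid: u32,      // vsock context ID
}

/// Backend-agnostic persistence interface. A SQLite implementation backs the
/// alpha; a Postgres or etcd implementation could satisfy the same trait
/// later for clustering without touching the callers.
pub trait StateStore: Send + Sync {
    fn insert_vm(&self, vm: &VmRecord) -> Result<(), Box<dyn Error>>;
    fn get_vm(&self, id: &str) -> Result<Option<VmRecord>, Box<dyn Error>>;
    fn list_vms(&self) -> Result<Vec<VmRecord>, Box<dyn Error>>;
    fn update_vm_state(&self, id: &str, state: &str) -> Result<(), Box<dyn Error>>;
    fn delete_vm(&self, id: &str) -> Result<(), Box<dyn Error>>;
}
```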
Schema
-- Nexus Database Schema (Pre-Alpha)
--
-- During pre-alpha, schema changes are applied by:
-- 1. Updating this file
-- 2. Deleting the database file
-- 3. Restarting the daemon (schema recreates automatically)
-- Application settings (key-value store)
CREATE TABLE settings (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
type TEXT NOT NULL CHECK(type IN ('string', 'int', 'bool', 'json'))
);
-- Tags for organizational categorization
CREATE TABLE tags (
name TEXT PRIMARY KEY,
description TEXT,
color TEXT, -- hex color for UI display (e.g., "#FF5733")
text_color TEXT -- hex color for contrast (e.g., "#FFFFFF")
);
-- VMs: Firecracker microVM instances
CREATE TABLE vms (
id TEXT PRIMARY KEY,
name TEXT UNIQUE,
role TEXT NOT NULL CHECK(role IN ('portal', 'work', 'service')),
state TEXT NOT NULL CHECK(state IN ('created', 'running', 'stopped', 'crashed', 'failed')),
cid INTEGER NOT NULL UNIQUE, -- vsock context ID
vcpu_count INTEGER NOT NULL DEFAULT 1,
mem_size_mib INTEGER NOT NULL DEFAULT 128,
config_json TEXT, -- full Firecracker config snapshot
pid INTEGER, -- Firecracker process ID (NULL when not running)
socket_path TEXT, -- Firecracker API socket (NULL when not running)
uds_path TEXT, -- vsock UDS base path
console_log_path TEXT,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
started_at INTEGER,
stopped_at INTEGER
);
-- Master images: read-only btrfs subvolumes
CREATE TABLE master_images (
id TEXT PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
subvolume_path TEXT NOT NULL UNIQUE,
size_bytes INTEGER,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);
-- Workspaces: btrfs subvolume snapshots assigned to VMs
CREATE TABLE workspaces (
id TEXT PRIMARY KEY,
name TEXT UNIQUE,
vm_id TEXT, -- NULL if unattached
subvolume_path TEXT NOT NULL UNIQUE,
master_image_id TEXT, -- master image this was snapshotted from
parent_workspace_id TEXT, -- NULL if snapshotted from base
size_bytes INTEGER,
is_root_device INTEGER NOT NULL DEFAULT 0 CHECK(is_root_device IN (0, 1)),
is_read_only INTEGER NOT NULL DEFAULT 0 CHECK(is_read_only IN (0, 1)),
attached_at INTEGER,
detached_at INTEGER,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE SET NULL,
FOREIGN KEY (master_image_id) REFERENCES master_images(id) ON DELETE RESTRICT,
FOREIGN KEY (parent_workspace_id) REFERENCES workspaces(id) ON DELETE SET NULL
);
-- VM boot history: tracks each boot/shutdown cycle
CREATE TABLE vm_boot_history (
id TEXT PRIMARY KEY,
vm_id TEXT NOT NULL,
boot_started_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
boot_stopped_at INTEGER,
exit_code INTEGER,
error_message TEXT,
console_log_path TEXT,
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE
);
-- vsock routes: inter-VM communication mediated by Nexus
CREATE TABLE routes (
id TEXT PRIMARY KEY,
source_vm_id TEXT NOT NULL,
target_vm_id TEXT NOT NULL,
source_port INTEGER NOT NULL,
target_port INTEGER NOT NULL,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (source_vm_id) REFERENCES vms(id) ON DELETE CASCADE,
FOREIGN KEY (target_vm_id) REFERENCES vms(id) ON DELETE CASCADE,
UNIQUE (source_vm_id, source_port)
);
-- vsock services registered by guest agents
CREATE TABLE vsock_services (
id TEXT PRIMARY KEY,
vm_id TEXT NOT NULL,
port INTEGER NOT NULL,
service_name TEXT NOT NULL,
state TEXT NOT NULL DEFAULT 'stopped' CHECK(state IN ('listening', 'stopped')),
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
UNIQUE (vm_id, port)
);
-- Network bridges
CREATE TABLE bridges (
name TEXT PRIMARY KEY,
subnet TEXT NOT NULL, -- CIDR notation (e.g., "172.16.0.0/24")
gateway TEXT NOT NULL, -- gateway IP (e.g., "172.16.0.1")
interface TEXT NOT NULL, -- host interface name
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);
-- VM network configuration
CREATE TABLE vm_network (
vm_id TEXT PRIMARY KEY,
ip_address TEXT NOT NULL,
bridge_name TEXT NOT NULL,
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
FOREIGN KEY (bridge_name) REFERENCES bridges(name) ON DELETE RESTRICT
);
-- Firewall rules for VMs (nftables-based)
CREATE TABLE firewall_rules (
id TEXT PRIMARY KEY,
vm_id TEXT NOT NULL,
rule_order INTEGER NOT NULL,
action TEXT NOT NULL CHECK(action IN ('accept', 'drop', 'reject')),
protocol TEXT CHECK(protocol IN ('tcp', 'udp', 'icmp', 'all')),
source_ip TEXT,
source_port TEXT,
dest_ip TEXT,
dest_port TEXT,
description TEXT,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
UNIQUE (vm_id, rule_order)
);
-- Tags (organizational)
CREATE TABLE vm_tags (
vm_id TEXT NOT NULL,
tag_name TEXT NOT NULL,
PRIMARY KEY (vm_id, tag_name),
FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
FOREIGN KEY (tag_name) REFERENCES tags(name) ON DELETE CASCADE
);
CREATE TABLE workspace_tags (
workspace_id TEXT NOT NULL,
tag_name TEXT NOT NULL,
PRIMARY KEY (workspace_id, tag_name),
FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE,
FOREIGN KEY (tag_name) REFERENCES tags(name) ON DELETE CASCADE
);
-- Indexes
CREATE INDEX idx_vms_role ON vms(role);
CREATE INDEX idx_vms_state ON vms(state);
CREATE INDEX idx_workspaces_vm_id ON workspaces(vm_id);
CREATE INDEX idx_workspaces_base ON workspaces(master_image_id);
CREATE INDEX idx_vm_boot_history_vm_id ON vm_boot_history(vm_id);
CREATE INDEX idx_vsock_services_vm_id ON vsock_services(vm_id);
CREATE INDEX idx_routes_source ON routes(source_vm_id);
CREATE INDEX idx_routes_target ON routes(target_vm_id);
CREATE INDEX idx_firewall_rules_vm_id ON firewall_rules(vm_id);
CREATE INDEX idx_vm_tags_tag ON vm_tags(tag_name);
CREATE INDEX idx_workspace_tags_tag ON workspace_tags(tag_name);
-- Partial index: workspace can only be attached to one VM at a time
CREATE UNIQUE INDEX idx_workspace_current_attachment
ON workspaces(vm_id) WHERE vm_id IS NOT NULL AND detached_at IS NULL;
-- Partial index: each VM has only one root device
CREATE UNIQUE INDEX idx_vm_root_device
ON workspaces(vm_id) WHERE vm_id IS NOT NULL AND detached_at IS NULL AND is_root_device = 1;
Diagrams
Core Entity Relationships
erDiagram
vms ||--o{ workspaces : "has attached"
vms ||--o{ vm_boot_history : "boot history"
vms ||--o{ vsock_services : "runs services"
vms ||--o| vm_network : "has network"
master_images ||--o{ workspaces : "snapshot of"
workspaces ||--o{ workspaces : "derived from"
bridges ||--o{ vm_network : "provides connectivity"
vms ||--o{ firewall_rules : "has rules"
vms ||--o{ routes : "source"
vms ||--o{ routes : "target"
vms {
text id PK
text name
text role
text state
int cid
int vcpu_count
int mem_size_mib
int pid
}
master_images {
text id PK
text name
text subvolume_path
int size_bytes
}
workspaces {
text id PK
text name
text vm_id FK
text subvolume_path
text master_image_id FK
text parent_workspace_id FK
int is_root_device
int is_read_only
}
routes {
text id PK
text source_vm_id FK
text target_vm_id FK
int source_port
int target_port
}
vsock_services {
text id PK
text vm_id FK
int port
text service_name
text state
}
bridges {
text name PK
text subnet
text gateway
}
vm_network {
text vm_id PK
text ip_address
text bridge_name FK
}
firewall_rules {
text id PK
text vm_id FK
int rule_order
text action
text protocol
}
VM State Machine
stateDiagram-v2
[*] --> created: POST /v1/vms
created --> running: POST /v1/vms/:id/start
created --> failed: Boot failure (automatic)
created --> [*]: DELETE /v1/vms/:id
running --> stopped: POST /v1/vms/:id/stop
running --> crashed: (automatic)
stopped --> running: POST /v1/vms/:id/start
stopped --> [*]: DELETE /v1/vms/:id
crashed --> running: POST /v1/vms/:id/start
crashed --> [*]: DELETE /v1/vms/:id
failed --> [*]: DELETE /v1/vms/:id
States
| State | Description |
|---|---|
| created | VM record exists, Firecracker process not started |
| running | Firecracker process active, VM booted |
| stopped | VM gracefully stopped via API, can be restarted |
| crashed | VM terminated unexpectedly, can be restarted |
| failed | VM failed to boot (e.g., bad workspace image) |
Valid Transitions
| From | To | Trigger |
|---|---|---|
| created | running | POST /v1/vms/:id/start |
| created | failed | Automatic (boot failure) |
| created | (deleted) | DELETE /v1/vms/:id |
| running | stopped | POST /v1/vms/:id/stop |
| running | crashed | Automatic (unexpected termination) |
| stopped | running | POST /v1/vms/:id/start |
| stopped | (deleted) | DELETE /v1/vms/:id |
| crashed | running | POST /v1/vms/:id/start |
| crashed | (deleted) | DELETE /v1/vms/:id |
| failed | (deleted) | DELETE /v1/vms/:id |
Constraints
- Cannot delete running VM: Must stop first (returns 409 Conflict)
- Cannot start running VM: Already running (returns 409 Conflict)
- Cannot manually transition to crashed or failed: Set automatically by Nexus
- Failed VMs can only be deleted: Boot failure requires recreating with a working workspace
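A minimal sketch of how these states and guards could be encoded in Rust. The types and method names are illustrative, not the actual nexusd implementation.

```rust
// Illustrative state-machine sketch mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    Created,
    Running,
    Stopped,
    Crashed,
    Failed,
}

impl VmState {
    /// True if an API-driven start is allowed from this state.
    pub fn can_start(self) -> bool {
        matches!(self, VmState::Created | VmState::Stopped | VmState::Crashed)
    }

    /// True if DELETE is allowed from this state (running VMs must stop first).
    pub fn can_delete(self) -> bool {
        !matches!(self, VmState::Running)
    }

    /// Failed VMs can only be deleted, never restarted.
    pub fn is_terminal_failure(self) -> bool {
        matches!(self, VmState::Failed)
    }
}

fn main() {
    assert!(VmState::Crashed.can_start());
    assert!(!VmState::Running.can_delete()); // surfaces as 409 Conflict at the API layer
    assert!(VmState::Failed.is_terminal_failure());
}
```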
CLI Design
Overview
nexusctl is the command-line interface for managing Firecracker microVMs through the Nexus daemon. It is a thin HTTP client – all state lives in nexusd, and nexusctl is stateless aside from configuration.
The recommended alias is nxc:
nexusctl vm list
nxc vm list # identical
Command Grammar
Every command follows noun-verb ordering:
nexusctl <resource> <action> [name] [flags]
Resources map directly to Nexus API entities. Actions are consistent across all resources.
Resources
| Resource | Shortname | Description |
|---|---|---|
| vm | vm | Firecracker microVM instances |
| workspace | ws | btrfs subvolume snapshots attached to VMs |
| agent | a | Agent runtime sessions (portal + work VM pairs) |
| image | img | Master images (read-only base subvolumes) |
| network | net | Bridge and tap device configuration |
| route | rt | vsock routes between VMs |
| service | svc | vsock services registered by guest agents |
Shortnames work anywhere the full resource name works:
nexusctl ws list
nexusctl workspace list # identical
Standard Actions
Every resource supports a consistent set of verbs. Not every verb applies to every resource – attempting an unsupported action returns a clear error.
| Action | Description | Applies to |
|---|---|---|
| list | List resources in a table | all |
| create | Create a new resource | vm, workspace, agent, route |
| inspect | Show detailed resource state | all |
| delete | Remove a resource | all |
| start | Start a stopped resource | vm, agent |
| stop | Stop a running resource | vm, agent |
| logs | Stream or tail logs | vm, agent |
Special Commands
Some commands live outside the resource-action pattern:
| Command | Description |
|---|---|
| nexusctl attach <vm> | Open a terminal session to a VM (WebSocket/ttyd) |
| nexusctl init | Interactive project setup wizard |
| nexusctl apply | Apply declarative configuration from nexus.yaml |
| nexusctl config <subcommand> | Manage CLI and daemon configuration |
| nexusctl completion <shell> | Generate shell completions |
| nexusctl version | Print version, daemon version, and API version |
| nexusctl status | Quick system health check (daemon, VMs, network) |
Command Map
graph TD
nexusctl["nexusctl"]
nexusctl --> vm["vm"]
nexusctl --> ws["workspace (ws)"]
nexusctl --> agent["agent (a)"]
nexusctl --> image["image (img)"]
nexusctl --> network["network (net)"]
nexusctl --> route["route (rt)"]
nexusctl --> service["service (svc)"]
nexusctl --> config["config"]
nexusctl --> special["attach / init / apply / status"]
vm --> vm_actions["list | create | inspect | delete | start | stop | logs"]
ws --> ws_actions["list | create | inspect | delete | snapshot | restore"]
agent --> agent_actions["list | create | inspect | delete | start | stop | logs"]
image --> image_actions["list | inspect | delete | import"]
network --> network_actions["list | inspect"]
route --> route_actions["list | create | inspect | delete"]
service --> service_actions["list | inspect"]
config --> config_actions["list | edit | set | get"]
Output Formatting
Default: Columnar Tables
All list commands produce clean, aligned tables with no decoration:
$ nexusctl vm list
NAME ROLE STATE VCPU MEM IP AGE
agent-code-1 work running 2 512M 172.16.0.11 3h
portal-oc-1 portal running 1 256M 172.16.0.10 3h
git-server service stopped 1 128M 172.16.0.12 2d
The --output / -o flag controls format:
| Format | Flag | Description |
|---|---|---|
| Table | -o table | Default. Human-readable columns. |
| Wide | -o wide | Table with additional columns (IDs, paths, timestamps). |
| JSON | -o json | Full JSON array. Machine-readable. |
| YAML | -o yaml | Full YAML. Matches config file format. |
| Name | -o name | One resource name per line. For piping. |
$ nexusctl vm list -o name
agent-code-1
portal-oc-1
git-server
$ nexusctl vm list -o json
[
{
"name": "agent-code-1",
"role": "work",
"state": "running",
"vcpu_count": 2,
"mem_size_mib": 512,
"ip_address": "172.16.0.11",
"created_at": "2026-02-18T10:30:00Z"
}
]
JSON Field Selection
The --json flag selects specific fields and implies JSON output. Combine with --jq for inline filtering.
$ nexusctl vm list --json name,state,ip_address
[
{"name": "agent-code-1", "state": "running", "ip_address": "172.16.0.11"},
{"name": "portal-oc-1", "state": "running", "ip_address": "172.16.0.10"}
]
$ nexusctl vm list --json name,state --jq '.[] | select(.state == "running") | .name'
"agent-code-1"
"portal-oc-1"
Layered Detail
Information density increases through progressive commands:
graph LR
A["list"] -->|more columns| B["-o wide"]
B -->|single resource| C["inspect"]
C -->|machine parse| D["inspect -o json"]
$ nexusctl vm list # summary table
$ nexusctl vm list -o wide # adds ID, CID, PID, socket path
$ nexusctl vm inspect agent-code-1 # full detail, formatted
$ nexusctl vm inspect agent-code-1 -o json # full detail, structured
Color
Semantic colors convey state at a glance:
| Color | Meaning |
|---|---|
| Green | Running, success, healthy |
| Yellow | Pending, warning, created |
| Red | Error, stopped, crashed, failed |
| Dim | Metadata, secondary info, timestamps |
Color behavior:
- TTY detected: Colors enabled by default.
- Pipe / redirect: Colors disabled automatically.
- NO_COLOR env set: Colors disabled (per no-color.org).
- FORCE_COLOR env set: Colors forced on regardless of TTY.
Progress Indicators
- Discrete steps: Creating workspace... (2 of 4) with step descriptions.
- Indeterminate waits: Spinner with elapsed time: Waiting for VM to boot... (3.2s).
- Non-TTY: Progress messages printed as plain lines, no ANSI escape sequences.
Interactive Behavior
TTY Gating
Every interactive prompt has a non-interactive equivalent. If stdin is not a TTY and a required value is missing, the command fails with an explicit error:
$ echo | nexusctl vm delete agent-code-1
Error: refusing to delete VM without confirmation
Run with --yes to skip confirmation: nexusctl vm delete agent-code-1 --yes
Override Flags
| Flag | Effect |
|---|---|
| --yes / -y | Skip all confirmation prompts |
| --interactive | Force interactive mode even without TTY |
| --no-interactive | Force non-interactive mode even with TTY |
Destructive Operations
Commands that destroy data or stop running processes require confirmation:
$ nexusctl vm delete agent-code-1
VM "agent-code-1" is currently running with 1 attached workspace.
Delete this VM? This will stop it and detach all workspaces. [y/N] y
Deleted VM "agent-code-1"
Bypass with -y:
$ nexusctl vm delete agent-code-1 -y
Deleted VM "agent-code-1"
Error Messages
Every error has three parts: what failed, why it failed, and how to fix it.
$ nexusctl vm start agent-code-1
Error: cannot start VM "agent-code-1"
VM is already running (state: running, PID: 4821)
To restart, stop it first: nexusctl vm stop agent-code-1
$ nexusctl vm create --image nonexistent/image
Error: cannot create VM
Image "nonexistent/image" not found
Available images: nexusctl image list
$ nexusctl vm lis
Error: unknown action "lis" for resource "vm"
Did you mean: list
Available actions: list, create, inspect, delete, start, stop, logs
Error Output
- All errors go to stderr.
- All normal output goes to stdout.
- Non-zero exit codes on failure.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Daemon unreachable |
| 4 | Resource not found |
| 5 | Conflict (e.g., VM already running) |
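A sketch of how these codes might map onto an error type inside nexusctl. The error variants are assumptions for illustration, not the real nexusctl error handling.

```rust
// Hypothetical exit-code mapping; mirrors the table above.
#[derive(Debug)]
pub enum CliError {
    General(String),
    Usage(String),
    DaemonUnreachable(String),
    NotFound(String),
    Conflict(String),
}

impl CliError {
    pub fn exit_code(&self) -> i32 {
        match self {
            CliError::General(_) => 1,
            CliError::Usage(_) => 2,
            CliError::DaemonUnreachable(_) => 3,
            CliError::NotFound(_) => 4,
            CliError::Conflict(_) => 5,
        }
    }
}

fn main() {
    let err = CliError::Conflict("VM is already running".into());
    eprintln!("Error: {:?}", err); // all errors go to stderr
    std::process::exit(err.exit_code());
}
```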
Onboarding
First Run Detection
When nexusctl cannot reach nexusd, it detects whether the daemon is installed and guides the user:
$ nexusctl vm list
Error: cannot connect to Nexus daemon at 127.0.0.1:9600
The daemon does not appear to be running.
Start it: systemctl --user start nexus.service
Enable it: systemctl --user enable nexus.service
Check status: systemctl --user status nexus.service
If the daemon is not installed at all:
$ nexusctl vm list
Error: cannot connect to Nexus daemon at 127.0.0.1:9600
The nexus package does not appear to be installed.
Install it: sudo pacman -S nexus
Init Wizard
nexusctl init creates a project-level nexus.yaml through guided prompts:
$ nexusctl init
Nexus project setup
? Base image for work VMs: workfort/code-agent (default)
? vCPUs per work VM: 2
? Memory per work VM (MiB): 512
? Agent runtime: openclaw
Created nexus.yaml
Next steps:
nexusctl apply Apply this configuration
nexusctl agent create dev Create an agent from this config
Next-Command Hints
After successful operations, suggest the logical next step:
$ nexusctl vm create --image workfort/code-agent --name agent-code-1
Created VM "agent-code-1" (state: created)
Start it: nexusctl vm start agent-code-1
$ nexusctl vm start agent-code-1
Started VM "agent-code-1" (state: running, IP: 172.16.0.11)
Attach terminal: nexusctl attach agent-code-1
View logs: nexusctl vm logs agent-code-1
Hints are suppressed in non-TTY environments and when using -o json or -o name.
Configuration
Format
All configuration is YAML. No exceptions.
Precedence
Configuration resolves in this order (highest wins):
graph TD
A["CLI flags"] --> B["Environment variables"]
B --> C["Project config ./nexus.yaml"]
C --> D["User config ~/.config/nexusctl/config.yaml"]
D --> E["System config /etc/nexus/nexus.yaml"]
E --> F["Built-in defaults"]
style A fill:#2d6,stroke:#333,color:#000
style F fill:#999,stroke:#333,color:#000
1. CLI flags – --api-url, --output, etc.
2. Environment variables – NEXUS_API_URL, NEXUS_OUTPUT, etc.
3. Project config – ./nexus.yaml in the current directory (or parent search).
4. User config – ~/.config/nexusctl/config.yaml (respects XDG_CONFIG_HOME).
5. System config – /etc/nexus/nexus.yaml.
6. Built-in defaults.
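A minimal sketch of the layered merge, assuming one partial-config struct per layer where higher-priority layers only fill fields the layers above them left unset. Field and type names are illustrative.

```rust
// Illustrative precedence merge: iterate layers from highest priority to
// lowest and keep the first value seen for each field.
#[derive(Debug, Default, Clone)]
struct PartialConfig {
    api_url: Option<String>,
    output: Option<String>,
}

fn merge(layers: &[PartialConfig]) -> PartialConfig {
    // Layers ordered highest priority first: flags, env, project, user, system.
    let mut resolved = PartialConfig::default();
    for layer in layers {
        if resolved.api_url.is_none() {
            resolved.api_url = layer.api_url.clone();
        }
        if resolved.output.is_none() {
            resolved.output = layer.output.clone();
        }
    }
    resolved
}

fn main() {
    let flags = PartialConfig { output: Some("json".into()), ..Default::default() };
    let user = PartialConfig {
        api_url: Some("http://127.0.0.1:9600".into()),
        output: Some("table".into()),
    };
    let effective = merge(&[flags, user]);
    assert_eq!(effective.output.as_deref(), Some("json")); // CLI flag wins
    assert_eq!(effective.api_url.as_deref(), Some("http://127.0.0.1:9600")); // filled from user config
}
```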
Environment Variables
All env vars use the NEXUS_ prefix. Flag names map to env vars by uppercasing and replacing hyphens with underscores:
| Flag | Env Var |
|---|---|
| --api-url | NEXUS_API_URL |
| --output | NEXUS_OUTPUT |
| --no-color | NEXUS_NO_COLOR |
User Config
# ~/.config/nexusctl/config.yaml
api_url: "http://127.0.0.1:9600"
output: table
defaults:
vm:
vcpu_count: 2
mem_size_mib: 512
workspace:
base_image: workfort/code-agent
Config Commands
$ nexusctl config list
KEY SOURCE VALUE
api_url /etc/nexus/nexus.yaml http://127.0.0.1:9600
output default table
defaults.vm.vcpu ~/.config/nexusctl/config.yaml 2
defaults.vm.mem ~/.config/nexusctl/config.yaml 512
$ nexusctl config get api_url
http://127.0.0.1:9600
$ nexusctl config set defaults.vm.vcpu_count 4
Set defaults.vm.vcpu_count = 4 in ~/.config/nexusctl/config.yaml
$ nexusctl config edit
# opens ~/.config/nexusctl/config.yaml in $EDITOR
config list shows every effective value and which source it came from. This eliminates the “where is this setting coming from?” problem.
Declarative Configuration
Project File
A nexus.yaml in the project root declares the desired infrastructure:
# nexus.yaml
version: 1
agents:
dev:
image: workfort/code-agent
vcpu: 2
mem: 512
runtime: openclaw
workspaces:
- name: code
base: workfort/code-agent
- name: data
base: empty
size: 2G
network:
bridge: nexbr0
cidr: 172.16.0.0/24
Apply with Diff and Dry Run
$ nexusctl apply --dry-run
Comparing nexus.yaml against current state...
+ create agent "dev"
+ create VM "dev-portal" (portal, 1 vCPU, 256M)
+ create VM "dev-work" (work, 2 vCPU, 512M)
+ create workspace "dev-code" from workfort/code-agent
+ create workspace "dev-data" (empty, 2G)
+ create route dev-portal:9000 → dev-work:9000
No changes applied (dry run)
$ nexusctl apply
Comparing nexus.yaml against current state...
+ create agent "dev"
+ create VM "dev-portal" (portal, 1 vCPU, 256M)
+ create VM "dev-work" (work, 2 vCPU, 512M)
+ create workspace "dev-code" from workfort/code-agent
+ create workspace "dev-data" (empty, 2G)
+ create route dev-portal:9000 → dev-work:9000
Apply these changes? [y/N] y
Creating agent "dev"... done (2.1s)
Agent "dev" is running.
Attach: nexusctl attach dev
Shell Completion
Generation
$ nexusctl completion bash >> ~/.bashrc
$ nexusctl completion zsh > ~/.zfunc/_nexusctl
$ nexusctl completion fish > ~/.config/fish/completions/nexusctl.fish
Dynamic Completions
Completions query the daemon for live state. Tab-completing a VM name fetches the current VM list:
$ nexusctl vm inspect <TAB>
agent-code-1 (work, running)
portal-oc-1 (portal, running)
git-server (service, stopped)
Zsh completions include descriptions. Resource shortnames complete identically to full names.
Help System
Progressive Disclosure
Default help shows only core commands. Advanced usage is available but not shown upfront.
$ nexusctl --help
nexusctl - manage Firecracker microVMs via Nexus
Usage: nexusctl <resource> <action> [name] [flags]
Core Commands:
vm Manage virtual machines
workspace Manage btrfs workspaces
agent Manage agent sessions
attach Open terminal to a VM
Getting Started:
init Set up a new project
apply Apply declarative configuration
status System health check
Configuration:
config View and edit settings
completion Generate shell completions
Run 'nexusctl <resource> --help' for resource-specific actions.
Run 'nexusctl help --all' for the full command list.
Full help includes every resource, action, and global flag:
$ nexusctl help --all
Resource Help
Each resource shows its actions, 2-3 usage examples, and flag groups:
$ nexusctl vm --help
Manage Firecracker virtual machines
Usage: nexusctl vm <action> [name] [flags]
Actions:
list List all VMs
create Create a new VM
inspect Show VM details
delete Remove a VM
start Start a stopped VM
stop Stop a running VM
logs View VM console logs
Examples:
nexusctl vm list
nexusctl vm create --image workfort/code-agent --name my-vm
nexusctl vm create my-vm # uses default image
nexusctl vm inspect my-vm -o json
Basic Flags:
-o, --output <format> Output format: table, wide, json, yaml, name
--json <fields> Select JSON fields (implies -o json)
--jq <expr> Filter JSON output with jq expression
Resource Flags:
--image <name> Base image for the VM
--role <role> VM role: work, portal, service (default: work)
--vcpu <n> vCPU count (default: 1)
--mem <mib> Memory in MiB (default: 128)
Advanced Flags:
--cid <n> Override vsock context ID (auto-assigned by default)
--no-start Create without starting
--dry-run Show what would happen without executing
Flag Grouping
Flags in --help are organized into groups: basic, resource-specific, and advanced. Basic flags appear on every command. Advanced flags are things most users never touch.
Smart Defaults
The CLI minimizes required input. Sensible defaults make the common case trivial:
$ nexusctl vm create my-vm
This single command:
- Uses the default image from config (or workfort/code-agent)
- Assigns role work
- Allocates 1 vCPU, 128 MiB memory (or user config defaults)
- Auto-assigns a vsock CID
- Auto-assigns an IP address
- Creates and starts the VM
Explicit flags override any default:
$ nexusctl vm create my-vm --vcpu 4 --mem 1024 --role portal --no-start
Dry Run
All commands that create, modify, or destroy state support --dry-run:
$ nexusctl vm delete agent-code-1 --dry-run
Would delete VM "agent-code-1" (state: running)
Would stop VM first
Would detach workspace "agent-code-1-ws" (data preserved)
No changes applied (dry run)
$ nexusctl ws create --base workfort/code-agent --name new-ws --dry-run
Would create workspace "new-ws"
Source: workfort/code-agent
Type: btrfs snapshot
No changes applied (dry run)
--dry-run returns exit code 0 on success (the operation would succeed) and non-zero if it would fail.
Composability
Piping
-o name produces one name per line for piping:
$ nexusctl vm list -o name | xargs -I{} nexusctl vm stop {}
$ nexusctl vm list --json name,state --jq '.[] | select(.state == "stopped") | .name' \
| xargs -I{} nexusctl vm delete {} -y
Scripting
Structured JSON output and consistent exit codes make nexusctl scriptable:
#!/bin/bash
set -e
# Create and wait for VM
nexusctl vm create --image workfort/code-agent --name build-vm -o json > /dev/null
nexusctl vm start build-vm
# Check state
state=$(nexusctl vm inspect build-vm --json state --jq '.state' -r)
if [ "$state" != "running" ]; then
echo "VM failed to start" >&2
exit 1
fi
# Do work...
nexusctl attach build-vm --exec "make build"
# Cleanup
nexusctl vm delete build-vm -y
Exit Codes in Conditionals
if nexusctl vm inspect my-vm &>/dev/null; then
echo "VM exists"
else
echo "VM not found, creating..."
nexusctl vm create my-vm
fi
Performance Targets
| Operation | Target |
|---|---|
| CLI startup (parse args, load config) | < 50ms |
| vm list (daemon query + render) | < 200ms |
| vm inspect (daemon query + render) | < 100ms |
| Shell completion (query + return) | < 300ms |
| apply diff computation | < 500ms |
The CLI does no heavy computation. It parses arguments, reads config, makes one HTTP call, and formats the response. If it feels slow, the daemon is the bottleneck, not the CLI.
Architecture
graph LR
subgraph CLI["nexusctl"]
Args["Arg Parser"]
Config["Config Loader"]
Client["HTTP Client"]
Formatter["Output Formatter"]
end
subgraph Daemon["nexusd"]
API["HTTP API :9600"]
end
Args --> Client
Config --> Client
Client -->|"HTTP/JSON"| API
API -->|"JSON response"| Formatter
Formatter -->|"table / json / yaml"| Stdout["stdout"]
Formatter -->|"errors"| Stderr["stderr"]
nexusctl is a single static binary. No runtime dependencies beyond libc. The binary contains:
- Argument parser – Clap with derive macros. Handles subcommands, aliases, shortnames, completions.
- Config loader – Reads YAML from all sources, merges by precedence, resolves env vars.
- HTTP client – reqwest (blocking). One request per command invocation. WebSocket for attach.
- Output formatter – Renders JSON responses into the requested format. Handles color, alignment, truncation.
Startup Sequence
1. Parse args (~5ms)
2. Load config (flags + env + files) (~10ms)
3. HTTP request to daemon (network-bound)
4. Format + print response (~5ms)
No daemon connection is made for --help, --version, completion, or config commands.
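A sketch of that request path: a skeletal clap + blocking reqwest binary, assuming the clap derive and reqwest blocking Cargo features. This is illustrative only; the real nexusctl grammar, config loading, and formatting are omitted.

```rust
// Minimal sketch of a nexusctl-style invocation: parse args, one HTTP call, print.
use clap::Parser;

#[derive(Parser)]
#[command(name = "nexusctl")]
struct Cli {
    /// Nexus daemon URL (default matches the documented listen address)
    #[arg(long, default_value = "http://127.0.0.1:9600")]
    api_url: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Parse args  2. (config layering omitted)  3. one HTTP request  4. format + print
    let cli = Cli::parse();
    let body = reqwest::blocking::get(format!("{}/v1/health", cli.api_url))?.text()?;
    println!("{body}"); // normal output to stdout; errors would go to stderr
    Ok(())
}
```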
Full Command Reference
nexusctl vm list [--role <role>] [--state <state>]
nexusctl vm create [name] [--image <img>] [--role <role>] [--vcpu <n>] [--mem <mib>]
nexusctl vm inspect <name>
nexusctl vm delete <name> [-y]
nexusctl vm start <name>
nexusctl vm stop <name> [--force]
nexusctl vm logs <name> [--follow] [--tail <n>]
nexusctl workspace list [--vm <name>]
nexusctl workspace create [name] [--base <image>] [--size <size>]
nexusctl workspace inspect <name>
nexusctl workspace delete <name> [-y]
nexusctl workspace snapshot <name> [--tag <label>]
nexusctl workspace restore <name> --snapshot <tag>
nexusctl agent list
nexusctl agent create [name] [--image <img>] [--runtime <rt>]
nexusctl agent inspect <name>
nexusctl agent delete <name> [-y]
nexusctl agent start <name>
nexusctl agent stop <name>
nexusctl agent logs <name> [--follow] [--tail <n>]
nexusctl image list
nexusctl image inspect <name>
nexusctl image delete <name> [-y]
nexusctl image import <path> [--name <name>]
nexusctl network list
nexusctl network inspect <bridge>
nexusctl route list [--vm <name>]
nexusctl route create --from <vm:port> --to <vm:port>
nexusctl route inspect <id>
nexusctl route delete <id> [-y]
nexusctl service list [--vm <name>]
nexusctl service inspect <name>
nexusctl attach <vm> [--exec <command>]
nexusctl init
nexusctl apply [--dry-run] [-y] [-f <file>]
nexusctl status
nexusctl version
nexusctl config list
nexusctl config get <key>
nexusctl config set <key> <value>
nexusctl config edit
nexusctl completion bash|zsh|fish
Global Flags
These flags are available on every command:
-o, --output <format> Output format: table, wide, json, yaml, name
--json <fields> Select JSON fields (comma-separated)
--jq <expr> jq filter expression (requires --json or -o json)
--api-url <url> Nexus daemon URL (default: http://127.0.0.1:9600)
--no-color Disable colored output
--no-interactive Disable interactive prompts
-y, --yes Skip confirmation prompts
--dry-run Preview changes without applying
-v, --verbose Increase log verbosity (repeatable: -vvv)
-q, --quiet Suppress non-error output
-h, --help Show help
--version Show version
WorkFort Alpha Design
Date: 2026-02-17 Status: Approved
Goal
One agent producing code in a VM. After this milestone, WorkFort dogfoods itself — the most needed tools to develop WorkFort further are built next, and the self-improvement loop continues until the full vision is realized.
System Topology
graph TB
systemd["systemd --user"] -->|starts| Nexus["nexusd"]
Nexus -->|spawns & manages| FC1["Firecracker: Portal VM (CID 3)"]
Nexus -->|spawns & manages| FC2["Firecracker: Work VM (CID 4)"]
subgraph Portal["Portal VM"]
AgentRT["Agent Runtime (LLM client)"]
end
subgraph WorkVM["Work VM"]
GA["guest-agent"]
MCP["MCP Server (JSON-RPC 2.0)"]
PTY["PTY Manager"]
end
AgentRT -->|"MCP tool calls"| Nexus
Nexus -->|"vsock route"| GA
GA --- MCP
GA --- PTY
Nexus --- State["SQLite"]
Nexus --- Storage["btrfs subvolumes"]
Nexus --- Net["nexbr0 + nftables"]
Nexus is the vsock router — all inter-VM communication flows through it. Firecracker’s vsock only supports host-guest communication, so Nexus mediates on the host via Unix domain sockets with the CONNECT <port> protocol.
Nexus Daemon
nexusd is a single Rust binary, started by systemd at the user level:
systemctl --user start nexus.service
The binary gets CAP_NET_ADMIN via setcap for tap/bridge/nftables operations. btrfs subvolume operations work unprivileged via ioctls.
Responsibilities
- Firecracker process lifecycle (spawn, monitor, kill)
- vsock routing between VMs
- btrfs workspace management (create/snapshot/destroy subvolumes)
- PTY management per VM, exposed over WebSocket via the ttyd protocol
- State tracking in SQLite
- HTTP API on 127.0.0.1:9600
- Reconciliation loop: reads desired state, compares to running VMs, converges
Configuration
# /etc/nexus/nexus.yaml
storage:
root: /var/lib/nexus
workspaces: /var/lib/nexus/workspaces
network:
bridge: nexbr0
cidr: 172.16.0.0/24 # user-configurable, must be RFC 1918
api:
listen: 127.0.0.1:9600
firecracker:
binary: /usr/bin/firecracker
kernel: /var/lib/nexus/images/vmlinux
State (SQLite)
vms: id, name, role, cid, status, config_json, created_at
workspaces: id, vm_id, subvolume_path, base_image, created_at
routes: id, source_vm_id, target_vm_id, source_port, target_port
An abstraction layer over SQLite allows swapping to Postgres or etcd for clustering.
Guest Agent
A small Rust binary baked into work VM images. It is the VM’s interface to Nexus.
vsock Port Allocation
The guest-agent listens on well-known vsock ports. Each service gets its own independent connection — no application-layer multiplexing is needed because Firecracker’s vsock natively supports multiple concurrent connections via the CONNECT <port> protocol.
| Port | Purpose | Direction |
|---|---|---|
| 100 | Control channel (image metadata, health) | host → guest |
| 200 | MCP server (JSON-RPC 2.0) | host → guest |
| 300-399 | PTY sessions (one port per terminal attach) | host → guest |
| 500 | MCP client outbound (portal VMs) | guest → host |
Direction matters. Host-to-guest connections use the VM’s UDS with CONNECT <port>\n. Guest-to-host connections trigger Firecracker to connect to <uds_path>_<port> on the host, where Nexus must be listening.
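A minimal sketch of the host side of that handshake, using blocking std I/O for brevity (nexusd uses tokio); the UDS path and error handling are illustrative examples.

```rust
// Sketch of the host side of Firecracker's hybrid-vsock handshake.
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn connect_guest_port(uds_path: &str, port: u32) -> std::io::Result<UnixStream> {
    let mut stream = UnixStream::connect(uds_path)?;
    // Firecracker expects "CONNECT <port>\n" on the UDS and replies with a
    // single "OK <host_port>\n" line once the guest accepts the connection.
    stream.write_all(format!("CONNECT {port}\n").as_bytes())?;
    let mut ack = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        stream.read_exact(&mut byte)?;
        if byte[0] == b'\n' {
            break;
        }
        ack.push(byte[0]);
    }
    if !ack.starts_with(b"OK") {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&ack).into_owned(),
        ));
    }
    Ok(stream) // after the ack, the stream carries raw bytes for the service
}

fn main() -> std::io::Result<()> {
    // Hypothetical UDS path; the real layout lives under $XDG_RUNTIME_DIR/nexus/.
    let _mcp = connect_guest_port("/run/user/1000/nexus/vm-work-1.vsock", 200)?;
    Ok(())
}
```

In practice these connections are opened eagerly at boot and kept in a per-VM, per-port pool, as described under Connection Pooling below.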
The MCP routing chain for a portal VM calling a work VM:
graph LR
Agent["Agent Runtime"] -->|"vsock CID 2, port 500"| Portal["Portal VM"]
Portal -->|"uds_path_500"| Nexus
Nexus -->|"CONNECT 200"| Work["Work VM"]
Work --> GA["guest-agent (MCP)"]
Connection Pooling
Nexus maintains a connection pool per VM per port. The first vsock connection and initial message are ~50-100x slower than subsequent messages on an established connection (validated through cracker-barrel vsock benchmarking). Connections are established eagerly at boot and kept alive for the VM’s lifetime. Reconnection is automatic on failure.
MCP Tools (Alpha)
| Tool | Description |
|---|---|
| file_read | Read file contents at path |
| file_write | Write content to path |
| file_delete | Delete file at path |
| run_command | Execute command, return stdout/stderr/exit code |
Long-running commands stream stdout/stderr incrementally over the MCP channel.
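A sketch of the JSON-RPC 2.0 envelope such a tool call could use. The method name and argument shape here are assumptions for illustration, not the finalized wire format.

```rust
// Illustrative JSON-RPC 2.0 envelope for calling the run_command tool.
use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Serialize, Deserialize, Debug)]
struct JsonRpcRequest {
    jsonrpc: String, // always "2.0"
    id: u64,
    method: String,
    params: serde_json::Value,
}

fn main() {
    // Hypothetical call of run_command on a work VM over vsock port 200.
    let req = JsonRpcRequest {
        jsonrpc: "2.0".into(),
        id: 1,
        method: "tools/call".into(),
        params: json!({
            "name": "run_command",
            "arguments": { "command": "cat /etc/os-release" }
        }),
    };
    println!("{}", serde_json::to_string_pretty(&req).unwrap());
}
```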
Boot Sequence
1. VM kernel boots, systemd starts guest-agent.service
2. Guest-agent listens on vsock ports 100, 200, and 300+ (VMADDR_CID_ANY)
3. Nexus connects via UDS + CONNECT 100\n (control channel)
4. Guest-agent sends image metadata (parsed from /etc/nexus/image.yaml)
5. Nexus registers the VM as ready, sets up routes
6. Nexus opens additional connections as needed (MCP on port 200, PTY on 300+)
Image Metadata Standard
Each VM image declares its access contract:
# /etc/nexus/image.yaml (inside the VM rootfs)
name: workfort/code-agent
version: 0.1.0
access:
terminal: vsock-pty # or: ssh, none
mcp: vsock # tool-calling interface
ports:
http: 8080
metrics: 9090
Nexus reads this at boot and routes accordingly. If terminal: vsock-pty, Nexus can expose a ttyd WebSocket session with access control. If terminal: ssh, Nexus proxies SSH over WebSocket. If terminal: none, no terminal access.
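A sketch of how this file could be deserialized with serde. The Rust types are illustrative; only the field names come from the example above.

```rust
// Illustrative typed view of /etc/nexus/image.yaml.
use serde::Deserialize;
use std::collections::HashMap;

#[derive(Deserialize, Debug)]
struct ImageMetadata {
    name: String,
    version: String,
    access: Access,
    #[serde(default)]
    ports: HashMap<String, u16>,
}

#[derive(Deserialize, Debug)]
struct Access {
    terminal: String, // "vsock-pty" | "ssh" | "none"
    mcp: String,      // "vsock" | "vsock-client"
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let yaml = std::fs::read_to_string("/etc/nexus/image.yaml")?;
    let meta: ImageMetadata = serde_norway::from_str(&yaml)?;
    // access.terminal decides whether Nexus exposes a ttyd session for this VM.
    println!("{} v{} terminal={}", meta.name, meta.version, meta.access.terminal);
    Ok(())
}
```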
A human developer can attach to any VM with terminal access:
nexusctl attach agent-code-1
The full chain:
graph LR
CLI["Browser / CLI"] -->|"WebSocket (ttyd)"| Nexus
Nexus -->|"vsock port 300+"| GA["guest-agent"]
GA -->|PTY| Shell["/bin/bash"]
Agent Runtimes
The MCP interface is agent-runtime agnostic. Any runtime that can speak MCP (directly or via adapter) works in a portal VM.
OpenClaw Integration
OpenClaw does not natively support MCP as a client. Integration is via an OpenClaw tool plugin:
Portal VM
└─ OpenClaw gateway (ws://127.0.0.1:18789)
└─ workfort-tools plugin
└─ Translates OpenClaw tool calls → MCP JSON-RPC
└─ Sends over vsock → Nexus → Work VM
Each MCP tool on the work VM gets registered as an OpenClaw tool via api.registerTool(). The plugin acts as a thin MCP client.
If OpenClaw ships native MCP client support, the plugin becomes unnecessary.
Portal VM Image Metadata
# portal VM image.yaml
name: workfort/portal-openclaw
version: 0.1.0
runtime: openclaw
access:
terminal: vsock-pty
mcp: vsock-client # this VM consumes MCP, not serves it
ports:
gateway: 18789
Networking
Each VM gets a tap device bridged through nexbr0.
graph TB
Internet -->|NAT| Host["Host (eth0)"]
Host --- Bridge["nexbr0 (172.16.0.0/24)"]
Bridge --- tap0["tap0 → Portal VM (172.16.0.10)"]
Bridge --- tap1["tap1 → Work VM (172.16.0.11)"]
Bridge --- tap2["tap2 → Service VM (172.16.0.12)"]
Nexus manages
- Bridge creation/teardown
- Tap device lifecycle (one per VM)
- IP assignment from the configured CIDR (stored in SQLite)
- NAT masquerade for outbound internet access
- nftables rules per VM for isolation
Configuration
network:
bridge: nexbr0
cidr: 172.16.0.0/24 # default from 172.16.0.0/12, user-configurable
gateway: 172.16.0.1 # derived from cidr
Nexus validates that the chosen block falls within RFC 1918 space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
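A minimal sketch of that check. A full implementation would also verify that the entire CIDR block, not just the base address, stays inside one range.

```rust
// RFC 1918 membership check for the configured bridge subnet.
use std::net::Ipv4Addr;

/// True if the address lies in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
fn is_rfc1918(addr: Ipv4Addr) -> bool {
    let o = addr.octets();
    o[0] == 10
        || (o[0] == 172 && (16..=31).contains(&o[1]))
        || (o[0] == 192 && o[1] == 168)
}

fn main() {
    assert!(is_rfc1918(Ipv4Addr::new(172, 16, 0, 0))); // default nexbr0 subnet
    assert!(!is_rfc1918(Ipv4Addr::new(8, 8, 8, 8)));   // public space, rejected
}
```

The standard library's Ipv4Addr::is_private() covers the same three ranges.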
Service Discovery
VMs discover services by asking Nexus over the vsock control channel. Data flows over the bridge network.
// request
{"method": "service_lookup", "params": {"name": "git-server", "port": "http"}}
// response
{"result": {"address": "172.16.0.12", "port": 3000}}
Storage
btrfs subvolumes as workspaces with CoW snapshots.
Layout
/var/lib/nexus/
├── workspaces/
│ ├── @base-agent/ ← read-only base subvolume
│ ├── @work-code-1/ ← CoW snapshot of @base-agent
│ └── @portal-openclaw/ ← CoW snapshot of portal base
├── images/
│ └── vmlinux ← kernel
└── state/
└── nexus.db ← SQLite
How It Works
- Base images are btrfs subvolumes marked read-only
- New workspace = btrfs subvolume snapshot (instant, zero disk cost)
- Exposed to Firecracker as a block device via dm/nbd
- Workspace grows only as the agent writes — CoW keeps shared blocks shared
- Destroy = btrfs subvolume delete
Operations
| Operation | What it does |
|---|---|
| workspace create | Snapshot from a base image |
| workspace snapshot | Checkpoint a running workspace |
| workspace restore | Roll back to a previous snapshot |
| workspace destroy | Delete subvolume |
| workspace list | List all workspaces with disk usage |
API
HTTP REST on 127.0.0.1:9600. WebSocket upgrade for terminal sessions.
GET /v1/vms # list all VMs
POST /v1/vms # create VM
GET /v1/vms/:id # get VM status
DELETE /v1/vms/:id # destroy VM
POST /v1/vms/:id/start # start VM
POST /v1/vms/:id/stop # stop VM
GET /v1/workspaces # list workspaces
POST /v1/workspaces # create workspace
POST /v1/workspaces/:id/snapshot # checkpoint
POST /v1/workspaces/:id/restore # roll back
DELETE /v1/workspaces/:id # destroy
GET /v1/services # list registered services
GET /v1/routes # list vsock routes
GET /v1/vms/:id/terminal # WebSocket upgrade → ttyd session
CLI
nexusctl is a thin HTTP client:
nexusctl vm list
nexusctl vm create --image workfort/code-agent --name agent-1
nexusctl vm start agent-1
nexusctl attach agent-1
nexusctl workspace snapshot agent-1 --name "before-refactor"
Technology Stack
| Component | Crate / Tool | Version |
|---|---|---|
| Async runtime | tokio | 1.x |
| vsock | tokio-vsock | 0.7.x |
| cgroups | cgroups-rs | 0.5.x |
| Networking rules | nftables (JSON API) | 0.6.x |
| btrfs | libbtrfsutil | latest |
| PTY | nix::pty + AsyncFd | via nix |
| Terminal WS | ttyd protocol (DIY) | — |
| State store | rusqlite | latest |
| Serialization | serde + serde_json | 1.x |
| HTTP API | axum | latest |
| PGP signing | rpgp (pgp crate) | 0.19.x |
Repo Structure
WorkFort/
├── codex/ ← mdBook, documentation and plans
├── cracker-barrel/ ← Go, kernel build tool
└── nexus/ ← Rust workspace (to be created)
├── Cargo.toml ← workspace root
├── nexusd/ ← daemon binary
├── guest-agent/ ← MCP server binary for work VMs
└── nexus-lib/ ← shared types, vsock protocol, storage
Package Repository
A signed pacman mirror at packages.workfort.dev:
# /etc/pacman.conf
[workfort]
Server = https://packages.workfort.dev/$arch
SigLevel = Required DatabaseOptional
Installable on the host via pacman. Packages include nexus, guest-agent, and kernel images built by cracker-barrel.
Alpha Roadmap
Date: 2026-02-18 Status: Draft
Context
These decisions were made during the design phase and inform the roadmap:
- Networking is in scope — outbound-only (tap + bridge + MASQUERADE) for dependency resolution (pip, cargo, npm, etc.)
- No host-side git — generic file/folder passing through btrfs workspaces only
- Git via service VM — Soft Serve in a service VM, agents clone/push over the bridge network
- Remote push via guest-agent — Nexus triggers git push on the service VM through MCP run_command, credentials stay in the service VM
- Alpine rootfs — matches cracker-barrel's known-working configuration
- vsock for all control plane — MCP, PTY, control channel all flow through vsock via Nexus
- XDG Base Directory spec — host-side paths follow XDG: config in $XDG_CONFIG_HOME/nexus/, state in $XDG_STATE_HOME/nexus/, data in $XDG_DATA_HOME/nexus/, runtime in $XDG_RUNTIME_DIR/nexus/
Steps
These 10 steps are the first phase of work toward the alpha milestone. They do not complete the milestone — additional steps will be planned as these are underway.
Step 1: nexusd — Systemd-Ready Daemon
Create the nexus Rust workspace (nexusd, nexus-lib). Build nexusd with signal handling (SIGTERM/SIGINT), structured logging, and an HTTP server serving /v1/health. Write a systemd user unit file.
Deliverable: systemctl --user start nexus starts the daemon. curl localhost:9600/v1/health returns {"status":"ok"}. SIGTERM triggers graceful shutdown with log output.
Detailed plan: Step 1 Plan
Step 2: nexusctl — CLI Skeleton
Add nexusctl to the workspace. Clap-based CLI with noun-verb grammar. Implement nexusctl status (queries /v1/health) and nexusctl version. Recommended alias nxc. User config at $XDG_CONFIG_HOME/nexusctl/config.yaml. Actionable error messages when the daemon is unreachable.
Deliverable: nexusctl status reports daemon health. When daemon is down:
Error: cannot connect to Nexus daemon at 127.0.0.1:9600
The daemon does not appear to be running.
Start it: systemctl --user start nexus.service
Step 3: SQLite State Store
Add rusqlite to nexus-lib. Initialize the schema on first daemon start. Storage abstraction trait for future backend swaps. Pre-alpha migration strategy: delete DB and recreate.
Deliverable: Daemon creates $XDG_STATE_HOME/nexus/nexus.db with the full schema on startup. nexusctl status reports database status (path, table count, size).
Step 4: VM Records — CRUD Without Firecracker
REST endpoints for VMs (POST/GET/DELETE /v1/vms). CLI commands: vm list, vm create, vm inspect, vm delete. State machine limited to created — no Firecracker processes yet. Auto-assign vsock CID on create.
Deliverable: nexusctl vm create my-vm persists to SQLite. nexusctl vm list renders a table. nexusctl vm inspect my-vm shows full detail. nexusctl vm delete my-vm removes the record.
Step 5: btrfs Workspace Management
Master image import (mark an existing btrfs subvolume as read-only, register in DB). Workspace create (btrfs subvolume snapshot from master). List, inspect, delete. REST endpoints + CLI commands. Use libbtrfsutil — Rust bindings to btrfs-progs’s upstream libbtrfsutil, which supports subvolume create, delete, snapshot, and list via ioctls on directory file descriptors. Common subvolume operations (create, snapshot) work unprivileged — no CAP_SYS_ADMIN required.
Firecracker requires block devices, not directories. The approach: each workspace subvolume contains a raw ext4 image file. mke2fs -d converts a directory tree into an ext4 image without root. btrfs CoW still applies at the host layer — snapshotting a subvolume containing a 1GB image file is instant and zero-cost until writes diverge.
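A sketch of the image and workspace preparation flow, shelling out to the CLI tools for readability. The plan calls for libbtrfsutil ioctls rather than the btrfs CLI, and the paths, directory names, and sizes here are examples, not the final layout.

```rust
// Illustrative master-image build + workspace snapshot, via external commands.
use std::process::Command;

fn run(cmd: &str, args: &[&str]) -> std::io::Result<()> {
    let status = Command::new(cmd).args(args).status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("{cmd} exited with {status}"),
        ));
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // 1. Build a master image: a subvolume holding a raw ext4 image produced
    //    from a directory tree (mke2fs -d needs no root privileges).
    run("btrfs", &["subvolume", "create", "/var/lib/nexus/workspaces/@base-agent"])?;
    run("mke2fs", &["-t", "ext4", "-d", "rootfs-tree",
        "/var/lib/nexus/workspaces/@base-agent/rootfs.img", "1G"])?;

    // 2. Workspace create: CoW snapshot of the master subvolume. The image
    //    file inside is shared block-for-block until the VM's writes diverge.
    run("btrfs", &["subvolume", "snapshot",
        "/var/lib/nexus/workspaces/@base-agent",
        "/var/lib/nexus/workspaces/@work-code-1"])?;
    Ok(())
}
```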
Deliverable: nexusctl image import /path --name base registers an image. nexusctl ws create --base base --name my-ws creates a btrfs snapshot. nexusctl ws list shows workspaces. Verified with btrfs subvolume list.
Step 6: Rootfs Image + Firecracker VM Boot
Build a minimal Alpine rootfs, reusing cracker-barrel’s known-working Alpine configuration. Package it as an ext4 image via mke2fs -d (directory → ext4 without root). Store the image inside a btrfs subvolume and register as a master image. Spawn Firecracker with config (kernel from cracker-barrel, rootfs from master image snapshot, vsock device). Process monitoring — detect exit/crash, update VM state in SQLite. Start/stop lifecycle.
Deliverable: nexusctl vm start my-vm boots an Alpine VM in Firecracker, VM reaches running state. nexusctl vm stop my-vm shuts down cleanly. Unexpected termination updates state to crashed. nexusctl vm logs my-vm shows console output.
Unknowns:
- Firecracker API socket management and cleanup.
- CID allocation strategy for vsock (auto-increment from 3, or pool).
Step 7: guest-agent — vsock Control Channel
Add guest-agent binary to the workspace. Uses tokio-vsock for async vsock I/O on both sides — guest-agent listens on VMADDR_CID_ANY port 100, nexusd connects via the VM’s UDS with CONNECT 100\n. Sends image metadata on connect (parsed from /etc/nexus/image.yaml). Systemd service inside the VM rootfs.
First vsock connection and initial message are ~50-100x slower than subsequent messages on an established connection (validated through cracker-barrel benchmarking). Connections are established eagerly at boot and kept alive.
Deliverable: VM boots. guest-agent starts via systemd inside the VM. nexusd connects on vsock port 100 and receives image metadata. VM state includes readiness status.
Step 8: MCP Tools in guest-agent
JSON-RPC 2.0 server on vsock port 200 inside the guest-agent via tokio-vsock. Implements four tools: file_read, file_write, file_delete, run_command. nexusd maintains a connection pool per VM per port — connections established eagerly at boot and kept alive for the VM’s lifetime. Reconnection is automatic on failure. run_command streams stdout/stderr incrementally over the MCP channel.
Deliverable: From the host, send MCP file_write to a running VM — file appears inside the VM. Send run_command with cat /etc/os-release — returns Alpine release info. Send file_read — returns file contents. Send file_delete — file is removed.
Step 9: Networking — Outbound Access
Bridge creation (nexbr0). Tap device per VM, attached to the bridge. IP assignment from configured CIDR (stored in SQLite). NAT masquerade for outbound internet access. CAP_NET_ADMIN via setcap on the nexusd binary. Per-VM isolation rules via the nftables crate (JSON API — drives nftables via nft -j, requires nftables >= 0.9.3 at runtime). DNS configuration inside VMs.
Deliverable: A booted VM can curl https://example.com successfully. nexusctl vm list shows assigned IP addresses. nexusctl vm inspect shows network configuration.
Unknowns:
- setcap interaction with systemd user services — may need AmbientCapabilities= in the unit file instead.
- DNS resolver configuration inside Alpine VMs (static /etc/resolv.conf vs. DHCP).
Step 10: PTY + Terminal Attach
PTY management in guest-agent using nix::pty (already a transitive dependency) wrapped in tokio::io::unix::AsyncFd for async I/O. One PTY per session on vsock ports 300-399. WebSocket endpoint in nexusd (GET /v1/vms/:id/terminal with upgrade) implementing the ttyd protocol — a single-byte-prefix framing scheme that gives xterm.js compatibility for free:
Client → Server: '0'=INPUT '1'=RESIZE(JSON) '2'=PAUSE '3'=RESUME '{'=HANDSHAKE
Server → Client: '0'=OUTPUT '1'=SET_TITLE '2'=SET_PREFS
No Rust library exists for this — implement directly over axum WebSocket (~150 lines). nexusctl attach <vm> connects to the WebSocket and bridges to the local terminal. Terminal resize (SIGWINCH) propagated via the RESIZE message type.
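A minimal sketch of the framing itself, with the WebSocket transport omitted. The JSON keys in the RESIZE body are an assumption for illustration.

```rust
// Illustrative single-byte-prefix framing for the ttyd-style protocol.

/// Server → client: terminal output is prefixed with '0'.
fn encode_output(bytes: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(bytes.len() + 1);
    frame.push(b'0');
    frame.extend_from_slice(bytes);
    frame
}

/// Client → server frames that nexusd must route to the guest PTY.
#[derive(Debug)]
enum ClientFrame<'a> {
    Input(&'a [u8]),                 // '0' + raw keystrokes
    Resize { cols: u16, rows: u16 }, // '1' + JSON body
    Pause,                           // '2'
    Resume,                          // '3'
}

fn decode_client(frame: &[u8]) -> Option<ClientFrame<'_>> {
    match frame.split_first()? {
        (&b'0', rest) => Some(ClientFrame::Input(rest)),
        (&b'1', rest) => {
            // Assumed RESIZE body shape, e.g. {"columns":120,"rows":40}.
            let v: serde_json::Value = serde_json::from_slice(rest).ok()?;
            Some(ClientFrame::Resize {
                cols: v["columns"].as_u64()? as u16,
                rows: v["rows"].as_u64()? as u16,
            })
        }
        (&b'2', _) => Some(ClientFrame::Pause),
        (&b'3', _) => Some(ClientFrame::Resume),
        _ => None,
    }
}

fn main() {
    assert_eq!(encode_output(b"hello")[0], b'0');
    let resize = decode_client(br#"1{"columns":120,"rows":40}"#);
    assert!(matches!(resize, Some(ClientFrame::Resize { cols: 120, rows: 40 })));
}
```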
Deliverable: nexusctl attach my-vm opens an interactive shell inside the VM. Typing commands works. Ctrl-C, Ctrl-D, and window resizing behave correctly. Disconnecting leaves the VM running.
After These Steps
These are needed for the alpha milestone but will be planned after the first 10 steps:
- Soft Serve service VM setup and configuration
- Portal VM with agent runtime (OpenClaw integration)
- vsock routing between VMs (portal → Nexus → work)
- Agent resource abstraction (portal + work VM pairs managed as one unit)
- nexusctl apply and declarative nexus.yaml configuration
- Package repository setup (packages.workfort.dev)
- CLI polish: output formatting, --jq, shell completions, --dry-run
Step 1: nexusd Skeleton — Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: A Rust daemon that starts via systemd, handles signals, logs to journald, and serves a health check endpoint.
Architecture: Single binary (nexusd) in a Cargo workspace with a shared library crate (nexus-lib). Uses tokio for async, axum for HTTP, tracing for structured logging. Config loaded from YAML with sensible defaults.
Tech Stack:
- tokio 1.x — async runtime
- axum — HTTP server
- clap 4.x — CLI argument parsing
- tracing + tracing-subscriber — structured logging
- serde + serde_norway — config deserialization (serde_yaml is deprecated/archived)
- serde_json — API responses
- dirs — XDG Base Directory paths
XDG Directory Layout:
- Config: $XDG_CONFIG_HOME/nexus/nexus.yaml (default: ~/.config/nexus/nexus.yaml)
- Data (workspaces, images): $XDG_DATA_HOME/nexus/ (default: ~/.local/share/nexus/)
- State (database): $XDG_STATE_HOME/nexus/ (default: ~/.local/state/nexus/)
- Runtime (sockets): $XDG_RUNTIME_DIR/nexus/ (default: /run/user/$UID/nexus/)
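A minimal sketch of how these locations might be resolved with the dirs crate. The NexusDirs struct is illustrative and not part of the plan; only the config-path helper is actually implemented in Task 2.

// Sketch: resolving the four XDG locations with the dirs crate.
// NexusDirs is an illustrative type, not something the plan specifies.
use std::path::PathBuf;

#[derive(Debug)]
struct NexusDirs {
    config: PathBuf,  // $XDG_CONFIG_HOME/nexus/nexus.yaml
    data: PathBuf,    // $XDG_DATA_HOME/nexus/
    state: PathBuf,   // $XDG_STATE_HOME/nexus/
    runtime: PathBuf, // $XDG_RUNTIME_DIR/nexus/
}

fn nexus_dirs() -> Option<NexusDirs> {
    Some(NexusDirs {
        config: dirs::config_dir()?.join("nexus").join("nexus.yaml"),
        data: dirs::data_dir()?.join("nexus"),
        state: dirs::state_dir()?.join("nexus"),
        runtime: dirs::runtime_dir()?.join("nexus"),
    })
}

fn main() {
    println!("{:#?}", nexus_dirs());
}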
Task 1: Create Rust Workspace
Files:
- Create: nexus/Cargo.toml
- Create: nexus/nexusd/Cargo.toml
- Create: nexus/nexusd/src/main.rs
- Create: nexus/nexus-lib/Cargo.toml
- Create: nexus/nexus-lib/src/lib.rs
Step 1: Create directory structure
mkdir -p nexus/nexusd/src nexus/nexus-lib/src
Step 2: Write workspace Cargo.toml
# nexus/Cargo.toml
[workspace]
members = ["nexusd", "nexus-lib"]
resolver = "2"
Step 3: Write nexus-lib Cargo.toml
# nexus/nexus-lib/Cargo.toml
[package]
name = "nexus-lib"
version = "0.1.0"
edition = "2021"
[dependencies]
serde = { version = "1", features = ["derive"] }
serde_norway = "0.9"
dirs = "6"
Step 4: Write nexus-lib stub
// nexus/nexus-lib/src/lib.rs
pub mod config;

// nexus/nexus-lib/src/config.rs
// Filled in Task 2
Step 5: Write nexusd Cargo.toml
# nexus/nexusd/Cargo.toml
[package]
name = "nexusd"
version = "0.1.0"
edition = "2021"
[dependencies]
nexus-lib = { path = "../nexus-lib" }
tokio = { version = "1", features = ["full"] }
axum = "0.8"
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
[dev-dependencies]
tower = { version = "0.5", features = ["util"] }
Step 6: Write minimal main.rs
// nexus/nexusd/src/main.rs
fn main() {
println!("nexusd");
}
Step 7: Verify build
Run: cd nexus && cargo build
Expected: Compiles with no errors.
Step 8: Commit
git add nexus/
git commit -m "feat: create nexus Rust workspace with nexusd and nexus-lib crates"
Task 2: Configuration Types and Loading
Files:
- Create: nexus/nexus-lib/src/config.rs
- Modify: nexus/nexus-lib/src/lib.rs
Step 1: Write the failing test
// nexus/nexus-lib/src/config.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn deserialize_minimal_config() {
let yaml = r#"
api:
listen: "127.0.0.1:8080"
"#;
let config: Config = serde_norway::from_str(yaml).unwrap();
assert_eq!(config.api.listen, "127.0.0.1:8080");
}
#[test]
fn default_config_values() {
let config = Config::default();
assert_eq!(config.api.listen, "127.0.0.1:9600");
}
#[test]
fn partial_yaml_uses_defaults() {
let yaml = "{}";
let config: Config = serde_norway::from_str(yaml).unwrap();
assert_eq!(config.api.listen, "127.0.0.1:9600");
}
#[test]
fn load_nonexistent_file_returns_not_found() {
let result = Config::load("/nonexistent/path/config.yaml");
assert!(result.is_err());
assert!(result.unwrap_err().is_not_found());
}
#[test]
fn load_invalid_yaml_returns_invalid() {
let dir = std::env::temp_dir();
let path = dir.join("nexus-test-bad-config.yaml");
std::fs::write(&path, "{{invalid yaml").unwrap();
let result = Config::load(&path);
assert!(result.is_err());
assert!(!result.unwrap_err().is_not_found());
std::fs::remove_file(&path).ok();
}
}
Step 2: Run tests to verify they fail
Run: cd nexus && cargo test -p nexus-lib
Expected: FAIL — Config type does not exist yet.
Step 3: Implement Config
// nexus/nexus-lib/src/config.rs
use serde::Deserialize;
use std::path::{Path, PathBuf};
#[derive(Debug)]
pub enum ConfigError {
NotFound(std::io::Error),
Invalid(String),
}
impl std::fmt::Display for ConfigError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
ConfigError::NotFound(e) => write!(f, "config file not found: {e}"),
ConfigError::Invalid(e) => write!(f, "invalid config: {e}"),
}
}
}
impl std::error::Error for ConfigError {}
impl ConfigError {
pub fn is_not_found(&self) -> bool {
matches!(self, ConfigError::NotFound(_))
}
}
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct Config {
pub api: ApiConfig,
}
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct ApiConfig {
pub listen: String,
}
impl Default for Config {
fn default() -> Self {
Config {
api: ApiConfig::default(),
}
}
}
impl Default for ApiConfig {
fn default() -> Self {
ApiConfig {
listen: "127.0.0.1:9600".to_string(),
}
}
}
/// Returns the default config file path: $XDG_CONFIG_HOME/nexus/nexus.yaml
pub fn default_config_path() -> PathBuf {
let config_dir = dirs::config_dir()
.expect("cannot determine XDG_CONFIG_HOME")
.join("nexus");
config_dir.join("nexus.yaml")
}
impl Config {
pub fn load(path: impl AsRef<Path>) -> Result<Self, ConfigError> {
let content = std::fs::read_to_string(path).map_err(|e| {
if e.kind() == std::io::ErrorKind::NotFound {
ConfigError::NotFound(e)
} else {
ConfigError::Invalid(e.to_string())
}
})?;
let config: Config =
serde_norway::from_str(&content).map_err(|e| ConfigError::Invalid(e.to_string()))?;
Ok(config)
}
}
Step 4: Run tests to verify they pass
Run: cd nexus && cargo test -p nexus-lib
Expected: All 5 tests PASS.
Step 5: Commit
git add nexus/nexus-lib/
git commit -m "feat(nexus-lib): add Config type with YAML loading and defaults"
Task 3: Health Endpoint
Files:
- Create: nexus/nexusd/src/api.rs
- Modify: nexus/nexusd/src/main.rs
Step 1: Write the failing test
// nexus/nexusd/src/api.rs
use axum::{Json, Router, routing::get};
use serde::Serialize;
#[derive(Serialize)]
struct HealthResponse {
status: String,
}
async fn health() -> Json<HealthResponse> {
todo!()
}
pub fn router() -> Router {
Router::new().route("/v1/health", get(health))
}
#[cfg(test)]
mod tests {
use super::*;
use axum::http::StatusCode;
use axum::body::Body;
use axum::http::Request;
use tower::ServiceExt;
#[tokio::test]
async fn health_returns_ok() {
let app = router();
let response = app
.oneshot(Request::get("/v1/health").body(Body::empty()).unwrap())
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = axum::body::to_bytes(response.into_body(), usize::MAX)
.await
.unwrap();
let json: serde_json::Value = serde_json::from_slice(&body).unwrap();
assert_eq!(json["status"], "ok");
}
}
Step 2: Run test to verify it fails
Run: cd nexus && cargo test -p nexusd api::tests::health_returns_ok
Expected: FAIL — todo!() panics.
Step 3: Implement the handler
Replace todo!() with the real implementation:
async fn health() -> Json<HealthResponse> {
Json(HealthResponse {
status: "ok".to_string(),
})
}
Step 4: Add module to main.rs
// nexus/nexusd/src/main.rs
mod api;
fn main() {
println!("nexusd");
}
Step 5: Run test to verify it passes
Run: cd nexus && cargo test -p nexusd api::tests::health_returns_ok
Expected: PASS.
Step 6: Commit
git add nexus/nexusd/
git commit -m "feat(nexusd): add GET /v1/health endpoint"
Task 4: Signal Handling and Graceful Shutdown
Files:
- Create: nexus/nexusd/src/server.rs
This task creates the server startup and shutdown logic. Signal handling is difficult to unit test in isolation, so it will be verified in the integration test (Task 8).
Step 1: Implement the server module
// nexus/nexusd/src/server.rs
use crate::api;
use nexus_lib::config::Config;
use tokio::net::TcpListener;
use tracing::info;
pub async fn run(config: &Config) -> Result<(), Box<dyn std::error::Error>> {
let app = api::router();
let listener = TcpListener::bind(&config.api.listen).await?;
info!(listen = %config.api.listen, "HTTP API ready");
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal())
.await?;
info!("nexusd stopped");
Ok(())
}
async fn shutdown_signal() {
use tokio::signal::unix::{signal, SignalKind};
let mut sigterm = signal(SignalKind::terminate())
.expect("failed to install SIGTERM handler");
let mut sigint = signal(SignalKind::interrupt())
.expect("failed to install SIGINT handler");
tokio::select! {
_ = sigterm.recv() => info!("received SIGTERM, shutting down"),
_ = sigint.recv() => info!("received SIGINT, shutting down"),
}
}
Step 2: Commit
git add nexus/nexusd/src/server.rs
git commit -m "feat(nexusd): add server startup with graceful shutdown on SIGTERM/SIGINT"
Task 5: Logging
Files:
- Create: nexus/nexusd/src/logging.rs
Step 1: Implement logging setup
// nexus/nexusd/src/logging.rs
use tracing_subscriber::{fmt, EnvFilter};
pub fn init() {
let filter = EnvFilter::try_from_default_env()
.unwrap_or_else(|_| EnvFilter::new("info"));
fmt()
.with_env_filter(filter)
.init();
}
Uses RUST_LOG env var when set, defaults to info. systemd captures stdout to journald automatically — no special journald integration needed.
Step 2: Commit
git add nexus/nexusd/src/logging.rs
git commit -m "feat(nexusd): add tracing-based logging with env filter"
Task 6: CLI Arguments and main() Wiring
Files:
- Modify: nexus/nexusd/src/main.rs
Step 1: Wire everything together
// nexus/nexusd/src/main.rs
use clap::Parser;
use nexus_lib::config::{self, Config};
use tracing::{error, info};
mod api;
mod logging;
mod server;
#[derive(Parser)]
#[command(name = "nexusd", about = "WorkFort Nexus daemon")]
struct Cli {
/// Path to configuration file
/// [default: $XDG_CONFIG_HOME/nexus/nexus.yaml]
#[arg(long)]
config: Option<String>,
}
#[tokio::main]
async fn main() {
let cli = Cli::parse();
logging::init();
let config_path = cli.config
.map(std::path::PathBuf::from)
.unwrap_or_else(config::default_config_path);
let config = match Config::load(&config_path) {
Ok(config) => {
info!(config_path = %config_path.display(), "loaded configuration");
config
}
Err(e) if e.is_not_found() => {
info!("no config file found, using defaults");
Config::default()
}
Err(e) => {
error!(error = %e, path = %config_path.display(), "invalid configuration file");
std::process::exit(1);
}
};
info!("nexusd starting");
if let Err(e) = server::run(&config).await {
error!(error = %e, "daemon failed");
std::process::exit(1);
}
}
Step 2: Verify build
Run: cd nexus && cargo build
Expected: Compiles with no errors.
Step 3: Verify --help
Run: cd nexus && cargo run -p nexusd -- --help
Expected:
WorkFort Nexus daemon
Usage: nexusd [OPTIONS]
Options:
--config <CONFIG> Path to configuration file [default: $XDG_CONFIG_HOME/nexus/nexus.yaml]
-h, --help Print help
Step 4: Quick manual smoke test
Run: cd nexus && cargo run -p nexusd
Expected: Daemon starts, logs “HTTP API ready” with listen=127.0.0.1:9600. In another terminal:
Run: curl -s http://127.0.0.1:9600/v1/health | python -m json.tool
Expected:
{
"status": "ok"
}
Kill the daemon with Ctrl-C. Expected: logs “received SIGINT, shutting down” and “nexusd stopped”, then exits cleanly.
Step 5: Commit
git add nexus/nexusd/
git commit -m "feat(nexusd): wire CLI args, config loading, logging, and server into main"
Task 7: Systemd Unit File
Files:
- Create: nexus/dist/nexus.service
Step 1: Write the unit file
# nexus/dist/nexus.service
[Unit]
Description=WorkFort Nexus Daemon
[Service]
Type=exec
ExecStart=%h/.cargo/bin/nexusd
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=info
[Install]
WantedBy=default.target
Notes:
- Type=exec waits for the binary to launch successfully, catching missing-binary errors (better than Type=simple).
- %h expands to the user's home directory. The binary path will be adjusted once packaging is set up.
- StandardOutput=journal and SyslogIdentifier are omitted because they are systemd defaults.
Step 2: Test with systemd
# Install the unit file
mkdir -p ~/.config/systemd/user
cp nexus/dist/nexus.service ~/.config/systemd/user/
systemctl --user daemon-reload
# Build and install the binary where ExecStart expects it
cd nexus && cargo build --release
cp target/release/nexusd ~/.cargo/bin/
# Start and verify
systemctl --user start nexus
systemctl --user status nexus
curl -s http://127.0.0.1:9600/v1/health
# Check logs
journalctl --user -u nexus -n 20
# Stop
systemctl --user stop nexus
Expected: Service starts, health endpoint responds, logs appear in journald, service stops cleanly on stop.
Step 3: Commit
git add nexus/dist/
git commit -m "feat(nexusd): add systemd user service unit file"
Task 8: Integration Test
Files:
- Create: nexus/nexusd/tests/daemon.rs
- Modify: nexus/nexusd/Cargo.toml (add dev-dependencies)
Step 1: Add dev-dependencies
Merge the following into the existing [dev-dependencies] section of nexus/nexusd/Cargo.toml:
[dev-dependencies]
tower = { version = "0.5", features = ["util"] }
reqwest = { version = "0.13", features = ["json"] }
nix = { version = "0.30", features = ["signal"] }
serde_json = "1"
Step 2: Write the integration test
// nexus/nexusd/tests/daemon.rs
use std::process::{Command, Child};
use std::time::Duration;
use nix::sys::signal::{self, Signal};
use nix::unistd::Pid;
fn start_daemon() -> Child {
let binary = env!("CARGO_BIN_EXE_nexusd");
Command::new(binary)
.env("RUST_LOG", "info")
.spawn()
.expect("failed to start nexusd")
}
fn stop_daemon(child: &Child) {
signal::kill(Pid::from_raw(child.id() as i32), Signal::SIGTERM)
.expect("failed to send SIGTERM");
}
#[tokio::test]
async fn daemon_starts_serves_health_and_stops() {
let mut child = start_daemon();
// Wait for the daemon to be ready
let client = reqwest::Client::new();
let mut ready = false;
for _ in 0..50 {
tokio::time::sleep(Duration::from_millis(100)).await;
if client.get("http://127.0.0.1:9600/v1/health")
.send()
.await
.is_ok()
{
ready = true;
break;
}
}
assert!(ready, "daemon did not become ready within 5 seconds");
// Verify health endpoint
let resp = client.get("http://127.0.0.1:9600/v1/health")
.send()
.await
.expect("health request failed");
assert_eq!(resp.status(), 200);
let body: serde_json::Value = resp.json().await.unwrap();
assert_eq!(body["status"], "ok");
// Graceful shutdown
stop_daemon(&child);
let status = child.wait().expect("failed to wait on daemon");
assert!(status.success(), "daemon exited with non-zero status: {}", status);
}
Step 3: Run the integration test
Run: cd nexus && cargo test -p nexusd --test daemon
Expected: PASS — daemon starts, health endpoint returns {"status":"ok"}, SIGTERM causes clean exit with code 0.
Note: This test uses a hardcoded port (9600). If the port is in use, the test will fail. For now this is acceptable — a single integration test doesn’t need port randomization. Address this when adding more integration tests.
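If port randomization does become necessary, one common approach is to ask the OS for a free port, write it into a temporary config file, and start the daemon with --config pointing at it. A minimal sketch of the port-grabbing half; there is a small window between releasing and rebinding the port, which is acceptable for tests.

// Sketch: grab an OS-assigned free port for a future randomized-port test.
// A test would write a temp nexus.yaml with `listen: "127.0.0.1:<port>"`
// and start nexusd with `--config <that file>`.
use std::net::TcpListener;

fn free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0 = let the OS choose
    let port = listener.local_addr()?.port();
    drop(listener); // release it so the daemon can bind it
    Ok(port)
}

fn main() -> std::io::Result<()> {
    let port = free_port()?;
    println!("would start nexusd on 127.0.0.1:{port}");
    Ok(())
}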
Step 4: Commit
git add nexus/nexusd/
git commit -m "test(nexusd): add integration test for daemon lifecycle"
Verification Checklist
After all tasks are complete, verify the following:
- cargo build succeeds with no warnings
- cargo test --workspace — all tests pass
- cargo run -p nexusd -- --help — prints usage
- curl localhost:9600/v1/health — returns {"status":"ok"}
- systemctl --user start nexus — daemon starts
- journalctl --user -u nexus — shows structured log output
- systemctl --user stop nexus — daemon stops cleanly (exit 0)
- Sending SIGTERM to the process causes graceful shutdown
- Sending SIGINT (Ctrl-C) to the process causes graceful shutdown
- No config file present — daemon starts with defaults
- Config file present — daemon uses configured values