WorkFort Codex

WorkFort is an Arch Linux distribution purpose-built as an office for AI agents. Each agent gets its own Firecracker microVM — a private workspace with full system access — managed by the Nexus daemon.

This codex contains design documents, specifications, and plans for the WorkFort project.

Repositories

Repo            Language  Purpose
codex           mdBook    Documentation and design plans
cracker-barrel  Go        Firecracker kernel build tool
nexus           Rust      VM management daemon + guest agent

Architecture Overview

graph TB
    subgraph Host["Host (Arch Linux, btrfs)"]
        Nexus["nexusd"]
        SQLite["SQLite"]
        Bridge["nexbr0 (172.16.0.0/24)"]

        subgraph Portal["Portal VM"]
            Agent["Agent Runtime"]
        end

        subgraph Work["Work VM"]
            GA["guest-agent (MCP server)"]
            Tools["file R/W/D, run command"]
        end

        subgraph Services["Service VMs (later)"]
            Git["Git Server"]
            Tracker["Project Tracker"]
        end
    end

    Nexus -->|vsock| Portal
    Nexus -->|vsock| Work
    Nexus -->|vsock| Services
    Agent -->|MCP via Nexus| GA
    GA --> Tools
    Portal --- Bridge
    Work --- Bridge
    Services --- Bridge
    Nexus --- SQLite

Core Principles

Scientific Method

This project is driven by the scientific method: every action originates from a plan or hypothesis.

All work follows this process:

  1. Hypothesis/Plan: Define what you intend to build and why. State assumptions explicitly.
  2. Design: Specify the approach and expected outcomes
  3. Experimentation: Execute through deliberate experiments designed to test assumptions
  4. Observation: Measure and record results objectively
  5. Analysis: Evaluate whether results prove or disprove assumptions
  6. Iteration: Refine hypothesis based on experimental findings

Experimentation

Experimentation is the primary methodology for proving or disproving assumptions.

  • Every assumption must be tested through experimentation
  • Design experiments that can clearly validate or invalidate hypotheses
  • Document expected outcomes before running experiments
  • Record actual results, even if they contradict expectations
  • Failed experiments are valuable — they disprove assumptions and guide better solutions
  • Successful experiments validate assumptions and provide confidence to proceed

Before implementing:

  • What assumptions are you making?
  • How will you test these assumptions?
  • What experiments can prove or disprove them?

After experimenting:

  • What did the experiment reveal?
  • Which assumptions were validated?
  • Which assumptions were invalidated?
  • What new questions emerged?

Non-Negotiables

  • No code without a clear plan
  • No changes without understanding purpose and expected impact
  • No assumptions without testing

Security Through Isolation

WorkFort’s security model prioritizes hardware-enforced isolation over software-level controls:

  • Work VMs execute agent tool calls in sandboxed environments
  • Credentials never enter VMs — all credential operations happen in the host (Nexus)
  • Communication via vsock eliminates network attack surface between Nexus and VMs
  • Firecracker provides hypervisor-level isolation stronger than containers
  • Portal/Work VM separation ensures the agent runtime cannot interfere with its own execution environment

The security boundary is the hypervisor, not kernel namespaces or cgroups.

Provider Independence

WorkFort’s architecture maintains independence from specific AI providers:

  • MCP (JSON-RPC 2.0) is the tool-calling interface — not provider-specific formats
  • Agent runtimes are pluggable — any runtime that speaks MCP works in a portal VM
  • Same guest-agent binary works with any AI provider
  • No provider SDKs in Work VMs — keeps VMs simple and focused

This separation ensures Work VMs remain generic execution environments while portal VMs and agent runtimes handle provider-specific concerns.

Centralized Control

All operations flow through Nexus for observability and policy enforcement:

  • Token usage tracking across all AI conversations
  • Cost attribution per user, project, or VM
  • Rate limiting and quota management
  • Security policies enforced at a single point
  • Audit trail for all VM and credential operations

Nexus is the vsock router — all inter-VM communication flows through it. Distributed access to AI APIs would require instrumenting every VM and would forfeit the centralized control features above.

Clear Boundaries

The system is organized into distinct components with well-defined responsibilities:

  • nexusctl: Thin CLI client, uses HTTP to communicate with Nexus
  • nexusd: Control plane, orchestration, vsock routing, state persistence
  • guest-agent: Execution interface within Work VMs (MCP server)
  • nexus-lib: Shared types, vsock protocol, storage abstractions

This separation:

  • Reduces cognitive load during development
  • Minimizes AI assistant context (work on CLI without loading daemon code)
  • Enables focused testing and debugging
  • Allows independent evolution of components

Drives

Overview

Drives are persistent block devices that provide storage for VMs. They serve as both the bootable root filesystem and data storage mechanism within WorkFort’s architecture.

On the host, drives are backed by btrfs subvolumes — enabling instant CoW snapshots, zero-cost cloning, and efficient storage sharing. Firecracker VMs see them as standard block devices, exposed via dm/nbd.

Responsibilities

Bootable Root Filesystem

Drives created from base subvolumes provide the root filesystem for VMs:

  • Work VMs boot from drives containing the execution environment and guest-agent
  • Portal VMs boot from drives containing the agent runtime
  • Service VMs boot from drives containing their application stack

Data Storage

Drives provide persistent storage independent of VM lifecycle:

  • Data persists after VM shutdown
  • Can be reused across multiple VM sessions
  • Support both read-write and read-only modes

Data Movement Between VMs

Drives enable sequential data transfer between VMs:

  • VM completes work and shuts down
  • Drive detaches from terminated VM
  • Drive attaches to new VM at boot
  • New VM accesses data written by previous VM

This sequential access pattern is imposed by Firecracker’s security model — concurrent host/guest access is not supported.

Design

btrfs-Backed Storage

Unlike traditional ext4 image files, WorkFort’s drives are backed by btrfs subvolumes:

Operation         Traditional                              WorkFort (btrfs)
Create workspace  Copy full image (slow, full disk cost)   btrfs subvolume snapshot (instant, zero disk cost)
Storage sharing   OverlayFS layers                         CoW — shared blocks stay shared
Cleanup           Delete image file                        btrfs subvolume delete
Checkpoint        Copy image or OverlayFS snapshot         btrfs subvolume snapshot (instant)
Rollback          Restore from backup                      Switch to previous snapshot

Drive Types

Drives are distinguished by purpose:

  • Boot drives: Created from read-only master image snapshots, contain bootable root filesystem with init system, tools, and (for work VMs) guest-agent
  • Data drives: Created empty or populated with project data, used for workspace storage and transfer between VMs

Both are btrfs subvolumes exposed to Firecracker as block devices via dm/nbd.

Access Patterns

Drives follow a sequential access model:

1. Host prepares drive    → Snapshot base subvolume or create empty
2. VM boots with drive    → Drive attached before VM starts
3. VM operates on drive   → Read/write within VM
4. VM shuts down          → Drive detaches
5. Host or next VM uses   → Snapshot, inspect, or attach to new VM

Constraint: No concurrent access. Host and guest cannot access the same drive simultaneously. This is a Firecracker security design decision, not a limitation being addressed.
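
A rough sketch of the host side of this lifecycle is shown below; it shells out to the btrfs CLI, and the helper names and paths are illustrative, not the actual Nexus storage layer (which is expected to use libbtrfsutil).

use std::path::Path;
use std::process::Command;

// Hypothetical helper: snapshot a base subvolume into a new workspace drive.
// `btrfs subvolume snapshot` is instant and CoW on the host filesystem.
fn prepare_drive(base: &Path, workspace: &Path) -> std::io::Result<()> {
    let status = Command::new("btrfs")
        .args(["subvolume", "snapshot"])
        .arg(base)
        .arg(workspace)
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(std::io::ErrorKind::Other, "btrfs snapshot failed"))
    }
}

// Hypothetical helper: delete a drive once no VM references it.
fn destroy_drive(workspace: &Path) -> std::io::Result<()> {
    let status = Command::new("btrfs")
        .args(["subvolume", "delete"])
        .arg(workspace)
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(std::io::ErrorKind::Other, "btrfs delete failed"))
    }
}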

Multiple Drives Per VM

VMs support multiple drive attachments:

  • One bootable drive (required for VM boot)
  • Additional data drives (workspace, shared datasets, outputs)

Example: Work VM with boot drive + workspace drive containing project source code.

Persistence Model

Drives are persistent resources:

  • Survive VM termination
  • Reusable across multiple VM sessions
  • Managed independently of VM lifecycle
  • Can accumulate data across multiple VM executions

Host Layout

/var/lib/nexus/
  ├── workspaces/
  │   ├── @base-agent/           ← read-only master image
  │   ├── @work-code-1/          ← CoW snapshot of @base-agent
  │   └── @portal-openclaw/      ← CoW snapshot of portal master
  ├── images/
  │   └── vmlinux               ← kernel
  └── state/
      └── nexus.db              ← SQLite

Relationship to Other Components

Drives connect multiple architecture components:

  • Master images → Snapshot into Drives
  • Drives → Exposed via dm/nbd → Attached to VMs at boot
  • VMs → Managed by Nexus
  • guest-agent (in Work VMs) → Operates on files within mounted Drives

Data Model

Overview

The data model defines how Nexus persists and manages state using SQLite. This includes VM configurations, workspace metadata, networking, and operational state.

An abstraction layer over SQLite allows swapping to Postgres or etcd for clustering.
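
A sketch of what that abstraction could look like in nexus-lib is shown below; the trait, type, and method names are assumptions for illustration, not the actual API.

// Hypothetical storage abstraction; all names are illustrative.
pub struct VmRecord {
    pub id: String,
    pub name: Option<String>,
    pub role: String,
    pub state: String,
}

#[derive(Debug)]
pub struct StoreError(pub String);

pub trait StateStore: Send + Sync {
    fn create_vm(&self, vm: &VmRecord) -> Result<(), StoreError>;
    fn get_vm(&self, id: &str) -> Result<Option<VmRecord>, StoreError>;
    fn update_vm_state(&self, id: &str, state: &str) -> Result<(), StoreError>;
}

// SQLite implements the trait today; a Postgres or etcd backend can be
// swapped in later without touching callers.
pub struct SqliteStore {
    conn: std::sync::Mutex<rusqlite::Connection>,
}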

Schema

-- Nexus Database Schema (Pre-Alpha)
--
-- During pre-alpha, schema changes are applied by:
-- 1. Updating this file
-- 2. Deleting the database file
-- 3. Restarting the daemon (schema recreates automatically)

-- Application settings (key-value store)
CREATE TABLE settings (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('string', 'int', 'bool', 'json'))
);

-- Tags for organizational categorization
CREATE TABLE tags (
    name TEXT PRIMARY KEY,
    description TEXT,
    color TEXT,       -- hex color for UI display (e.g., "#FF5733")
    text_color TEXT   -- hex color for contrast (e.g., "#FFFFFF")
);

-- VMs: Firecracker microVM instances
CREATE TABLE vms (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE,
    role TEXT NOT NULL CHECK(role IN ('portal', 'work', 'service')),
    state TEXT NOT NULL CHECK(state IN ('created', 'running', 'stopped', 'crashed', 'failed')),
    cid INTEGER NOT NULL UNIQUE,          -- vsock context ID
    vcpu_count INTEGER NOT NULL DEFAULT 1,
    mem_size_mib INTEGER NOT NULL DEFAULT 128,
    config_json TEXT,                      -- full Firecracker config snapshot
    pid INTEGER,                           -- Firecracker process ID (NULL when not running)
    socket_path TEXT,                      -- Firecracker API socket (NULL when not running)
    uds_path TEXT,                         -- vsock UDS base path
    console_log_path TEXT,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    started_at INTEGER,
    stopped_at INTEGER
);

-- Master images: read-only btrfs subvolumes
CREATE TABLE master_images (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    subvolume_path TEXT NOT NULL UNIQUE,
    size_bytes INTEGER,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

-- Workspaces: btrfs subvolume snapshots assigned to VMs
CREATE TABLE workspaces (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE,
    vm_id TEXT,                            -- NULL if unattached
    subvolume_path TEXT NOT NULL UNIQUE,
    master_image_id TEXT,                -- master image this was snapshotted from
    parent_workspace_id TEXT,              -- NULL if snapshotted from base
    size_bytes INTEGER,
    is_root_device INTEGER NOT NULL DEFAULT 0 CHECK(is_root_device IN (0, 1)),
    is_read_only INTEGER NOT NULL DEFAULT 0 CHECK(is_read_only IN (0, 1)),
    attached_at INTEGER,
    detached_at INTEGER,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE SET NULL,
    FOREIGN KEY (master_image_id) REFERENCES master_images(id) ON DELETE RESTRICT,
    FOREIGN KEY (parent_workspace_id) REFERENCES workspaces(id) ON DELETE SET NULL
);

-- VM boot history: tracks each boot/shutdown cycle
CREATE TABLE vm_boot_history (
    id TEXT PRIMARY KEY,
    vm_id TEXT NOT NULL,
    boot_started_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    boot_stopped_at INTEGER,
    exit_code INTEGER,
    error_message TEXT,
    console_log_path TEXT,
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE
);

-- vsock routes: inter-VM communication mediated by Nexus
CREATE TABLE routes (
    id TEXT PRIMARY KEY,
    source_vm_id TEXT NOT NULL,
    target_vm_id TEXT NOT NULL,
    source_port INTEGER NOT NULL,
    target_port INTEGER NOT NULL,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (source_vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    FOREIGN KEY (target_vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    UNIQUE (source_vm_id, source_port)
);

-- vsock services registered by guest agents
CREATE TABLE vsock_services (
    id TEXT PRIMARY KEY,
    vm_id TEXT NOT NULL,
    port INTEGER NOT NULL,
    service_name TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'stopped' CHECK(state IN ('listening', 'stopped')),
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    UNIQUE (vm_id, port)
);

-- Network bridges
CREATE TABLE bridges (
    name TEXT PRIMARY KEY,
    subnet TEXT NOT NULL,      -- CIDR notation (e.g., "172.16.0.0/24")
    gateway TEXT NOT NULL,     -- gateway IP (e.g., "172.16.0.1")
    interface TEXT NOT NULL,   -- host interface name
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

-- VM network configuration
CREATE TABLE vm_network (
    vm_id TEXT PRIMARY KEY,
    ip_address TEXT NOT NULL,
    bridge_name TEXT NOT NULL,
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    FOREIGN KEY (bridge_name) REFERENCES bridges(name) ON DELETE RESTRICT
);

-- Firewall rules for VMs (nftables-based)
CREATE TABLE firewall_rules (
    id TEXT PRIMARY KEY,
    vm_id TEXT NOT NULL,
    rule_order INTEGER NOT NULL,
    action TEXT NOT NULL CHECK(action IN ('accept', 'drop', 'reject')),
    protocol TEXT CHECK(protocol IN ('tcp', 'udp', 'icmp', 'all')),
    source_ip TEXT,
    source_port TEXT,
    dest_ip TEXT,
    dest_port TEXT,
    description TEXT,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    UNIQUE (vm_id, rule_order)
);

-- Tags (organizational)
CREATE TABLE vm_tags (
    vm_id TEXT NOT NULL,
    tag_name TEXT NOT NULL,
    PRIMARY KEY (vm_id, tag_name),
    FOREIGN KEY (vm_id) REFERENCES vms(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_name) REFERENCES tags(name) ON DELETE CASCADE
);

CREATE TABLE workspace_tags (
    workspace_id TEXT NOT NULL,
    tag_name TEXT NOT NULL,
    PRIMARY KEY (workspace_id, tag_name),
    FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_name) REFERENCES tags(name) ON DELETE CASCADE
);

-- Indexes
CREATE INDEX idx_vms_role ON vms(role);
CREATE INDEX idx_vms_state ON vms(state);
CREATE INDEX idx_workspaces_vm_id ON workspaces(vm_id);
CREATE INDEX idx_workspaces_base ON workspaces(master_image_id);
CREATE INDEX idx_vm_boot_history_vm_id ON vm_boot_history(vm_id);
CREATE INDEX idx_vsock_services_vm_id ON vsock_services(vm_id);
CREATE INDEX idx_routes_source ON routes(source_vm_id);
CREATE INDEX idx_routes_target ON routes(target_vm_id);
CREATE INDEX idx_firewall_rules_vm_id ON firewall_rules(vm_id);
CREATE INDEX idx_vm_tags_tag ON vm_tags(tag_name);
CREATE INDEX idx_workspace_tags_tag ON workspace_tags(tag_name);

-- Partial index: workspace can only be attached to one VM at a time
CREATE UNIQUE INDEX idx_workspace_current_attachment
    ON workspaces(vm_id) WHERE vm_id IS NOT NULL AND detached_at IS NULL;

-- Partial index: each VM has only one root device
CREATE UNIQUE INDEX idx_vm_root_device
    ON workspaces(vm_id) WHERE vm_id IS NOT NULL AND detached_at IS NULL AND is_root_device = 1;

Diagrams

Core Entity Relationships

erDiagram
    vms ||--o{ workspaces : "has attached"
    vms ||--o{ vm_boot_history : "boot history"
    vms ||--o{ vsock_services : "runs services"
    vms ||--o| vm_network : "has network"
    master_images ||--o{ workspaces : "snapshot of"
    workspaces ||--o{ workspaces : "derived from"
    bridges ||--o{ vm_network : "provides connectivity"
    vms ||--o{ firewall_rules : "has rules"
    vms ||--o{ routes : "source"
    vms ||--o{ routes : "target"

    vms {
        text id PK
        text name
        text role
        text state
        int cid
        int vcpu_count
        int mem_size_mib
        int pid
    }

    master_images {
        text id PK
        text name
        text subvolume_path
        int size_bytes
    }

    workspaces {
        text id PK
        text name
        text vm_id FK
        text subvolume_path
        text master_image_id FK
        text parent_workspace_id FK
        int is_root_device
        int is_read_only
    }

    routes {
        text id PK
        text source_vm_id FK
        text target_vm_id FK
        int source_port
        int target_port
    }

    vsock_services {
        text id PK
        text vm_id FK
        int port
        text service_name
        text state
    }

    bridges {
        text name PK
        text subnet
        text gateway
    }

    vm_network {
        text vm_id PK
        text ip_address
        text bridge_name FK
    }

    firewall_rules {
        text id PK
        text vm_id FK
        int rule_order
        text action
        text protocol
    }

VM State Machine

stateDiagram-v2
    [*] --> created: POST /v1/vms
    created --> running: POST /v1/vms/:id/start
    created --> failed: Boot failure (automatic)
    created --> [*]: DELETE /v1/vms/:id
    running --> stopped: POST /v1/vms/:id/stop
    running --> crashed: (automatic)
    stopped --> running: POST /v1/vms/:id/start
    stopped --> [*]: DELETE /v1/vms/:id
    crashed --> running: POST /v1/vms/:id/start
    crashed --> [*]: DELETE /v1/vms/:id
    failed --> [*]: DELETE /v1/vms/:id

States

State    Description
created  VM record exists, Firecracker process not started
running  Firecracker process active, VM booted
stopped  VM gracefully stopped via API, can be restarted
crashed  VM terminated unexpectedly, can be restarted
failed   VM failed to boot (e.g., bad workspace image)

Valid Transitions

From     To         Trigger
created  running    POST /v1/vms/:id/start
created  failed     Automatic (boot failure)
created  (deleted)  DELETE /v1/vms/:id
running  stopped    POST /v1/vms/:id/stop
running  crashed    Automatic (unexpected termination)
stopped  running    POST /v1/vms/:id/start
stopped  (deleted)  DELETE /v1/vms/:id
crashed  running    POST /v1/vms/:id/start
crashed  (deleted)  DELETE /v1/vms/:id
failed   (deleted)  DELETE /v1/vms/:id

Constraints

  • Cannot delete running VM: Must stop first (returns 409 Conflict)
  • Cannot start running VM: Already running (returns 409 Conflict)
  • Cannot manually transition to crashed or failed: Set automatically by Nexus
  • Failed VMs can only be deleted: Boot failure requires recreating with a working workspace
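
As a sketch, the allowed API-driven transitions could be checked with something like the following; the enum and function are illustrative, not the actual nexusd code.

// Hypothetical transition check mirroring the table above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum VmState { Created, Running, Stopped, Crashed, Failed }

// Returns true if an API call may move a VM from `from` to `to`.
// `crashed` and `failed` are only ever set automatically by Nexus.
fn api_transition_allowed(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        (Created, Running) | (Stopped, Running) | (Crashed, Running) | (Running, Stopped)
    )
}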

CLI Design

Overview

nexusctl is the command-line interface for managing Firecracker microVMs through the Nexus daemon. It is a thin HTTP client – all state lives in nexusd, and nexusctl is stateless aside from configuration.

The recommended alias is nxc:

nexusctl vm list
nxc vm list          # identical

Command Grammar

Every command follows noun-verb ordering:

nexusctl <resource> <action> [name] [flags]

Resources map directly to Nexus API entities. Actions are consistent across all resources.
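
A minimal sketch of how this noun-verb grammar might map onto Clap's derive API follows; the enums and variants are illustrative, not the actual nexusctl source.

use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "nexusctl")]
struct Cli {
    #[command(subcommand)]
    resource: Resource,
}

#[derive(Subcommand)]
enum Resource {
    // `nexusctl vm <action> ...`
    Vm {
        #[command(subcommand)]
        action: Action,
    },
    // `nexusctl workspace <action> ...`, with the documented `ws` shortname.
    #[command(visible_alias = "ws")]
    Workspace {
        #[command(subcommand)]
        action: Action,
    },
}

// A consistent verb set shared across resources (trimmed for the sketch).
#[derive(Subcommand)]
enum Action {
    List,
    Create { name: Option<String> },
    Inspect { name: String },
    Delete { name: String },
}

fn main() {
    let _cli = Cli::parse();
}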

Resources

Resource   Shortname  Description
vm         vm         Firecracker microVM instances
workspace  ws         btrfs subvolume snapshots attached to VMs
agent      a          Agent runtime sessions (portal + work VM pairs)
image      img        Master images (read-only base subvolumes)
network    net        Bridge and tap device configuration
route      rt         vsock routes between VMs
service    svc        vsock services registered by guest agents

Shortnames work anywhere the full resource name works:

nexusctl ws list
nexusctl workspace list   # identical

Standard Actions

Every resource supports a consistent set of verbs. Not every verb applies to every resource – attempting an unsupported action returns a clear error.

Action   Description                   Applies to
list     List resources in a table     all
create   Create a new resource         vm, workspace, agent, route
inspect  Show detailed resource state  all
delete   Remove a resource             all
start    Start a stopped resource      vm, agent
stop     Stop a running resource       vm, agent
logs     Stream or tail logs           vm, agent

Special Commands

Some commands live outside the resource-action pattern:

Command                       Description
nexusctl attach <vm>          Open a terminal session to a VM (WebSocket/ttyd)
nexusctl init                 Interactive project setup wizard
nexusctl apply                Apply declarative configuration from nexus.yaml
nexusctl config <subcommand>  Manage CLI and daemon configuration
nexusctl completion <shell>   Generate shell completions
nexusctl version              Print version, daemon version, and API version
nexusctl status               Quick system health check (daemon, VMs, network)

Command Map

graph TD
    nexusctl["nexusctl"]

    nexusctl --> vm["vm"]
    nexusctl --> ws["workspace (ws)"]
    nexusctl --> agent["agent (a)"]
    nexusctl --> image["image (img)"]
    nexusctl --> network["network (net)"]
    nexusctl --> route["route (rt)"]
    nexusctl --> service["service (svc)"]
    nexusctl --> config["config"]
    nexusctl --> special["attach / init / apply / status"]

    vm --> vm_actions["list | create | inspect | delete | start | stop | logs"]
    ws --> ws_actions["list | create | inspect | delete | snapshot | restore"]
    agent --> agent_actions["list | create | inspect | delete | start | stop | logs"]
    image --> image_actions["list | inspect | delete | import"]
    network --> network_actions["list | inspect"]
    route --> route_actions["list | create | inspect | delete"]
    service --> service_actions["list | inspect"]
    config --> config_actions["list | edit | set | get"]

Output Formatting

Default: Columnar Tables

All list commands produce clean, aligned tables with no decoration:

$ nexusctl vm list
NAME            ROLE      STATE     VCPU  MEM    IP             AGE
agent-code-1    work      running   2     512M   172.16.0.11    3h
portal-oc-1     portal    running   1     256M   172.16.0.10    3h
git-server      service   stopped   1     128M   172.16.0.12    2d

The --output / -o flag controls format:

Format  Flag      Description
Table   -o table  Default. Human-readable columns.
Wide    -o wide   Table with additional columns (IDs, paths, timestamps).
JSON    -o json   Full JSON array. Machine-readable.
YAML    -o yaml   Full YAML. Matches config file format.
Name    -o name   One resource name per line. For piping.

$ nexusctl vm list -o name
agent-code-1
portal-oc-1
git-server

$ nexusctl vm list -o json
[
  {
    "name": "agent-code-1",
    "role": "work",
    "state": "running",
    "vcpu_count": 2,
    "mem_size_mib": 512,
    "ip_address": "172.16.0.11",
    "created_at": "2026-02-18T10:30:00Z"
  }
]

JSON Field Selection

The --json flag selects specific fields and implies JSON output. Combine with --jq for inline filtering.

$ nexusctl vm list --json name,state,ip_address
[
  {"name": "agent-code-1", "state": "running", "ip_address": "172.16.0.11"},
  {"name": "portal-oc-1", "state": "running", "ip_address": "172.16.0.10"}
]

$ nexusctl vm list --json name,state --jq '.[] | select(.state == "running") | .name'
"agent-code-1"
"portal-oc-1"

Layered Detail

Information density increases through progressive commands:

graph LR
    A["list"] -->|more columns| B["-o wide"]
    B -->|single resource| C["inspect"]
    C -->|machine parse| D["inspect -o json"]

$ nexusctl vm list                       # summary table
$ nexusctl vm list -o wide               # adds ID, CID, PID, socket path
$ nexusctl vm inspect agent-code-1       # full detail, formatted
$ nexusctl vm inspect agent-code-1 -o json   # full detail, structured

Color

Semantic colors convey state at a glance:

Color   Meaning
Green   Running, success, healthy
Yellow  Pending, warning, created
Red     Error, stopped, crashed, failed
Dim     Metadata, secondary info, timestamps

Color behavior:

  • TTY detected: Colors enabled by default.
  • Pipe / redirect: Colors disabled automatically.
  • NO_COLOR env set: Colors disabled (per no-color.org).
  • FORCE_COLOR env set: Colors forced on regardless of TTY.

Progress Indicators

  • Discrete steps: Creating workspace... (2 of 4) with step descriptions.
  • Indeterminate waits: Spinner with elapsed time: Waiting for VM to boot... (3.2s).
  • Non-TTY: Progress messages printed as plain lines, no ANSI escape sequences.

Interactive Behavior

TTY Gating

Every interactive prompt has a non-interactive equivalent. If stdin is not a TTY and a required value is missing, the command fails with an explicit error:

$ echo | nexusctl vm delete agent-code-1
Error: refusing to delete VM without confirmation
  Run with --yes to skip confirmation: nexusctl vm delete agent-code-1 --yes

Override Flags

Flag              Effect
--yes / -y        Skip all confirmation prompts
--interactive     Force interactive mode even without TTY
--no-interactive  Force non-interactive mode even with TTY

Destructive Operations

Commands that destroy data or stop running processes require confirmation:

$ nexusctl vm delete agent-code-1
VM "agent-code-1" is currently running with 1 attached workspace.
Delete this VM? This will stop it and detach all workspaces. [y/N] y
Deleted VM "agent-code-1"

Bypass with -y:

$ nexusctl vm delete agent-code-1 -y
Deleted VM "agent-code-1"

Error Messages

Every error has three parts: what failed, why it failed, and how to fix it.

$ nexusctl vm start agent-code-1
Error: cannot start VM "agent-code-1"
  VM is already running (state: running, PID: 4821)
  To restart, stop it first: nexusctl vm stop agent-code-1

$ nexusctl vm create --image nonexistent/image
Error: cannot create VM
  Image "nonexistent/image" not found
  Available images: nexusctl image list

$ nexusctl vm lis
Error: unknown action "lis" for resource "vm"
  Did you mean: list
  Available actions: list, create, inspect, delete, start, stop, logs

Error Output

  • All errors go to stderr.
  • All normal output goes to stdout.
  • Non-zero exit codes on failure.

Exit Codes

Code  Meaning
0     Success
1     General error
2     Usage error (bad flags, missing args)
3     Daemon unreachable
4     Resource not found
5     Conflict (e.g., VM already running)

Onboarding

First Run Detection

When nexusctl cannot reach nexusd, it detects whether the daemon is installed and guides the user:

$ nexusctl vm list
Error: cannot connect to Nexus daemon at 127.0.0.1:9600
  The daemon does not appear to be running.

  Start it:     systemctl --user start nexus.service
  Enable it:    systemctl --user enable nexus.service
  Check status: systemctl --user status nexus.service

If the daemon is not installed at all:

$ nexusctl vm list
Error: cannot connect to Nexus daemon at 127.0.0.1:9600
  The nexus package does not appear to be installed.

  Install it:   sudo pacman -S nexus

Init Wizard

nexusctl init creates a project-level nexus.yaml through guided prompts:

$ nexusctl init
Nexus project setup

? Base image for work VMs:  workfort/code-agent (default)
? vCPUs per work VM:        2
? Memory per work VM (MiB): 512
? Agent runtime:            openclaw

Created nexus.yaml

Next steps:
  nexusctl apply              Apply this configuration
  nexusctl agent create dev   Create an agent from this config

Next-Command Hints

After successful operations, suggest the logical next step:

$ nexusctl vm create --image workfort/code-agent --name agent-code-1
Created VM "agent-code-1" (state: created)

  Start it: nexusctl vm start agent-code-1

$ nexusctl vm start agent-code-1
Started VM "agent-code-1" (state: running, IP: 172.16.0.11)

  Attach terminal: nexusctl attach agent-code-1
  View logs:       nexusctl vm logs agent-code-1

Hints are suppressed in non-TTY environments and when using -o json or -o name.

Configuration

Format

All configuration is YAML. No exceptions.

Precedence

Configuration resolves in this order (highest wins):

graph TD
    A["CLI flags"] --> B["Environment variables"]
    B --> C["Project config ./nexus.yaml"]
    C --> D["User config ~/.config/nexusctl/config.yaml"]
    D --> E["System config /etc/nexus/nexus.yaml"]
    E --> F["Built-in defaults"]

    style A fill:#2d6,stroke:#333,color:#000
    style F fill:#999,stroke:#333,color:#000

  1. CLI flags: --api-url, --output, etc.
  2. Environment variables: NEXUS_API_URL, NEXUS_OUTPUT, etc.
  3. Project config: ./nexus.yaml in the current directory (or parent search).
  4. User config: ~/.config/nexusctl/config.yaml (respects XDG_CONFIG_HOME).
  5. System config: /etc/nexus/nexus.yaml.
  6. Built-in defaults.
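
A tiny sketch of resolving one setting across these layers (purely illustrative; the real loader merges whole YAML documents):

// First layer that provides a value wins, mirroring the order above.
fn resolve(layers: Vec<Option<String>>, default: &str) -> String {
    layers.into_iter().flatten().next().unwrap_or_else(|| default.to_string())
}

fn main() {
    let api_url = resolve(
        vec![
            None,                                // 1. CLI flag --api-url
            std::env::var("NEXUS_API_URL").ok(), // 2. environment
            None,                                // 3. ./nexus.yaml
            None,                                // 4. user config
            None,                                // 5. system config
        ],
        "http://127.0.0.1:9600",                 // 6. built-in default
    );
    println!("{api_url}");
}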

Environment Variables

All env vars use the NEXUS_ prefix. Flag names map to env vars by uppercasing and replacing hyphens with underscores:

Flag        Env Var
--api-url   NEXUS_API_URL
--output    NEXUS_OUTPUT
--no-color  NEXUS_NO_COLOR

User Config

# ~/.config/nexusctl/config.yaml
api_url: "http://127.0.0.1:9600"
output: table
defaults:
  vm:
    vcpu_count: 2
    mem_size_mib: 512
  workspace:
    base_image: workfort/code-agent

Config Commands

$ nexusctl config list
KEY                SOURCE                          VALUE
api_url            /etc/nexus/nexus.yaml           http://127.0.0.1:9600
output             default                         table
defaults.vm.vcpu   ~/.config/nexusctl/config.yaml  2
defaults.vm.mem    ~/.config/nexusctl/config.yaml  512

$ nexusctl config get api_url
http://127.0.0.1:9600

$ nexusctl config set defaults.vm.vcpu_count 4
Set defaults.vm.vcpu_count = 4 in ~/.config/nexusctl/config.yaml

$ nexusctl config edit
# opens ~/.config/nexusctl/config.yaml in $EDITOR

config list shows every effective value and which source it came from. This eliminates the “where is this setting coming from?” problem.

Declarative Configuration

Project File

A nexus.yaml in the project root declares the desired infrastructure:

# nexus.yaml
version: 1

agents:
  dev:
    image: workfort/code-agent
    vcpu: 2
    mem: 512
    runtime: openclaw
    workspaces:
      - name: code
        base: workfort/code-agent
      - name: data
        base: empty
        size: 2G

network:
  bridge: nexbr0
  cidr: 172.16.0.0/24

Apply with Diff and Dry Run

$ nexusctl apply --dry-run
Comparing nexus.yaml against current state...

  + create agent "dev"
    + create VM "dev-portal" (portal, 1 vCPU, 256M)
    + create VM "dev-work" (work, 2 vCPU, 512M)
    + create workspace "dev-code" from workfort/code-agent
    + create workspace "dev-data" (empty, 2G)
    + create route dev-portal:9000 → dev-work:9000

No changes applied (dry run)

$ nexusctl apply
Comparing nexus.yaml against current state...

  + create agent "dev"
    + create VM "dev-portal" (portal, 1 vCPU, 256M)
    + create VM "dev-work" (work, 2 vCPU, 512M)
    + create workspace "dev-code" from workfort/code-agent
    + create workspace "dev-data" (empty, 2G)
    + create route dev-portal:9000 → dev-work:9000

Apply these changes? [y/N] y
Creating agent "dev"... done (2.1s)

Agent "dev" is running.
  Attach: nexusctl attach dev

Shell Completion

Generation

$ nexusctl completion bash >> ~/.bashrc
$ nexusctl completion zsh > ~/.zfunc/_nexusctl
$ nexusctl completion fish > ~/.config/fish/completions/nexusctl.fish

Dynamic Completions

Completions query the daemon for live state. Tab-completing a VM name fetches the current VM list:

$ nexusctl vm inspect <TAB>
agent-code-1    (work, running)
portal-oc-1     (portal, running)
git-server      (service, stopped)

Zsh completions include descriptions. Resource shortnames complete identically to full names.

Help System

Progressive Disclosure

Default help shows only core commands. Advanced usage is available but not shown upfront.

$ nexusctl --help
nexusctl - manage Firecracker microVMs via Nexus

Usage: nexusctl <resource> <action> [name] [flags]

Core Commands:
  vm          Manage virtual machines
  workspace   Manage btrfs workspaces
  agent       Manage agent sessions
  attach      Open terminal to a VM

Getting Started:
  init        Set up a new project
  apply       Apply declarative configuration
  status      System health check

Configuration:
  config      View and edit settings
  completion  Generate shell completions

Run 'nexusctl <resource> --help' for resource-specific actions.
Run 'nexusctl help --all' for the full command list.

Full help includes every resource, action, and global flag:

$ nexusctl help --all

Resource Help

Each resource shows its actions, 2-3 usage examples, and flag groups:

$ nexusctl vm --help
Manage Firecracker virtual machines

Usage: nexusctl vm <action> [name] [flags]

Actions:
  list      List all VMs
  create    Create a new VM
  inspect   Show VM details
  delete    Remove a VM
  start     Start a stopped VM
  stop      Stop a running VM
  logs      View VM console logs

Examples:
  nexusctl vm list
  nexusctl vm create --image workfort/code-agent --name my-vm
  nexusctl vm create my-vm                          # uses default image
  nexusctl vm inspect my-vm -o json

Basic Flags:
  -o, --output <format>   Output format: table, wide, json, yaml, name
      --json <fields>     Select JSON fields (implies -o json)
      --jq <expr>         Filter JSON output with jq expression

Resource Flags:
      --image <name>      Base image for the VM
      --role <role>       VM role: work, portal, service (default: work)
      --vcpu <n>          vCPU count (default: 1)
      --mem <mib>         Memory in MiB (default: 128)

Advanced Flags:
      --cid <n>           Override vsock context ID (auto-assigned by default)
      --no-start          Create without starting
      --dry-run           Show what would happen without executing

Flag Grouping

Flags in --help are organized into groups: basic, resource-specific, and advanced. Basic flags appear on every command. Advanced flags are things most users never touch.

Smart Defaults

The CLI minimizes required input. Sensible defaults make the common case trivial:

$ nexusctl vm create my-vm

This single command:

  • Uses the default image from config (or workfort/code-agent)
  • Assigns role work
  • Allocates 1 vCPU, 128 MiB memory (or user config defaults)
  • Auto-assigns a vsock CID
  • Auto-assigns an IP address
  • Creates and starts the VM

Explicit flags override any default:

$ nexusctl vm create my-vm --vcpu 4 --mem 1024 --role portal --no-start

Dry Run

All commands that create, modify, or destroy state support --dry-run:

$ nexusctl vm delete agent-code-1 --dry-run
Would delete VM "agent-code-1" (state: running)
  Would stop VM first
  Would detach workspace "agent-code-1-ws" (data preserved)

No changes applied (dry run)

$ nexusctl ws create --base workfort/code-agent --name new-ws --dry-run
Would create workspace "new-ws"
  Source: workfort/code-agent
  Type: btrfs snapshot

No changes applied (dry run)

--dry-run returns exit code 0 on success (the operation would succeed) and non-zero if it would fail.

Composability

Piping

-o name produces one name per line for piping:

$ nexusctl vm list -o name | xargs -I{} nexusctl vm stop {}

$ nexusctl vm list --json name,state --jq '.[] | select(.state == "stopped") | .name' \
  | xargs -I{} nexusctl vm delete {} -y

Scripting

Structured JSON output and consistent exit codes make nexusctl scriptable:

#!/bin/bash
set -e

# Create and wait for VM
nexusctl vm create --image workfort/code-agent --name build-vm -o json > /dev/null
nexusctl vm start build-vm

# Check state
state=$(nexusctl vm inspect build-vm --json state --jq '.state' -r)
if [ "$state" != "running" ]; then
  echo "VM failed to start" >&2
  exit 1
fi

# Do work...
nexusctl attach build-vm --exec "make build"

# Cleanup
nexusctl vm delete build-vm -y

Exit Codes in Conditionals

if nexusctl vm inspect my-vm &>/dev/null; then
  echo "VM exists"
else
  echo "VM not found, creating..."
  nexusctl vm create my-vm
fi

Performance Targets

Operation                              Target
CLI startup (parse args, load config)  < 50ms
vm list (daemon query + render)        < 200ms
vm inspect (daemon query + render)     < 100ms
Shell completion (query + return)      < 300ms
apply diff computation                 < 500ms

The CLI does no heavy computation. It parses arguments, reads config, makes one HTTP call, and formats the response. If it feels slow, the daemon is the bottleneck, not the CLI.

Architecture

graph LR
    subgraph CLI["nexusctl"]
        Args["Arg Parser"]
        Config["Config Loader"]
        Client["HTTP Client"]
        Formatter["Output Formatter"]
    end

    subgraph Daemon["nexusd"]
        API["HTTP API :9600"]
    end

    Args --> Client
    Config --> Client
    Client -->|"HTTP/JSON"| API
    API -->|"JSON response"| Formatter
    Formatter -->|"table / json / yaml"| Stdout["stdout"]
    Formatter -->|"errors"| Stderr["stderr"]

nexusctl is a single static binary. No runtime dependencies beyond libc. The binary contains:

  1. Argument parser – Clap with derive macros. Handles subcommands, aliases, shortnames, completions.
  2. Config loader – Reads YAML from all sources, merges by precedence, resolves env vars.
  3. HTTP client – reqwest (blocking). One request per command invocation. WebSocket for attach.
  4. Output formatter – Renders JSON responses into the requested format. Handles color, alignment, truncation.

Startup Sequence

1. Parse args                    (~5ms)
2. Load config (flags + env + files) (~10ms)
3. HTTP request to daemon        (network-bound)
4. Format + print response       (~5ms)

No daemon connection is made for --help, --version, completion, or config commands.

Full Command Reference

nexusctl vm list [--role <role>] [--state <state>]
nexusctl vm create [name] [--image <img>] [--role <role>] [--vcpu <n>] [--mem <mib>]
nexusctl vm inspect <name>
nexusctl vm delete <name> [-y]
nexusctl vm start <name>
nexusctl vm stop <name> [--force]
nexusctl vm logs <name> [--follow] [--tail <n>]

nexusctl workspace list [--vm <name>]
nexusctl workspace create [name] [--base <image>] [--size <size>]
nexusctl workspace inspect <name>
nexusctl workspace delete <name> [-y]
nexusctl workspace snapshot <name> [--tag <label>]
nexusctl workspace restore <name> --snapshot <tag>

nexusctl agent list
nexusctl agent create [name] [--image <img>] [--runtime <rt>]
nexusctl agent inspect <name>
nexusctl agent delete <name> [-y]
nexusctl agent start <name>
nexusctl agent stop <name>
nexusctl agent logs <name> [--follow] [--tail <n>]

nexusctl image list
nexusctl image inspect <name>
nexusctl image delete <name> [-y]
nexusctl image import <path> [--name <name>]

nexusctl network list
nexusctl network inspect <bridge>

nexusctl route list [--vm <name>]
nexusctl route create --from <vm:port> --to <vm:port>
nexusctl route inspect <id>
nexusctl route delete <id> [-y]

nexusctl service list [--vm <name>]
nexusctl service inspect <name>

nexusctl attach <vm> [--exec <command>]
nexusctl init
nexusctl apply [--dry-run] [-y] [-f <file>]
nexusctl status
nexusctl version

nexusctl config list
nexusctl config get <key>
nexusctl config set <key> <value>
nexusctl config edit

nexusctl completion bash|zsh|fish

Global Flags

These flags are available on every command:

-o, --output <format>     Output format: table, wide, json, yaml, name
    --json <fields>       Select JSON fields (comma-separated)
    --jq <expr>           jq filter expression (requires --json or -o json)
    --api-url <url>       Nexus daemon URL (default: http://127.0.0.1:9600)
    --no-color            Disable colored output
    --no-interactive      Disable interactive prompts
-y, --yes                 Skip confirmation prompts
    --dry-run             Preview changes without applying
-v, --verbose             Increase log verbosity (repeatable: -vvv)
-q, --quiet               Suppress non-error output
-h, --help                Show help
    --version             Show version

WorkFort Alpha Design

Date: 2026-02-17 Status: Approved

Goal

One agent producing code in a VM. After this milestone, WorkFort dogfoods itself — the most needed tools to develop WorkFort further are built next, and the self-improvement loop continues until the full vision is realized.

System Topology

graph TB
    systemd["systemd --user"] -->|starts| Nexus["nexusd"]

    Nexus -->|spawns & manages| FC1["Firecracker: Portal VM (CID 3)"]
    Nexus -->|spawns & manages| FC2["Firecracker: Work VM (CID 4)"]

    subgraph Portal["Portal VM"]
        AgentRT["Agent Runtime (LLM client)"]
    end

    subgraph WorkVM["Work VM"]
        GA["guest-agent"]
        MCP["MCP Server (JSON-RPC 2.0)"]
        PTY["PTY Manager"]
    end

    AgentRT -->|"MCP tool calls"| Nexus
    Nexus -->|"vsock route"| GA
    GA --- MCP
    GA --- PTY

    Nexus --- State["SQLite"]
    Nexus --- Storage["btrfs subvolumes"]
    Nexus --- Net["nexbr0 + nftables"]

Nexus is the vsock router — all inter-VM communication flows through it. Firecracker’s vsock only supports host-guest communication, so Nexus mediates on the host via Unix domain sockets with the CONNECT <port> protocol.

Nexus Daemon

nexusd is a single Rust binary, started by systemd at the user level:

systemctl --user start nexus.service

The binary gets CAP_NET_ADMIN via setcap for tap/bridge/nftables operations. btrfs subvolume operations work unprivileged via ioctls.

Responsibilities

  • Firecracker process lifecycle (spawn, monitor, kill)
  • vsock routing between VMs
  • btrfs workspace management (create/snapshot/destroy subvolumes)
  • PTY management per VM, exposed over WebSocket via the ttyd protocol
  • State tracking in SQLite
  • HTTP API on 127.0.0.1:9600
  • Reconciliation loop: reads desired state, compares to running VMs, converges

Configuration

# /etc/nexus/nexus.yaml
storage:
  root: /var/lib/nexus
  workspaces: /var/lib/nexus/workspaces

network:
  bridge: nexbr0
  cidr: 172.16.0.0/24  # user-configurable, must be RFC 1918

api:
  listen: 127.0.0.1:9600

firecracker:
  binary: /usr/bin/firecracker
  kernel: /var/lib/nexus/images/vmlinux

State (SQLite)

vms:        id, name, role, cid, status, config_json, created_at
workspaces: id, vm_id, subvolume_path, base_image, created_at
routes:     id, source_vm_id, target_vm_id, source_port, target_port

An abstraction layer over SQLite allows swapping to Postgres or etcd for clustering.

Guest Agent

A small Rust binary baked into work VM images. It is the VM’s interface to Nexus.

vsock Port Allocation

The guest-agent listens on well-known vsock ports. Each service gets its own independent connection — no application-layer multiplexing is needed because Firecracker’s vsock natively supports multiple concurrent connections via the CONNECT <port> protocol.

Port     Purpose                                      Direction
100      Control channel (image metadata, health)     host → guest
200      MCP server (JSON-RPC 2.0)                    host → guest
300-399  PTY sessions (one port per terminal attach)  host → guest
500      MCP client outbound (portal VMs)             guest → host

Direction matters. Host-to-guest connections use the VM’s UDS with CONNECT <port>\n. Guest-to-host connections trigger Firecracker to connect to <uds_path>_<port> on the host, where Nexus must be listening.
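
For the host-to-guest direction, a minimal sketch of the handshake over Firecracker's UDS-backed vsock device, using plain tokio Unix sockets (the helper name and error handling are simplified assumptions):

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::UnixStream;

// Open a host→guest connection by sending Firecracker's CONNECT handshake.
async fn connect_guest_port(uds_path: &str, port: u32) -> std::io::Result<UnixStream> {
    let mut stream = UnixStream::connect(uds_path).await?;
    // Firecracker expects "CONNECT <port>\n" and replies "OK <host_port>\n".
    stream.write_all(format!("CONNECT {port}\n").as_bytes()).await?;
    let mut ack = [0u8; 32];
    let n = stream.read(&mut ack).await?;
    if !ack[..n].starts_with(b"OK") {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "vsock CONNECT handshake rejected",
        ));
    }
    Ok(stream) // subsequent bytes reach the guest service on `port`
}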

The MCP routing chain for a portal VM calling a work VM:

graph LR
    Agent["Agent Runtime"] -->|"vsock CID 2, port 500"| Portal["Portal VM"]
    Portal -->|"uds_path_500"| Nexus
    Nexus -->|"CONNECT 200"| Work["Work VM"]
    Work --> GA["guest-agent (MCP)"]

Connection Pooling

Nexus maintains a connection pool per VM per port. The first vsock connection and initial message are ~50-100x slower than subsequent messages on an established connection (validated through cracker-barrel vsock benchmarking). Connections are established eagerly at boot and kept alive for the VM’s lifetime. Reconnection is automatic on failure.
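
A simplified sketch of such a pool follows; the types and locking strategy are assumptions, not the actual nexusd implementation.

use std::collections::HashMap;
use std::sync::Arc;
use tokio::net::UnixStream;
use tokio::sync::Mutex;

// Per-(vm, port) pool. Connections are opened eagerly at VM boot and reused;
// a real implementation would also reconnect on failure.
#[derive(Default)]
struct VsockPool {
    conns: Mutex<HashMap<(String, u32), Arc<Mutex<UnixStream>>>>,
}

impl VsockPool {
    async fn get(&self, vm_id: &str, port: u32) -> Option<Arc<Mutex<UnixStream>>> {
        self.conns.lock().await.get(&(vm_id.to_string(), port)).cloned()
    }

    async fn insert(&self, vm_id: String, port: u32, stream: UnixStream) {
        self.conns
            .lock()
            .await
            .insert((vm_id, port), Arc::new(Mutex::new(stream)));
    }
}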

MCP Tools (Alpha)

Tool         Description
file_read    Read file contents at path
file_write   Write content to path
file_delete  Delete file at path
run_command  Execute command, return stdout/stderr/exit code

Long-running commands stream stdout/stderr incrementally over the MCP channel.
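
For illustration, a run_command exchange over the MCP channel might look like the following JSON-RPC 2.0 messages, built here with serde_json; the exact parameter and result field names are assumptions.

use serde_json::json;

fn main() {
    // Hypothetical request the host sends to the guest-agent on vsock port 200.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "run_command",
        "params": { "command": "cat /etc/os-release" }
    });

    // Hypothetical reply shape once the command exits.
    let response = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "result": { "exit_code": 0, "stdout": "NAME=\"Alpine Linux\"\n", "stderr": "" }
    });

    println!("{request}\n{response}");
}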

Boot Sequence

  1. VM kernel boots, systemd starts guest-agent.service
  2. Guest-agent listens on vsock ports 100, 200, and 300+ (VMADDR_CID_ANY)
  3. Nexus connects via UDS + CONNECT 100\n (control channel)
  4. Guest-agent sends image metadata (parsed from /etc/nexus/image.yaml)
  5. Nexus registers the VM as ready, sets up routes
  6. Nexus opens additional connections as needed (MCP on port 200, PTY on 300+)

Image Metadata Standard

Each VM image declares its access contract:

# /etc/nexus/image.yaml (inside the VM rootfs)
name: workfort/code-agent
version: 0.1.0
access:
  terminal: vsock-pty    # or: ssh, none
  mcp: vsock             # tool-calling interface
  ports:
    http: 8080
    metrics: 9090

Nexus reads this at boot and routes accordingly. If terminal: vsock-pty, Nexus can expose a ttyd WebSocket session with access control. If terminal: ssh, Nexus proxies SSH over WebSocket. If terminal: none, no terminal access.
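
On either side of the control channel, this metadata could be modeled with serde structs along these lines; the struct names are assumptions, and serde_norway is the YAML backend named later in the Step 1 plan.

use serde::Deserialize;
use std::collections::HashMap;

// Illustrative types mirroring /etc/nexus/image.yaml.
#[derive(Deserialize)]
struct ImageMetadata {
    name: String,
    version: String,
    access: Access,
}

#[derive(Deserialize)]
struct Access {
    terminal: String,             // "vsock-pty", "ssh", or "none"
    mcp: String,                  // "vsock" or "vsock-client"
    #[serde(default)]
    ports: HashMap<String, u16>,  // named ports, e.g. http: 8080
}

fn load_metadata(yaml: &str) -> Result<ImageMetadata, serde_norway::Error> {
    serde_norway::from_str(yaml)
}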

A human developer can attach to any VM with terminal access:

nexusctl attach agent-code-1

The full chain:

graph LR
    CLI["Browser / CLI"] -->|"WebSocket (ttyd)"| Nexus
    Nexus -->|"vsock port 300+"| GA["guest-agent"]
    GA -->|PTY| Shell["/bin/bash"]

Agent Runtimes

The MCP interface is agent-runtime agnostic. Any runtime that can speak MCP (directly or via adapter) works in a portal VM.

OpenClaw Integration

OpenClaw does not natively support MCP as a client. Integration is via an OpenClaw tool plugin:

Portal VM
  └─ OpenClaw gateway (ws://127.0.0.1:18789)
     └─ workfort-tools plugin
        └─ Translates OpenClaw tool calls → MCP JSON-RPC
        └─ Sends over vsock → Nexus → Work VM

Each MCP tool on the work VM gets registered as an OpenClaw tool via api.registerTool(). The plugin acts as a thin MCP client.

If OpenClaw ships native MCP client support, the plugin becomes unnecessary.

Portal VM Image Metadata

# portal VM image.yaml
name: workfort/portal-openclaw
version: 0.1.0
runtime: openclaw
access:
  terminal: vsock-pty
  mcp: vsock-client  # this VM consumes MCP, not serves it
  ports:
    gateway: 18789

Networking

Each VM gets a tap device bridged through nexbr0.

graph TB
    Internet -->|NAT| Host["Host (eth0)"]
    Host --- Bridge["nexbr0 (172.16.0.0/24)"]
    Bridge --- tap0["tap0 → Portal VM (172.16.0.10)"]
    Bridge --- tap1["tap1 → Work VM (172.16.0.11)"]
    Bridge --- tap2["tap2 → Service VM (172.16.0.12)"]

Nexus manages

  • Bridge creation/teardown
  • Tap device lifecycle (one per VM)
  • IP assignment from the configured CIDR (stored in SQLite)
  • NAT masquerade for outbound internet access
  • nftables rules per VM for isolation

Configuration

network:
  bridge: nexbr0
  cidr: 172.16.0.0/24  # default from 172.16.0.0/12, user-configurable
  gateway: 172.16.0.1  # derived from cidr

Nexus validates that the chosen block falls within RFC 1918 space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
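
A minimal sketch of that check (a real implementation would validate the whole CIDR, prefix length included, not just the network address):

use std::net::Ipv4Addr;

// True if the address falls inside one of the RFC 1918 private ranges.
fn is_rfc1918(addr: Ipv4Addr) -> bool {
    let [a, b, _, _] = addr.octets();
    a == 10                                     // 10.0.0.0/8
        || (a == 172 && (16..=31).contains(&b)) // 172.16.0.0/12
        || (a == 192 && b == 168)               // 192.168.0.0/16
}

fn main() {
    assert!(is_rfc1918(Ipv4Addr::new(172, 16, 0, 1)));
    assert!(!is_rfc1918(Ipv4Addr::new(8, 8, 8, 8)));
}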

Service Discovery

VMs discover services by asking Nexus over the vsock control channel. Data flows over the bridge network.

// request
{"method": "service_lookup", "params": {"name": "git-server", "port": "http"}}
// response
{"result": {"address": "172.16.0.12", "port": 3000}}

Storage

btrfs subvolumes as workspaces with CoW snapshots.

Layout

/var/lib/nexus/
  ├── workspaces/
  │   ├── @base-agent/           ← read-only base subvolume
  │   ├── @work-code-1/          ← CoW snapshot of @base-agent
  │   └── @portal-openclaw/      ← CoW snapshot of portal base
  ├── images/
  │   └── vmlinux               ← kernel
  └── state/
      └── nexus.db              ← SQLite

How It Works

  1. Base images are btrfs subvolumes marked read-only
  2. New workspace = btrfs subvolume snapshot (instant, zero disk cost)
  3. Exposed to Firecracker as a block device via dm/nbd
  4. Workspace grows only as the agent writes — CoW keeps shared blocks shared
  5. Destroy = btrfs subvolume delete

Operations

Operation           What it does
workspace create    Snapshot from a base image
workspace snapshot  Checkpoint a running workspace
workspace restore   Roll back to a previous snapshot
workspace destroy   Delete subvolume
workspace list      List all workspaces with disk usage

API

HTTP REST on 127.0.0.1:9600. WebSocket upgrade for terminal sessions.

GET    /v1/vms                      # list all VMs
POST   /v1/vms                      # create VM
GET    /v1/vms/:id                  # get VM status
DELETE /v1/vms/:id                  # destroy VM
POST   /v1/vms/:id/start            # start VM
POST   /v1/vms/:id/stop             # stop VM

GET    /v1/workspaces               # list workspaces
POST   /v1/workspaces               # create workspace
POST   /v1/workspaces/:id/snapshot  # checkpoint
POST   /v1/workspaces/:id/restore   # roll back
DELETE /v1/workspaces/:id           # destroy

GET    /v1/services                 # list registered services
GET    /v1/routes                   # list vsock routes

GET    /v1/vms/:id/terminal         # WebSocket upgrade → ttyd session

CLI

nexusctl is a thin HTTP client:

nexusctl vm list
nexusctl vm create --image workfort/code-agent --name agent-1
nexusctl vm start agent-1
nexusctl attach agent-1
nexusctl workspace snapshot agent-1 --name "before-refactor"

Technology Stack

Component         Crate / Tool         Version
Async runtime     tokio                1.x
vsock             tokio-vsock          0.7.x
cgroups           cgroups-rs           0.5.x
Networking rules  nftables (JSON API)  0.6.x
btrfs             libbtrfsutil         latest
PTY               nix::pty + AsyncFd   via nix
Terminal WS       ttyd protocol (DIY)  —
State store       rusqlite             latest
Serialization     serde + serde_json   1.x
HTTP API          axum                 latest
PGP signing       rpgp (pgp crate)     0.19.x

Repo Structure

WorkFort/
├── codex/              ← mdBook, documentation and plans
├── cracker-barrel/     ← Go, kernel build tool
└── nexus/              ← Rust workspace (to be created)
    ├── Cargo.toml      ← workspace root
    ├── nexusd/         ← daemon binary
    ├── guest-agent/    ← MCP server binary for work VMs
    └── nexus-lib/      ← shared types, vsock protocol, storage

Package Repository

A signed pacman mirror at packages.workfort.dev:

# /etc/pacman.conf
[workfort]
Server = https://packages.workfort.dev/$arch
SigLevel = Required DatabaseOptional

Installable on the host via pacman. Packages include nexus, guest-agent, and kernel images built by cracker-barrel.

Alpha Roadmap

Date: 2026-02-18 Status: Draft

Context

These decisions were made during the design phase and inform the roadmap:

  • Networking is in scope — outbound-only (tap + bridge + MASQUERADE) for dependency resolution (pip, cargo, npm, etc.)
  • No host-side git — generic file/folder passing through btrfs workspaces only
  • Git via service VM — Soft Serve in a service VM, agents clone/push over the bridge network
  • Remote push via guest-agent — Nexus triggers git push on the service VM through MCP run_command, credentials stay in the service VM
  • Alpine rootfs — matches cracker-barrel’s known-working configuration
  • vsock for all control plane — MCP, PTY, control channel all flow through vsock via Nexus
  • XDG Base Directory spec — host-side paths follow XDG: config in $XDG_CONFIG_HOME/nexus/, state in $XDG_STATE_HOME/nexus/, data in $XDG_DATA_HOME/nexus/, runtime in $XDG_RUNTIME_DIR/nexus/

Steps

These 10 steps are the first phase of work toward the alpha milestone. They do not complete the milestone — additional steps will be planned as these are underway.

Step 1: nexusd — Systemd-Ready Daemon

Create the nexus Rust workspace (nexusd, nexus-lib). Build nexusd with signal handling (SIGTERM/SIGINT), structured logging, and an HTTP server serving /v1/health. Write a systemd user unit file.

Deliverable: systemctl --user start nexus starts the daemon. curl localhost:9600/v1/health returns {"status":"ok"}. SIGTERM triggers graceful shutdown with log output.

Detailed plan: Step 1 Plan


Step 2: nexusctl — CLI Skeleton

Add nexusctl to the workspace. Clap-based CLI with noun-verb grammar. Implement nexusctl status (queries /v1/health) and nexusctl version. Recommended alias nxc. User config at $XDG_CONFIG_HOME/nexusctl/config.yaml. Actionable error messages when the daemon is unreachable.

Deliverable: nexusctl status reports daemon health. When daemon is down:

Error: cannot connect to Nexus daemon at 127.0.0.1:9600
  The daemon does not appear to be running.

  Start it: systemctl --user start nexus.service

Step 3: SQLite State Store

Add rusqlite to nexus-lib. Initialize the schema on first daemon start. Storage abstraction trait for future backend swaps. Pre-alpha migration strategy: delete DB and recreate.

Deliverable: Daemon creates $XDG_STATE_HOME/nexus/nexus.db with the full schema on startup. nexusctl status reports database status (path, table count, size).


Step 4: VM Records — CRUD Without Firecracker

REST endpoints for VMs (POST/GET/DELETE /v1/vms). CLI commands: vm list, vm create, vm inspect, vm delete. State machine limited to created — no Firecracker processes yet. Auto-assign vsock CID on create.

Deliverable: nexusctl vm create my-vm persists to SQLite. nexusctl vm list renders a table. nexusctl vm inspect my-vm shows full detail. nexusctl vm delete my-vm removes the record.


Step 5: btrfs Workspace Management

Master image import (mark an existing btrfs subvolume as read-only, register in DB). Workspace create (btrfs subvolume snapshot from master). List, inspect, delete. REST endpoints + CLI commands. Use libbtrfsutil — Rust bindings to btrfs-progs’s upstream libbtrfsutil, which supports subvolume create, delete, snapshot, and list via ioctls on directory file descriptors. Common subvolume operations (create, snapshot) work unprivileged — no CAP_SYS_ADMIN required.

Firecracker requires block devices, not directories. The approach: each workspace subvolume contains a raw ext4 image file. mke2fs -d converts a directory tree into an ext4 image without root. btrfs CoW still applies at the host layer — snapshotting a subvolume containing a 1GB image file is instant and zero-cost until writes diverge.
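
A sketch of driving that image build from Rust (paths and size are placeholders; mke2fs -d is provided by e2fsprogs):

use std::path::Path;
use std::process::Command;

// Build an ext4 image from a rootfs directory tree without root privileges.
// `mke2fs -d <dir>` populates the new filesystem from the directory contents.
fn build_rootfs_image(rootfs_dir: &Path, image: &Path, size: &str) -> std::io::Result<()> {
    let status = Command::new("mke2fs")
        .args(["-t", "ext4", "-d"])
        .arg(rootfs_dir)
        .arg(image)
        .arg(size) // e.g. "1G"; mke2fs takes the filesystem size as the last argument
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(std::io::ErrorKind::Other, "mke2fs failed"))
    }
}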

Deliverable: nexusctl image import /path --name base registers an image. nexusctl ws create --base base --name my-ws creates a btrfs snapshot. nexusctl ws list shows workspaces. Verified with btrfs subvolume list.


Step 6: Rootfs Image + Firecracker VM Boot

Build a minimal Alpine rootfs, reusing cracker-barrel’s known-working Alpine configuration. Package it as an ext4 image via mke2fs -d (directory → ext4 without root). Store the image inside a btrfs subvolume and register as a master image. Spawn Firecracker with config (kernel from cracker-barrel, rootfs from master image snapshot, vsock device). Process monitoring — detect exit/crash, update VM state in SQLite. Start/stop lifecycle.

Deliverable: nexusctl vm start my-vm boots an Alpine VM in Firecracker, VM reaches running state. nexusctl vm stop my-vm shuts down cleanly. Unexpected termination updates state to crashed. nexusctl vm logs my-vm shows console output.

Unknowns:

  • Firecracker API socket management and cleanup.
  • CID allocation strategy for vsock (auto-increment from 3, or pool).

Step 7: guest-agent — vsock Control Channel

Add guest-agent binary to the workspace. Uses tokio-vsock for async vsock I/O on both sides — guest-agent listens on VMADDR_CID_ANY port 100, nexusd connects via the VM’s UDS with CONNECT 100\n. Sends image metadata on connect (parsed from /etc/nexus/image.yaml). Systemd service inside the VM rootfs.

First vsock connection and initial message are ~50-100x slower than subsequent messages on an established connection (validated through cracker-barrel benchmarking). Connections are established eagerly at boot and kept alive.

Deliverable: VM boots. guest-agent starts via systemd inside the VM. nexusd connects on vsock port 100 and receives image metadata. VM state includes readiness status.


Step 8: MCP Tools in guest-agent

JSON-RPC 2.0 server on vsock port 200 inside the guest-agent via tokio-vsock. Implements four tools: file_read, file_write, file_delete, run_command. nexusd maintains a connection pool per VM per port — connections established eagerly at boot and kept alive for the VM’s lifetime. Reconnection is automatic on failure. run_command streams stdout/stderr incrementally over the MCP channel.

Deliverable: From the host, send MCP file_write to a running VM — file appears inside the VM. Send run_command with cat /etc/os-release — returns Alpine release info. Send file_read — returns file contents. Send file_delete — file is removed.


Step 9: Networking — Outbound Access

Bridge creation (nexbr0). Tap device per VM, attached to the bridge. IP assignment from configured CIDR (stored in SQLite). NAT masquerade for outbound internet access. CAP_NET_ADMIN via setcap on the nexusd binary. Per-VM isolation rules via the nftables crate (JSON API — drives nftables via nft -j, requires nftables >= 0.9.3 at runtime). DNS configuration inside VMs.

Deliverable: A booted VM can curl https://example.com successfully. nexusctl vm list shows assigned IP addresses. nexusctl vm inspect shows network configuration.

Unknowns:

  • setcap interaction with systemd user services — may need AmbientCapabilities= in the unit file instead.
  • DNS resolver configuration inside Alpine VMs (static /etc/resolv.conf vs. DHCP).

Step 10: PTY + Terminal Attach

PTY management in guest-agent using nix::pty (already a transitive dependency) wrapped in tokio::io::unix::AsyncFd for async I/O. One PTY per session on vsock ports 300-399. WebSocket endpoint in nexusd (GET /v1/vms/:id/terminal with upgrade) implementing the ttyd protocol — a single-byte-prefix framing scheme that gives xterm.js compatibility for free:

Client → Server:  '0'=INPUT  '1'=RESIZE(JSON)  '2'=PAUSE  '3'=RESUME  '{'=HANDSHAKE
Server → Client:  '0'=OUTPUT  '1'=SET_TITLE  '2'=SET_PREFS

No Rust library exists for this — implement directly over axum WebSocket (~150 lines). nexusctl attach <vm> connects to the WebSocket and bridges to the local terminal. Terminal resize (SIGWINCH) propagated via the RESIZE message type.
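
A minimal sketch of the framing, derived from the prefix bytes above; the enum and function names are ours, and the RESIZE JSON key names should be checked against ttyd's client:

// Sketch: encode/decode the single-byte-prefix frames listed above. Only the
// prefix bytes come from the protocol; everything else is illustrative.

/// Client -> server messages handled in this sketch.
enum ClientMsg {
    Input(Vec<u8>),                  // '0' + raw bytes typed in the terminal
    Resize { cols: u16, rows: u16 }, // '1' + JSON payload
}

fn decode_client(frame: &[u8]) -> Option<ClientMsg> {
    let (&prefix, rest) = frame.split_first()?;
    match prefix {
        b'0' => Some(ClientMsg::Input(rest.to_vec())),
        b'1' => {
            // RESIZE carries a JSON payload; the key names here are assumptions
            // to verify against what ttyd's xterm.js client actually sends.
            let v: serde_json::Value = serde_json::from_slice(rest).ok()?;
            Some(ClientMsg::Resize {
                cols: v["columns"].as_u64()? as u16,
                rows: v["rows"].as_u64()? as u16,
            })
        }
        _ => None, // PAUSE, RESUME, HANDSHAKE omitted from this sketch
    }
}

/// Server -> client: wrap PTY output in an OUTPUT ('0') frame.
fn encode_output(bytes: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(bytes.len() + 1);
    frame.push(b'0');
    frame.extend_from_slice(bytes);
    frame
}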

Deliverable: nexusctl attach my-vm opens an interactive shell inside the VM. Typing commands works. Ctrl-C, Ctrl-D, and window resizing behave correctly. Disconnecting leaves the VM running.


After These Steps

These are needed for the alpha milestone but will be planned after the first 10 steps:

  • Soft Serve service VM setup and configuration
  • Portal VM with agent runtime (OpenClaw integration)
  • vsock routing between VMs (portal → Nexus → work)
  • Agent resource abstraction (portal + work VM pairs managed as one unit)
  • nexusctl apply and declarative nexus.yaml configuration
  • Package repository setup (packages.workfort.dev)
  • CLI polish: output formatting, --jq, shell completions, --dry-run

Step 1: nexusd Skeleton — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: A Rust daemon that starts via systemd, handles signals, logs to journald, and serves a health check endpoint.

Architecture: Single binary (nexusd) in a Cargo workspace with a shared library crate (nexus-lib). Uses tokio for async, axum for HTTP, tracing for structured logging. Config loaded from YAML with sensible defaults.

Tech Stack:

  • tokio 1.x — async runtime
  • axum — HTTP server
  • clap 4.x — CLI argument parsing
  • tracing + tracing-subscriber — structured logging
  • serde + serde_norway — config deserialization (serde_yaml is deprecated/archived)
  • serde_json — API responses
  • dirs — XDG Base Directory paths

XDG Directory Layout:

  • Config: $XDG_CONFIG_HOME/nexus/nexus.yaml (default: ~/.config/nexus/nexus.yaml)
  • Data (workspaces, images): $XDG_DATA_HOME/nexus/ (default: ~/.local/share/nexus/)
  • State (database): $XDG_STATE_HOME/nexus/ (default: ~/.local/state/nexus/)
  • Runtime (sockets): $XDG_RUNTIME_DIR/nexus/ (default: /run/user/$UID/nexus/)
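
A small sketch of how these four locations could be resolved with the dirs crate (each function honors the corresponding XDG_* variable and falls back to the defaults above); the NexusDirs struct is illustrative:

// Sketch: resolve the four XDG locations with the `dirs` crate and append the
// `nexus` subdirectory. Returns None if a base directory cannot be determined.
use std::path::PathBuf;

struct NexusDirs {
    config: PathBuf,  // nexus.yaml lives here
    data: PathBuf,    // workspaces, images
    state: PathBuf,   // SQLite database
    runtime: PathBuf, // sockets
}

fn nexus_dirs() -> Option<NexusDirs> {
    Some(NexusDirs {
        config: dirs::config_dir()?.join("nexus"),
        data: dirs::data_dir()?.join("nexus"),
        state: dirs::state_dir()?.join("nexus"),
        runtime: dirs::runtime_dir()?.join("nexus"),
    })
}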

Task 1: Create Rust Workspace

Files:

  • Create: nexus/Cargo.toml
  • Create: nexus/nexusd/Cargo.toml
  • Create: nexus/nexusd/src/main.rs
  • Create: nexus/nexus-lib/Cargo.toml
  • Create: nexus/nexus-lib/src/lib.rs

Step 1: Create directory structure

mkdir -p nexus/nexusd/src nexus/nexus-lib/src

Step 2: Write workspace Cargo.toml

# nexus/Cargo.toml
[workspace]
members = ["nexusd", "nexus-lib"]
resolver = "2"

Step 3: Write nexus-lib Cargo.toml

# nexus/nexus-lib/Cargo.toml
[package]
name = "nexus-lib"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_norway = "0.9"
dirs = "6"

Step 4: Write nexus-lib stub

// nexus/nexus-lib/src/lib.rs
pub mod config;

// nexus/nexus-lib/src/config.rs
// Filled in Task 2

Step 5: Write nexusd Cargo.toml

# nexus/nexusd/Cargo.toml
[package]
name = "nexusd"
version = "0.1.0"
edition = "2021"

[dependencies]
nexus-lib = { path = "../nexus-lib" }
tokio = { version = "1", features = ["full"] }
axum = "0.8"
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

[dev-dependencies]
tower = { version = "0.5", features = ["util"] }

Step 6: Write minimal main.rs

// nexus/nexusd/src/main.rs
fn main() {
    println!("nexusd");
}

Step 7: Verify build

Run: cd nexus && cargo build

Expected: Compiles with no errors.

Step 8: Commit

git add nexus/
git commit -m "feat: create nexus Rust workspace with nexusd and nexus-lib crates"

Task 2: Configuration Types and Loading

Files:

  • Create: nexus/nexus-lib/src/config.rs
  • Modify: nexus/nexus-lib/src/lib.rs

Step 1: Write the failing test

// nexus/nexus-lib/src/config.rs

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn deserialize_minimal_config() {
        let yaml = r#"
api:
  listen: "127.0.0.1:8080"
"#;
        let config: Config = serde_norway::from_str(yaml).unwrap();
        assert_eq!(config.api.listen, "127.0.0.1:8080");
    }

    #[test]
    fn default_config_values() {
        let config = Config::default();
        assert_eq!(config.api.listen, "127.0.0.1:9600");
    }

    #[test]
    fn partial_yaml_uses_defaults() {
        let yaml = "{}";
        let config: Config = serde_norway::from_str(yaml).unwrap();
        assert_eq!(config.api.listen, "127.0.0.1:9600");
    }

    #[test]
    fn load_nonexistent_file_returns_not_found() {
        let result = Config::load("/nonexistent/path/config.yaml");
        assert!(result.is_err());
        assert!(result.unwrap_err().is_not_found());
    }

    #[test]
    fn load_invalid_yaml_returns_invalid() {
        let dir = std::env::temp_dir();
        let path = dir.join("nexus-test-bad-config.yaml");
        std::fs::write(&path, "{{invalid yaml").unwrap();
        let result = Config::load(&path);
        assert!(result.is_err());
        assert!(!result.unwrap_err().is_not_found());
        std::fs::remove_file(&path).ok();
    }
}

Step 2: Run tests to verify they fail

Run: cd nexus && cargo test -p nexus-lib

Expected: FAIL — Config type does not exist yet.

Step 3: Implement Config

// nexus/nexus-lib/src/config.rs
use serde::Deserialize;
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub enum ConfigError {
    NotFound(std::io::Error),
    Invalid(String),
}

impl std::fmt::Display for ConfigError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            ConfigError::NotFound(e) => write!(f, "config file not found: {e}"),
            ConfigError::Invalid(e) => write!(f, "invalid config: {e}"),
        }
    }
}

impl std::error::Error for ConfigError {}

impl ConfigError {
    pub fn is_not_found(&self) -> bool {
        matches!(self, ConfigError::NotFound(_))
    }
}

#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct Config {
    pub api: ApiConfig,
}

#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct ApiConfig {
    pub listen: String,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            api: ApiConfig::default(),
        }
    }
}

impl Default for ApiConfig {
    fn default() -> Self {
        ApiConfig {
            listen: "127.0.0.1:9600".to_string(),
        }
    }
}

/// Returns the default config file path: $XDG_CONFIG_HOME/nexus/nexus.yaml
pub fn default_config_path() -> PathBuf {
    let config_dir = dirs::config_dir()
        .expect("cannot determine XDG_CONFIG_HOME")
        .join("nexus");
    config_dir.join("nexus.yaml")
}

impl Config {
    pub fn load(path: impl AsRef<Path>) -> Result<Self, ConfigError> {
        let content = std::fs::read_to_string(path).map_err(|e| {
            if e.kind() == std::io::ErrorKind::NotFound {
                ConfigError::NotFound(e)
            } else {
                ConfigError::Invalid(e.to_string())
            }
        })?;
        let config: Config =
            serde_norway::from_str(&content).map_err(|e| ConfigError::Invalid(e.to_string()))?;
        Ok(config)
    }
}

Step 4: Run tests to verify they pass

Run: cd nexus && cargo test -p nexus-lib

Expected: All 5 tests PASS.

Step 5: Commit

git add nexus/nexus-lib/
git commit -m "feat(nexus-lib): add Config type with YAML loading and defaults"

Task 3: Health Endpoint

Files:

  • Create: nexus/nexusd/src/api.rs
  • Modify: nexus/nexusd/src/main.rs

Step 1: Write the failing test

// nexus/nexusd/src/api.rs
use axum::{Json, Router, routing::get};
use serde::Serialize;

#[derive(Serialize)]
struct HealthResponse {
    status: String,
}

async fn health() -> Json<HealthResponse> {
    todo!()
}

pub fn router() -> Router {
    Router::new().route("/v1/health", get(health))
}

#[cfg(test)]
mod tests {
    use super::*;
    use axum::http::StatusCode;
    use axum::body::Body;
    use axum::http::Request;
    use tower::ServiceExt;

    #[tokio::test]
    async fn health_returns_ok() {
        let app = router();

        let response = app
            .oneshot(Request::get("/v1/health").body(Body::empty()).unwrap())
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::OK);

        let body = axum::body::to_bytes(response.into_body(), usize::MAX)
            .await
            .unwrap();
        let json: serde_json::Value = serde_json::from_slice(&body).unwrap();

        assert_eq!(json["status"], "ok");
    }
}

Step 2: Run test to verify it fails

Run: cd nexus && cargo test -p nexusd api::tests::health_returns_ok

Expected: FAIL — todo!() panics.

Step 3: Implement the handler

Replace todo!() with the real implementation:

async fn health() -> Json<HealthResponse> {
    Json(HealthResponse {
        status: "ok".to_string(),
    })
}

Step 4: Add module to main.rs

// nexus/nexusd/src/main.rs
mod api;

fn main() {
    println!("nexusd");
}

Step 5: Run test to verify it passes

Run: cd nexus && cargo test -p nexusd api::tests::health_returns_ok

Expected: PASS.

Step 6: Commit

git add nexus/nexusd/
git commit -m "feat(nexusd): add GET /v1/health endpoint"

Task 4: Signal Handling and Graceful Shutdown

Files:

  • Create: nexus/nexusd/src/server.rs

This task creates the server startup and shutdown logic. Signal handling is difficult to unit test in isolation, so it will be verified in the integration test (Task 8).

Step 1: Implement the server module

// nexus/nexusd/src/server.rs
use crate::api;
use nexus_lib::config::Config;
use tokio::net::TcpListener;
use tracing::info;

pub async fn run(config: &Config) -> Result<(), Box<dyn std::error::Error>> {
    let app = api::router();

    let listener = TcpListener::bind(&config.api.listen).await?;
    info!(listen = %config.api.listen, "HTTP API ready");

    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await?;

    info!("nexusd stopped");
    Ok(())
}

async fn shutdown_signal() {
    use tokio::signal::unix::{signal, SignalKind};

    let mut sigterm = signal(SignalKind::terminate())
        .expect("failed to install SIGTERM handler");
    let mut sigint = signal(SignalKind::interrupt())
        .expect("failed to install SIGINT handler");

    tokio::select! {
        _ = sigterm.recv() => info!("received SIGTERM, shutting down"),
        _ = sigint.recv() => info!("received SIGINT, shutting down"),
    }
}

Step 2: Commit

git add nexus/nexusd/src/server.rs
git commit -m "feat(nexusd): add server startup with graceful shutdown on SIGTERM/SIGINT"

Task 5: Logging

Files:

  • Create: nexus/nexusd/src/logging.rs

Step 1: Implement logging setup

// nexus/nexusd/src/logging.rs
use tracing_subscriber::{fmt, EnvFilter};

pub fn init() {
    let filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info"));

    fmt()
        .with_env_filter(filter)
        .init();
}

Uses the RUST_LOG env var when set and defaults to info. systemd captures stdout to journald automatically — no special journald integration needed.

Step 2: Commit

git add nexus/nexusd/src/logging.rs
git commit -m "feat(nexusd): add tracing-based logging with env filter"

Task 6: CLI Arguments and main() Wiring

Files:

  • Modify: nexus/nexusd/src/main.rs

Step 1: Wire everything together

// nexus/nexusd/src/main.rs
use clap::Parser;
use nexus_lib::config::{self, Config};
use tracing::{error, info};

mod api;
mod logging;
mod server;

#[derive(Parser)]
#[command(name = "nexusd", about = "WorkFort Nexus daemon")]
struct Cli {
    /// Path to configuration file
    /// [default: $XDG_CONFIG_HOME/nexus/nexus.yaml]
    #[arg(long)]
    config: Option<String>,
}

#[tokio::main]
async fn main() {
    let cli = Cli::parse();

    logging::init();

    let config_path = cli.config
        .map(std::path::PathBuf::from)
        .unwrap_or_else(config::default_config_path);

    let config = match Config::load(&config_path) {
        Ok(config) => {
            info!(config_path = %config_path.display(), "loaded configuration");
            config
        }
        Err(e) if e.is_not_found() => {
            info!("no config file found, using defaults");
            Config::default()
        }
        Err(e) => {
            error!(error = %e, path = %config_path.display(), "invalid configuration file");
            std::process::exit(1);
        }
    };

    info!("nexusd starting");

    if let Err(e) = server::run(&config).await {
        error!(error = %e, "daemon failed");
        std::process::exit(1);
    }
}

Step 2: Verify build

Run: cd nexus && cargo build

Expected: Compiles with no errors.

Step 3: Verify --help

Run: cd nexus && cargo run -p nexusd -- --help

Expected:

WorkFort Nexus daemon

Usage: nexusd [OPTIONS]

Options:
      --config <CONFIG>  Path to configuration file [default: $XDG_CONFIG_HOME/nexus/nexus.yaml]
  -h, --help             Print help

Step 4: Quick manual smoke test

Run: cd nexus && cargo run -p nexusd

Expected: Daemon starts, logs “HTTP API ready” with listen=127.0.0.1:9600. In another terminal:

Run: curl -s http://127.0.0.1:9600/v1/health | python -m json.tool

Expected:

{
    "status": "ok"
}

Kill the daemon with Ctrl-C. Expected: logs “received SIGINT, shutting down” and “nexusd stopped”, then exits cleanly.

Step 5: Commit

git add nexus/nexusd/
git commit -m "feat(nexusd): wire CLI args, config loading, logging, and server into main"

Task 7: Systemd Unit File

Files:

  • Create: nexus/dist/nexus.service

Step 1: Write the unit file

# nexus/dist/nexus.service
[Unit]
Description=WorkFort Nexus Daemon

[Service]
Type=exec
ExecStart=%h/.cargo/bin/nexusd
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=info

[Install]
WantedBy=default.target

Notes:

  • Type=exec waits for the binary to launch successfully, catching missing binary errors (better than Type=simple).
  • %h expands to the user’s home directory. Binary path will be adjusted once packaging is set up.
  • StandardOutput=journal and SyslogIdentifier are omitted as they are systemd defaults.

Step 2: Test with systemd

# Install the unit file
mkdir -p ~/.config/systemd/user
cp nexus/dist/nexus.service ~/.config/systemd/user/
systemctl --user daemon-reload

# Build and install the binary where the unit's ExecStart expects it (~/.cargo/bin)
cd nexus && cargo build --release
cp target/release/nexusd ~/.cargo/bin/

# Start and verify
systemctl --user start nexus
systemctl --user status nexus
curl -s http://127.0.0.1:9600/v1/health

# Check logs
journalctl --user -u nexus -n 20

# Stop
systemctl --user stop nexus

Expected: Service starts, health endpoint responds, logs appear in journald, and the service stops cleanly.

Step 3: Commit

git add nexus/dist/
git commit -m "feat(nexusd): add systemd user service unit file"

Task 8: Integration Test

Files:

  • Create: nexus/nexusd/tests/daemon.rs
  • Modify: nexus/nexusd/Cargo.toml (add dev-dependencies)

Step 1: Add dev-dependencies

Merge the following into the existing [dev-dependencies] section of nexus/nexusd/Cargo.toml:

[dev-dependencies]
tower = { version = "0.5", features = ["util"] }
reqwest = { version = "0.13", features = ["json"] }
nix = { version = "0.30", features = ["signal"] }
serde_json = "1"

Step 2: Write the integration test

// nexus/nexusd/tests/daemon.rs
use std::process::{Command, Child};
use std::time::Duration;
use nix::sys::signal::{self, Signal};
use nix::unistd::Pid;

fn start_daemon() -> Child {
    let binary = env!("CARGO_BIN_EXE_nexusd");
    Command::new(binary)
        .env("RUST_LOG", "info")
        .spawn()
        .expect("failed to start nexusd")
}

fn stop_daemon(child: &Child) {
    signal::kill(Pid::from_raw(child.id() as i32), Signal::SIGTERM)
        .expect("failed to send SIGTERM");
}

#[tokio::test]
async fn daemon_starts_serves_health_and_stops() {
    let mut child = start_daemon();

    // Wait for the daemon to be ready
    let client = reqwest::Client::new();
    let mut ready = false;
    for _ in 0..50 {
        tokio::time::sleep(Duration::from_millis(100)).await;
        if client.get("http://127.0.0.1:9600/v1/health")
            .send()
            .await
            .is_ok()
        {
            ready = true;
            break;
        }
    }
    assert!(ready, "daemon did not become ready within 5 seconds");

    // Verify health endpoint
    let resp = client.get("http://127.0.0.1:9600/v1/health")
        .send()
        .await
        .expect("health request failed");
    assert_eq!(resp.status(), 200);

    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["status"], "ok");

    // Graceful shutdown
    stop_daemon(&child);
    let status = child.wait().expect("failed to wait on daemon");
    assert!(status.success(), "daemon exited with non-zero status: {}", status);
}

Step 3: Run the integration test

Run: cd nexus && cargo test -p nexusd --test daemon

Expected: PASS — daemon starts, health endpoint returns {"status":"ok"}, SIGTERM causes clean exit with code 0.

Note: This test uses a hardcoded port (9600). If the port is in use, the test will fail. For now this is acceptable — a single integration test doesn’t need port randomization. Address this when adding more integration tests.

Step 4: Commit

git add nexus/nexusd/
git commit -m "test(nexusd): add integration test for daemon lifecycle"

Verification Checklist

After all tasks are complete, verify the following:

  • cargo build succeeds with no warnings
  • cargo test --workspace — all tests pass
  • cargo run -p nexusd -- --help — prints usage
  • curl localhost:9600/v1/health — returns {"status":"ok"}
  • systemctl --user start nexus — daemon starts
  • journalctl --user -u nexus — shows structured log output
  • systemctl --user stop nexus — daemon stops cleanly (exit 0)
  • Sending SIGTERM to the process causes graceful shutdown
  • Sending SIGINT (Ctrl-C) to the process causes graceful shutdown
  • No config file present — daemon starts with defaults
  • Config file present — daemon uses configured values