workspace bootstrap: lazy + deduplicated dependencies — keep disk light at high worktree counts #791

Description

@chubes4

Problem

The workspace (/var/lib/datamachine/workspace) hit 28G, and the driver is NOT homeboy or source — it's duplicated node_modules/vendor per worktree. Every worktree DMC bootstraps gets its OWN full dependency tree:

  • wp-codebox 2.7G, data-machine-events 1.7G, extrachill-community 1.2G, blocks-everywhere 1.2G: each is 1GB or more, and nearly all of it is npm/composer deps, not source.
  • Worse, worktrees DUPLICATE the parent: extrachill-artist-platform 985M + @funnel-comment-fix 948M = ~1.9G for one plugin. 9 extrachill-events worktrees × 420M each = ~3.8G of near-identical node_modules.
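To keep numbers like these current, a small audit script could report how much of the workspace is dependency directories. This is a minimal sketch, not DMC code: the function names are hypothetical, and it assumes node_modules and vendor are the only dependency directories worth counting.

```python
import os
from pathlib import Path

# Assumption: these two directory names hold all installed dependencies.
DEP_DIR_NAMES = {"node_modules", "vendor"}


def dir_size_bytes(root: Path) -> int:
    """Sum file sizes under root, counting each hardlinked inode once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total


def dependency_usage(workspace: Path) -> dict[str, int]:
    """Map each outermost dependency directory to its size in bytes."""
    usage = {}
    for dirpath, dirnames, _filenames in os.walk(workspace, followlinks=False):
        for name in list(dirnames):
            if name in DEP_DIR_NAMES:
                path = Path(dirpath) / name
                usage[str(path)] = dir_size_bytes(path)
                dirnames.remove(name)  # do not descend into nested dep dirs
    return usage


if __name__ == "__main__":
    report = dependency_usage(Path("/var/lib/datamachine/workspace"))
    for path, size in sorted(report.items(), key=lambda kv: -kv[1])[:20]:
        print(f"{size / 1e9:6.2f} GB  {path}")
```

Inodes are deduplicated only within one dependency directory, so a store shared across worktrees via hardlinks would still be counted once per worktree here.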

A heavy merge-fleet session (50+ worktrees) accumulates tens of GB. We hand-reclaimed 5.8G from 56 dead worktrees, but that's mopping up after the fact. The structural fix keeps LIVE worktrees light so the platform scales to high worktree counts without disk pressure. (Companion to the cleanup-policy issue #789, which reclaims DEAD worktrees — this keeps live ones small.)

The fix — two levers, lazy-bootstrap is primary

1. Lazy bootstrap (the big win, do first)

Most minion worktrees NEVER run a JS/build step — they edit a PHP ability/file and commit. They do not need node_modules at all. Yet DMC eagerly runs npm install/composer install on every worktree add. That eager bootstrap is the bulk of the waste.

  • Default: do NOT install deps on worktree add. Bootstrap lazily — install on first build/test invocation, or behind an explicit --bootstrap flag.
  • This alone likely cuts workspace size ~80-90% (most worktrees are source-only).
  • Gotcha to respect: worktrees that DO build (JS-block plugins) must get a reliable just-in-time bootstrap so the build doesn't fail confusingly. "Lazy with a clean on-demand trigger," not "never install." The trigger should be obvious (a build/test command auto-bootstraps if deps are missing) and the failure mode legible.
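The on-demand trigger described above could look roughly like the following. It is a sketch under assumptions: `run_with_lazy_bootstrap`, `missing_bootstraps`, and the rule table are hypothetical names, and "deps are missing" is approximated as "the manifest exists but the dependency directory does not".

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Assumption: (manifest file, dependency directory, install command).
BOOTSTRAP_RULES = [
    ("package.json", "node_modules", ["npm", "install"]),
    ("composer.json", "vendor", ["composer", "install"]),
]


def missing_bootstraps(worktree: Path) -> list[list[str]]:
    """Return install commands for manifests whose dependency dir is absent."""
    return [
        cmd
        for manifest, dep_dir, cmd in BOOTSTRAP_RULES
        if (worktree / manifest).is_file() and not (worktree / dep_dir).is_dir()
    ]


def run_with_lazy_bootstrap(
    worktree: Path,
    command: Sequence[str],
    run: Callable = subprocess.run,
) -> int:
    """Install missing deps just in time, then run the build/test command."""
    for install in missing_bootstraps(worktree):
        joined = " ".join(install)
        print(f"[bootstrap] deps missing in {worktree}; running: {joined}")
        result = run(install, cwd=worktree)
        if result.returncode != 0:
            # Legible failure: name the install step, not the build step.
            raise RuntimeError(
                f"bootstrap failed: '{joined}' exited "
                f"{result.returncode} in {worktree}"
            )
    return run(list(command), cwd=worktree).returncode
```

A source-only worktree never calls this, so it never pays for an install. One limitation of the directory check: a partially installed node_modules counts as present, so a real implementation may want a completion marker instead.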

2. Deduplicated dependency store (for the worktrees that DO build)

When deps ARE needed, don't give each worktree its own physical copy:

  • pnpm for npm: global content-addressed store + hardlinks → N worktrees of a repo share ONE physical node_modules copy. Purpose-built for exactly this; transparent working node_modules per worktree.
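  • composer already has a global cache; --prefer-dist + cache avoids re-downloading packages and keeps git checkouts out of vendor, but each worktree still gets its own extracted vendor directory (smaller problem than node_modules).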
  • Gotchas: repos may use package-lock.json (npm) not pnpm-lock.yaml — installing with pnpm regardless can cause subtle resolution diffs; needs care/testing. Branches with DIFFERENT lockfiles must still resolve correctly (pnpm handles this; naive symlink-sharing does NOT — avoid the "one shared node_modules symlinked into all worktrees" approach, it breaks when branches diverge on deps).
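One conservative way to respect the lockfile gotcha is to pick the installer from whichever lockfile the branch actually commits, so pnpm is only used where a pnpm-lock.yaml exists. The sketch below assumes that policy; the function names are hypothetical and this is not how DMC currently chooses.

```python
from pathlib import Path
from typing import Optional

# Ordered by preference: a committed pnpm lockfile wins over an npm one.
LOCKFILE_INSTALLERS = [
    ("pnpm-lock.yaml", ["pnpm", "install", "--frozen-lockfile"]),
    ("package-lock.json", ["npm", "ci"]),
]


def node_install_command(worktree: Path) -> Optional[list[str]]:
    """Pick the installer that matches the lockfile on this branch."""
    for lockfile, cmd in LOCKFILE_INSTALLERS:
        if (worktree / lockfile).is_file():
            return cmd
    if (worktree / "package.json").is_file():
        return ["npm", "install"]  # no lockfile: fall back to a plain install
    return None  # not a Node project


def composer_install_command(worktree: Path) -> Optional[list[str]]:
    """Return the composer install command, or None without composer.json."""
    if (worktree / "composer.json").is_file():
        return ["composer", "install", "--prefer-dist"]
    return None
```

Repos that only have package-lock.json keep a per-worktree npm install under this policy and get no dedup. Moving them to the shared store means committing a pnpm-lock.yaml first (`pnpm import` can generate one from package-lock.json), which is where the resolution-diff testing comes in.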

Explicitly rejected approaches (document so they're not re-proposed)

  • CoW/reflink dedup: this VPS is ext4 (no reflink support). Out without a filesystem change.
  • Cross-worktree hardlinks of node_modules: dangerous — a build writing into node_modules corrupts the shared copy. pnpm's store is safe (it manages this); ad-hoc hardlinking is not.
  • Single shared node_modules symlinked into all worktrees of a repo: silently breaks when two branches need different dep versions. Fragile.
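The hardlink hazard is easy to reproduce. This sketch uses throwaway temp directories with made-up worktree names: an in-place write through one worktree's link changes the file seen by the other, while write-to-temp-then-rename does not.

```python
import os
import tempfile
from pathlib import Path


def hardlink_hazard_demo() -> tuple[str, str, str]:
    """Return (a after in-place write to b, a after b is replaced, b at end)."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "worktree-a" / "node_modules" / "dep.js"
        b = Path(tmp) / "worktree-b" / "node_modules" / "dep.js"
        a.parent.mkdir(parents=True)
        b.parent.mkdir(parents=True)
        a.write_text("original")
        os.link(a, b)  # b is now a second name for the same inode

        # A build in worktree-b rewrites the file in place: the shared inode
        # changes, so worktree-a sees the modification too.
        with open(b, "w") as fh:
            fh.write("patched by build in b")
        a_after_in_place = a.read_text()

        # Writing a new file and renaming it over b gives b a fresh inode,
        # so worktree-a keeps whatever it had before.
        replacement = b.with_suffix(".tmp")
        replacement.write_text("replaced safely")
        os.replace(replacement, b)
        return a_after_in_place, a.read_text(), b.read_text()


if __name__ == "__main__":
    print(hardlink_hazard_demo())
```

Whether a given build tool writes in place or via rename is not something ad-hoc hardlinking can control, which is the reason for rejecting it.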

Design target

"Spin up 100 worktrees, stay light." Lazy bootstrap makes source-only worktrees ~free; pnpm dedups the build-needing ones. Target: workspace stays in single-digit GB regardless of worktree count.

Severity

Medium-high. Recurring disk-pressure risk on a 150GB VPS under heavy fleets (we've had disk incidents before). Lazy bootstrap is the high-leverage, low-risk first move; the dedup store is the follow-on.
