|
13 | 13 | - `build.zig.zon` — minimal manifest (name, version, fingerprint). |
14 | 14 | - `.zig-cache/cargo-target/<triple>/` — per-target cargo target directory, isolated so parallel cross builds don't clobber each other's fingerprints. |
15 | 15 | - `dist/rg-<triple>[.exe|.wasm]` — install outputs. The published package only ships the wasm one. |
16 | | -- `build.ts` — inlines `dist/rg-wasm32-wasip1.wasm` into `lib/_rg.wasm.mjs` (brotli + z85). Run with `node build.ts`. |
| 16 | +- `build.ts` — inlines `dist/rg-wasm32-wasip1.wasm` into `lib/_rg.wasm.mjs` (brotli + z85) and stamps the wasm hash into `lib/_rg.mjs` for disk-cache invalidation. Run with `node build.ts`. |
17 | 17 |
|
18 | 18 | ### npm package (`lib/`) |
19 | 19 |
|
20 | | -- `lib/index.mjs` — ESM entry. Exports `ripgrep(args, options)` and doubles as the `rg` bin (`bin: { rg: "./lib/index.mjs" }`). Compiles the wasm module lazily once via `WebAssembly.compile(getRgWasmBytes())` and caches the `Promise<WebAssembly.Module>`; instances are still created per-call since they own WASI state. |
| 20 | +- `lib/index.mjs` — ESM entry. Exports `ripgrep(args, options)` and `rgPath`. Delegates wasm loading and WASI runtime creation to `_rg.mjs`. |
| 21 | +- `lib/rg.mjs` — CLI bin entry (`bin: { rg, ripgrep }`). Calls `enableCompileCache()` from `node:module` for faster repeated runs, then forwards `process.argv` to `ripgrep()` and exits with its code. |
| 22 | +- `lib/_rg.mjs` — orchestration layer between the wasm blob and WASI runtimes: |
| 23 | + - `getRgWasmModule()` — memoizes the `Promise<WebAssembly.Module>`. Decompresses the blob via `brotliDecompressSync` and caches the raw `.wasm` bytes to disk (`os.tmpdir()/ripgrep-wasm-<hash>.wasm`) so subsequent runs skip decompression entirely. |
| 24 | + - `createWasiRuntime(options)` — picks the WASI backend: uses `node:wasi` by default on Node (suppresses `ExperimentalWarning`), falls back to the custom shim if `node:wasi` import fails or if custom `stdout`/`stderr` streams without a numeric `fd` are provided. On Bun and Deno the custom shim is always used. |
21 | 25 | - `lib/_wasi.mjs` — minimal WASI preview1 shim. Implements only the ~23 syscalls ripgrep actually imports (`fd_read`, `fd_write`, `fd_readdir`, `fd_seek`, `fd_tell`, `fd_close`, `fd_fdstat_get`, `fd_fdstat_set_flags`, `fd_filestat_get`, `fd_prestat_*`, `path_open`, `path_filestat_get`, `path_readlink`, `args_*`, `environ_*`, `clock_time_get`, `random_get`, `proc_exit`, `sched_yield`, `poll_oneoff` stub). Backed by `node:fs` sync APIs so it works on Node, Bun, and Deno uniformly. `proc_exit` throws a `WASIExit` that `start()` catches to return the exit code. |
22 | | -- `lib/_rg.wasm.mjs` — auto-generated by `build.ts`. Exports `getRgWasmBytes()` which z85-decodes + `brotliDecompressSync`s the blob into a `Uint8Array` on demand. The encoded blob lives inside a `getEncoded = () => "…"` function so V8 lazy-parses the large string literal and only allocates it on first call, not at import time. |
23 | | -- `lib/index.d.mts` — hand-written types, including an `RgFlag` union of known ripgrep long/short flags for autocomplete on `ripgrep` args (still accepts arbitrary strings via `RgArg = RgFlag | (string & {})`). |
24 | | -- `package.json` — `name: "ripgrep"`, `type: module`, `files: ["lib"]`, `bin.rg`, exports only `./lib/index.mjs`. |
| 26 | +- `lib/_rg.wasm.mjs` — auto-generated by `build.ts`. Exports `getCompressedBytes()` which z85-decodes the blob into raw brotli-compressed bytes. The encoded blob lives inside a `getEncoded = () => "…"` function so V8 lazy-parses the large string literal and only allocates it on first call, not at import time. |
| 27 | +- `lib/index.d.mts` — hand-written types. Includes an `RgFlag` union of known ripgrep long/short flags for autocomplete on `ripgrep` args (still accepts arbitrary strings via `RgArg = RgFlag | (string & {})`). Overloads `ripgrep()` for `{ buffer: true }` → `RipgrepBufferedResult` (with `stdout` and `stderr` strings). |
| 28 | +- `package.json` — `name: "ripgrep"`, `type: module`, `files: ["lib"]`, `bin: { rg, ripgrep }` → `./lib/rg.mjs`, exports only `./lib/index.mjs`. |
25 | 29 |
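The backend selection described for `createWasiRuntime(options)` can be sketched as a pure decision function. This is a hypothetical illustration, not the code in `lib/_rg.mjs`: the names `pickWasiBackend`, `hasNumericFd`, `runtime`, and `nodeWasiAvailable` are invented here, and the precedence of the `nodeWasi` option over `RIPGREP_NODE_WASI` is an assumption.

```javascript
// Hypothetical sketch of the backend selection rules; names are illustrative only.

// A stream is acceptable to node:wasi only if it is absent or exposes a numeric fd.
function hasNumericFd(stream) {
  return stream == null || typeof stream.fd === "number";
}

function pickWasiBackend({
  runtime,                  // "node" | "bun" | "deno" (assumed to be detected elsewhere)
  nodeWasi,                 // explicit option: true / false / undefined
  env = {},                 // e.g. process.env
  stdout,
  stderr,
  nodeWasiAvailable = true, // false models a failed `node:wasi` import
}) {
  // Custom streams without a numeric fd force the custom shim regardless.
  if (!hasNumericFd(stdout) || !hasNumericFd(stderr)) return "shim";

  // Assumed precedence: explicit option, then env var, then runtime default.
  let wantNode;
  if (typeof nodeWasi === "boolean") wantNode = nodeWasi;
  else if (env.RIPGREP_NODE_WASI === "1") wantNode = true;
  else if (env.RIPGREP_NODE_WASI === "0") wantNode = false;
  else wantNode = runtime === "node";

  // Fall back to the shim when node:wasi cannot be imported.
  return wantNode && nodeWasiAvailable ? "node:wasi" : "shim";
}
```

Keeping the decision separate from the import and construction side effects makes each rule easy to check in isolation.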
|
26 | 30 | ### Runtime behavior |
27 | 31 |
|
28 | 32 | - Default preopens map `.` → `process.cwd()`; absolute paths passed as args are auto-added as preopens so they work without extra configuration. |
29 | | -- ripgrep's TTY auto-detection doesn't work through WASI preview1 (it always sees a non-TTY), so `ripgrep` auto-injects `--color=ansi` when `process.stdout.isTTY` and the caller hasn't specified a color flag. Detection checks `--color`, `--color=…`, and `--no-color`. |
30 | | -- `nodeWasi: true` (or `RIPGREP_NODE_WASI=1`) swaps the custom shim for Node's built-in `node:wasi`. Same `{ imports, start }` shape either way. Node's version prints an `ExperimentalWarning` on every run — that's why the custom shim is the default. |
| 33 | +- ripgrep's TTY auto-detection doesn't work through WASI preview1 (it always sees a non-TTY), so `ripgrep` auto-injects `--color=ansi` when `process.stdout.isTTY` is true, the caller hasn't specified a color flag, and no custom `stdout` stream is provided. Detection checks `--color`, `--color=…`, and `--no-color`. |
| 34 | +- **WASI backend selection:** On Node, `node:wasi` is used by default (the `ExperimentalWarning` is silently suppressed). On Bun and Deno, the custom shim is used. Override with `nodeWasi: true/false` or `RIPGREP_NODE_WASI=1/0`. If the `node:wasi` import fails at runtime, `createWasiRuntime` falls back to the custom shim automatically. When custom `stdout`/`stderr` streams without a numeric `.fd` property are provided, the custom shim is forced regardless. |
| 35 | +- **Wasm disk cache:** The decompressed wasm bytes are cached in `os.tmpdir()/ripgrep-wasm-<hash>.wasm`. On subsequent runs, the cached file is read directly via `readFileSync`, skipping z85 decode + brotli decompression. The hash in the filename changes when the wasm binary is rebuilt. |
| 36 | +- **Buffered mode:** `ripgrep(args, { buffer: true })` captures stdout/stderr into strings returned as `result.stdout` / `result.stderr`. Custom streams take precedence over buffering for their respective fd. |
31 | 37 | - `start()` returns `0` on clean exit or the exit code from `WASIExit`; `node:wasi`'s `start()` returns `undefined` on success, so the adapter coerces with `?? 0`. The public `ripgrep()` function wraps this into a `{ code }` result object. |
32 | 38 |
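The color auto-injection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: `hasColorFlag` and `withAutoColor` are hypothetical names, and prepending the flag (rather than appending it) is a guess, not confirmed behavior of `lib/index.mjs`.

```javascript
// Hypothetical sketch of the color auto-injection rule; names are illustrative only.

// True if the caller already made a color decision.
// Matches `--color`, `--color=…`, and `--no-color` (but not `--colors=…`).
function hasColorFlag(args) {
  return args.some(
    (a) => a === "--color" || a.startsWith("--color=") || a === "--no-color",
  );
}

// Returns the args to hand to ripgrep. Injects `--color=ansi` only when output
// goes to a TTY, no custom stdout stream was supplied, and no color flag is present.
function withAutoColor(args, { isTTY = false, customStdout } = {}) {
  if (!isTTY || customStdout || hasColorFlag(args)) return args;
  return ["--color=ansi", ...args]; // assumption: the flag is prepended
}
```

A caller would pass something like `withAutoColor(args, { isTTY: process.stdout.isTTY, customStdout: options.stdout })`, where `options.stdout` stands for the caller-supplied stream.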
|
| 39 | +### Testing |
| 40 | + |
| 41 | +- `test/ripgrep.test.mjs` — vitest tests covering the programmatic API (`ripgrep()` with buffered/non-buffered/custom streams), `rgPath` export, and CLI execution via `spawn`. |
| 42 | +- `test/fixture/` — test data: `hello.txt`, `subdir/nested.txt`, `link.txt` → symlink to `hello.txt`. |
| 43 | +- Run: `pnpm vitest run test/ripgrep.test.mjs`. |
| 44 | + |
| 45 | +### Benchmarks |
| 46 | + |
| 47 | +- `bench/cli.mjs` — cold-start comparison: native `rg` binary vs `node lib/rg.mjs` (both via `execFileSync`). Uses [`mitata`](https://github.com/evanwashere/mitata). |
| 48 | +- `bench/api.mjs` — warm comparison: native `exec(rg)` vs `ripgrep()` API (wasm module pre-warmed, measures per-call overhead). Also uses `mitata`. |
| 49 | +- Both benchmark against `vendor/ripgrep/crates` searching for `fn main`. |
| 50 | +- Run: `node bench/cli.mjs` or `node bench/api.mjs`. |
| 51 | + |
33 | 52 | ## Build flavors |
34 | 53 |
|
35 | 54 | There is only one cargo profile: cross-compile with the smallest-size settings. No mode option, no native install, no sub-steps for tweaking. |
36 | 55 |
|
37 | 56 | The cargo profile is `release-lto` (defined in `vendor/ripgrep/Cargo.toml`: fat LTO, `codegen-units=1`, `panic="abort"`), with size tuning layered on via `cargo --config`: `opt-level="z"`, `debug=false`, `strip="symbols"`. |
38 | 57 |
|
| 58 | +For wasm targets, SIMD is enabled via `RUSTFLAGS="-C target-feature=+simd128"` — this unlocks memchr's simd128 vectorized search paths in the wasm binary. |
| 59 | + |
39 | 60 | ## Commands |
40 | 61 |
|
| 62 | +- `pnpm build` — build wasm + inline into JS (equivalent to `zig build wasi && node build.ts`) |
41 | 63 | - `zig build wasi` — build `wasm32-wasip1` → `dist/rg-wasm32-wasip1.wasm` (default step) |
42 | | -- `zig build native` — cross-compile all native targets → `dist/rg-<triple>[.exe]` (not shipped to npm; kept for local use / future flavors) |
| 64 | +- `zig build native` — cross-compile all native targets → `dist/rg-<triple>[.exe]` (not shipped to npm; kept for local use / benchmarks) |
43 | 65 | - `zig build` — same as `zig build wasi` |
44 | | -- `node build.ts` — inline the built wasm into `lib/_rg.wasm.mjs`. Must be re-run any time the wasm changes. |
45 | | -- `node lib/index.mjs <rg args…>` — run the packaged CLI directly from source. Pass `--node-wasi` as the first arg to use Node's built-in WASI. |
| 66 | +- `node build.ts` — inline the built wasm into `lib/_rg.wasm.mjs` and stamp hash into `lib/_rg.mjs`. Must be re-run any time the wasm changes. |
| 67 | +- `node lib/rg.mjs <rg args…>` — run the packaged CLI directly from source. |
| 68 | +- `pnpm test` — run vitest tests |
| 69 | +- `pnpm fmt` — format with oxfmt |
46 | 70 |
|
47 | 71 | ## Cross-compile targets |
48 | 72 |
|
@@ -77,21 +101,34 @@ The install loop uses `.{ .custom = "../dist" }` as the install directory — th |
77 | 101 | ## Gotchas / design notes |
78 | 102 |
|
79 | 103 | - **Wasm is inlined, not a separate file.** `lib/_rg.wasm.mjs` ships the compressed bytes inside an ESM module so the npm package is pure JS — no `.wasm` asset resolution, no postinstall step. The tradeoff is a larger JS file and a one-time decode on first call. |
| 104 | +- **Two-layer decompression.** `_rg.wasm.mjs` exports `getCompressedBytes()` (z85 decode only), and `_rg.mjs` handles brotli decompression + disk caching. This separation keeps the generated blob module simple and the caching logic in hand-written code. |
| 105 | +- **Disk cache in tmpdir.** The decompressed wasm is written to `os.tmpdir()/ripgrep-wasm-<hash>.wasm` on first run. Subsequent runs load the cached file directly via `readFileSync`, skipping z85 decode and brotli decompression. The `<hash>` is the first 16 hex chars of the wasm's SHA-256, stamped by `build.ts`. |
80 | 106 | - **Lazy string literal.** The z85-encoded blob is wrapped in `const getEncoded = () => "…"` specifically so V8 doesn't eagerly parse/allocate it at import time. Don't inline it back into a top-level `const`. |
81 | 107 | - **Wasm module is cached, instances aren't.** `getRgWasmModule()` memoizes the `Promise<WebAssembly.Module>`; each `ripgrep` call still creates a fresh `WebAssembly.Instance` because WASI state (memory, fds) is per-instance. |
| 108 | +- **node:wasi ExperimentalWarning suppression.** `createNodeWasi` temporarily replaces `process.emitWarning` with a no-op while constructing the WASI instance, then restores it. This avoids the warning spam without `--no-warnings`. |
| 109 | +- **Stream-based stdio forces custom shim.** `node:wasi` only accepts numeric fd values for stdout/stderr. If a custom stream without a `.fd` property is passed, `createWasiRuntime` automatically uses the custom shim instead. |
82 | 110 | - **Custom shim grants all rights in `fd_fdstat_get`.** ripgrep only reads, so over-granting `fs_rights_base` / `fs_rights_inheriting` (`~0n`) is harmless and avoids tracking precise capability bits. |
83 | 111 | - **`poll_oneoff` is stubbed to `NOTSUP`.** ripgrep only uses it for stdin-driven modes, which the shim doesn't support anyway (stdin reads return 0 / EOF). |
84 | 112 | - **`path_open` ENOENT handling.** Only ENOENT is swallowed when `O_CREAT` is set; everything else propagates, so bad paths surface as real errors instead of silent creates. |
| 113 | +- **WASM SIMD.** `build.zig` sets `RUSTFLAGS="-C target-feature=+simd128"` for wasm targets, enabling memchr's simd128 vectorized paths for faster search. |
85 | 114 | - **TOML-quoted `--config` values.** `cargo --config` parses values as TOML, so string values must include the quotes: `opt-level="z"` not `opt-level=z`. |
86 | 115 | - **Env-var form doesn't work for hyphenated profile names.** `CARGO_PROFILE_RELEASE_LTO_OPT_LEVEL` is ambiguous — cargo parses it as profile `release` with key `lto_opt_level` and fails. Use `--config` instead. |
87 | 116 | - **Per-target `CARGO_TARGET_DIR`.** Sharing one target dir across triples causes cargo to constantly rebuild dependencies. Keeping them separate under `.zig-cache/cargo-target/<triple>/` gives proper caching. |
88 | 117 | - **Cargo's output path is `$CARGO_TARGET_DIR/<triple>/release-lto/rg[.exe|.wasm]`** — for custom profiles the profile dir equals the profile name. |
89 | 118 | - **ripgrep is still a Rust project.** `rustc` does all the code generation; Zig only acts as the C compiler and linker. A pure `zig build` of ripgrep would require rewriting it. |
| 119 | +- **`enableCompileCache`.** `rg.mjs` calls `node:module`'s `enableCompileCache()` for faster cold starts on Node. The call is wrapped in try/catch since some Node-compatible runtimes don't support it. |
90 | 120 |
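The "module is cached, instances aren't" note can be illustrated with a small sketch. It is not the real `getRgWasmModule()`: `getModule`, `run`, and `EMPTY_WASM` are hypothetical names, and an empty wasm module stands in for the ripgrep binary so the example runs on its own.

```javascript
// Hypothetical sketch: memoize the compiled module, create a fresh instance per call.

// Smallest valid wasm binary: the "\0asm" magic plus version 1, with no sections.
const EMPTY_WASM = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

let modulePromise; // memoized Promise<WebAssembly.Module>

function getModule(loadBytes = () => EMPTY_WASM) {
  // `??=` evaluates the right-hand side only while modulePromise is still undefined,
  // so the bytes are loaded and compiled at most once.
  modulePromise ??= WebAssembly.compile(loadBytes());
  return modulePromise;
}

async function run(imports = {}) {
  const module = await getModule();
  // Given a Module (rather than bytes), instantiate() resolves to a new Instance.
  return WebAssembly.instantiate(module, imports);
}
```

Caching the promise rather than the resolved module means concurrent first calls share a single compilation instead of racing to start their own.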
|
91 | 121 | ## Dependencies |
92 | 122 |
|
93 | 123 | - `zig` (tested with 0.15.2) |
94 | 124 | - `rustc` + `cargo` (tested with 1.94.1) |
95 | 125 | - `cargo-zigbuild` (`cargo install cargo-zigbuild`) |
96 | 126 | - Rust std for `wasm32-wasip1` (plus the native targets if building those) |
97 | | -- Node.js 18+ / Bun / Deno at runtime (for `WebAssembly.compileStreaming`, web streams, and `node:fs` sync APIs) |
| 127 | +- Node.js 18+ / Bun / Deno at runtime (for `WebAssembly.compile`, `node:fs` sync APIs, and `node:zlib` brotli) |
| 128 | + |
| 129 | +### Dev dependencies |
| 130 | + |
| 131 | +- `vitest` — test runner |
| 132 | +- `mitata` — benchmarking |
| 133 | +- `oxfmt` — code formatter |
| 134 | +- `@vitest/coverage-v8` — coverage |