From 86f38c4fc624f6af9eb415071857715f40956b06 Mon Sep 17 00:00:00 2001
From: ruv
Date: Thu, 14 May 2026 08:45:33 -0400
Subject: [PATCH 1/3] fix: first-run breakage (closes #559, #561) + #560 platform-aware diagnosis
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three related fixes. A fresh-clone user hitting any of these would
conclude the project doesn't work; #557's "feels like mock" narrative is
fed in part by these breakages.

## #559: `./verify` pointed at removed `v1/` paths

The wrapper hard-coded `v1/data/proof` / `v1/src`, but the proof scripts
moved to `archive/v1/` long ago. A fresh clone failed before the pipeline
could even run. User `Fewmanism` provided the exact diff in the issue.
Applied verbatim across four hits (PROOF_DIR, V1_SRC, the Phase 3
scan-message, and the SKIP-state recovery hint).

    ./verify   # now PASS end-to-end

## #561: firmware README would misflash and point at the wrong provisioner

Two real bring-up bugs:

1. Manual flash command put the app at `0x10000`. The partition tables
   (`partitions_display.csv`, `partitions_4mb.csv`) define `ota_0` at
   `0x20000`. `0x10000` is the start of `phy_init` data; flashing the app
   binary there would corrupt the PHY init data and the app would never
   run. The QEMU section already had the right `0x20000`, so this was an
   internal contradiction. Both occurrences fixed. Also added
   `0xf000 ota_data_initial.bin` to the manual flash command: the release
   bundle ships this binary, and without it the bootloader can refuse to
   boot after a factory wipe.

2. `python scripts/provision.py` referenced the wrong file. There are
   actually TWO `provision.py` files in the repo (`scripts/`: 275 lines,
   stale; `firmware/esp32-csi-node/`: 348 lines, has the issue #391
   full-replace semantics fix). The canonical one is in the firmware dir.
   Both README occurrences fixed to point at the canonical path.
   (The stale `scripts/provision.py` is a separate cleanup; the historical
   ADRs that reference it are intentionally not touched.)

## #560: proof hash mismatches on macOS arm64 / Accelerate

User `Fewmanism` reports that with the same pinned `numpy 1.26.4` /
`scipy 1.14.1` on macOS arm64, the proof's SHA-256 differs from the
published expected hash. The proof passes on linux-x86_64 and
windows-x86_64 (where wheels ship OpenBLAS); it mismatches on
darwin-arm64 (where numpy/scipy use Accelerate.framework). That is not a
code bug: Accelerate's FFT and BLAS produce bit-different output from
OpenBLAS on identical IEEE 754 inputs, and the proof's bit-exact contract
therefore cannot hold across backends.

What this commit changes:

- `verify.py` now prints a RUNTIME ENVIRONMENT block before the pipeline
  runs: platform, machine, Python version, numpy BLAS backend. Users on a
  non-reference backend see the cause up front.
- The FAIL message reorders causes: platform BLAS/FFT backend is now the
  *primary* suspect (not "unlikely"), with a pointer to the printed
  RUNTIME ENVIRONMENT block.
- New `archive/v1/data/proof/REFERENCE_PLATFORMS.md` documents the
  reference platforms (linux-x86_64 + windows-x86_64 with OpenBLAS), the
  expected-MISMATCH platforms (darwin-arm64 with Accelerate, any MKL
  install), and three workable responses for users hitting a
  non-reference backend (run on a reference platform, generate a
  local-reference hash, or use tolerance-based comparison; that last one
  is the roadmap path).

This converts #560 from "the proof is broken on my Mac" to "the proof has
a documented single-backend contract".

## Verification

- `./verify` (Windows x86_64 / OpenBLAS): VERDICT PASS, hash
  `8c0680d7…51c6` matches expected. RUNTIME ENVIRONMENT block prints
  numpy BLAS = `scipy-openblas`.
- `grep -E '0x10000|scripts/provision\.py' firmware/esp32-csi-node/README.md`:
  no matches.
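
As a rough illustration of why the tolerance-based option under #560 behaves differently from the hash check, here is a minimal standard-library sketch. It is not how `verify.py` hashes its features: the byte packing, the function names `sha256_of_features` and `features_close`, and the sample vectors are all assumptions made up for this example. It only shows that a one-ulp difference in a single value changes a SHA-256 over the raw bytes while still passing an absolute-tolerance comparison.

```python
import hashlib
import math
import struct


def sha256_of_features(features):
    """Hash the raw IEEE 754 bytes of each value (little-endian doubles).

    Hypothetical scheme for illustration; verify.py may serialise differently.
    """
    payload = struct.pack(f"<{len(features)}d", *features)
    return hashlib.sha256(payload).hexdigest()


def features_close(features, reference, atol=1e-10):
    """Element-wise absolute-tolerance comparison (no relative term)."""
    if len(features) != len(reference):
        return False
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=atol)
        for a, b in zip(features, reference)
    )


# Hypothetical feature vectors: the second differs from the first by one
# unit in the last place in a single element, the kind of drift a different
# BLAS/FFT backend could introduce.
reference = [0.125, 1.5, -3.75]
other_backend = [0.125, math.nextafter(1.5, 2.0), -3.75]

hashes_equal = sha256_of_features(reference) == sha256_of_features(other_backend)
values_close = features_close(other_backend, reference)
print(f"hashes equal: {hashes_equal}")
print(f"within atol : {values_close}")
```

The `atol=1e-10` default mirrors the value quoted in `REFERENCE_PLATFORMS.md`; whether that tolerance suits the real feature vectors is an open question for the roadmap item.
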
Co-Authored-By: claude-flow --- archive/v1/data/proof/REFERENCE_PLATFORMS.md | 52 ++++++++++++++++ archive/v1/data/proof/verify.py | 63 ++++++++++++++++++-- firmware/esp32-csi-node/README.md | 23 ++++--- verify | 8 +-- 4 files changed, 129 insertions(+), 17 deletions(-) create mode 100644 archive/v1/data/proof/REFERENCE_PLATFORMS.md diff --git a/archive/v1/data/proof/REFERENCE_PLATFORMS.md b/archive/v1/data/proof/REFERENCE_PLATFORMS.md new file mode 100644 index 0000000000..82dd772474 --- /dev/null +++ b/archive/v1/data/proof/REFERENCE_PLATFORMS.md @@ -0,0 +1,52 @@ +# Reference platforms for `expected_features.sha256` + +The hash in `expected_features.sha256` was generated on a specific BLAS / FFT +backend. Numpy + scipy delegate FFT/linear-algebra to platform-native +libraries, and those libraries produce **bit-different output on identical +IEEE 754 inputs** depending on the backend. This is not a bug in the proof +pipeline — it is a property of the underlying numerical libraries. (See +issue #560.) + +## Platforms where the hash is expected to MATCH + +| Platform | BLAS backend | Status | +|---|---|---| +| `linux-x86_64-gnu` (Python 3.11.x, numpy 1.26.4 from PyPI wheels, scipy 1.14.1) | OpenBLAS | ✅ Reference | +| `windows-x86_64-msvc` (Python 3.11.x / 3.13.x, numpy 1.26.4 from PyPI wheels, scipy 1.14.1) | OpenBLAS | ✅ Reference | + +## Platforms where the hash is **expected to MISMATCH** + +| Platform | BLAS backend | Why | +|---|---|---| +| `darwin-arm64` (macOS arm64, Apple Silicon) | Accelerate.framework | FFT + matrix kernels differ in last-bit positions; the SHA-256 will differ even with pinned `numpy 1.26.4` / `scipy 1.14.1`. | +| Any environment with MKL installed | Intel MKL | Same root cause as Accelerate: different vectorized FFT path. 
| + +## What to do if you get MISMATCH on a non-reference platform + +The pipeline is still correct on your platform — the *output* is bit-different +because the *backend* is bit-different, not because the proof code has a bug. +Three workable responses: + +1. **Run the proof on a reference platform** (Linux x86_64 or Windows x86_64 + with the PyPI OpenBLAS wheels). This is what CI does. + +2. **Generate a new local-reference hash** for your platform and check it + against the same hash on a teammate's machine with the *same* backend: + + ```bash + # Regenerate from your platform + python archive/v1/data/proof/verify.py --generate-hash + + # Commit the new hash to a side file (do NOT overwrite expected_features.sha256 + # unless you are publishing a new cross-platform reference) + ``` + +3. **Compare numerical output, not the hash.** A relaxed-tolerance comparison + on the feature vectors (e.g. `np.allclose(features, reference, atol=1e-10)`) + will pass across backends. This is on the roadmap (see issue #560). + +## The `verify.py` runtime environment block + +Every run of `verify.py` now prints a `RUNTIME ENVIRONMENT` block before the +pipeline runs. Include that block in any issue report — it identifies the +platform + numpy version + BLAS backend in one place. diff --git a/archive/v1/data/proof/verify.py b/archive/v1/data/proof/verify.py index 00c2cef12d..4d5557cf4c 100644 --- a/archive/v1/data/proof/verify.py +++ b/archive/v1/data/proof/verify.py @@ -116,6 +116,48 @@ def print_source_provenance(): print() +def print_runtime_environment(): + """Print the platform + numpy/scipy BLAS backend. + + The proof pipeline's SHA-256 is sensitive to the BLAS / FFT backend + behind numpy + scipy.fft. Different platforms ship different backends + (OpenBLAS on Linux/Windows wheels, Accelerate.framework on macOS arm64, + MKL when installed) and they produce bit-different output on identical + IEEE 754 inputs. 
Surfacing the backend up front turns an unexplained + MISMATCH into a one-line diagnosis -- see issue #560. + """ + import platform + print(" RUNTIME ENVIRONMENT:") + print(f" Platform : {platform.platform()}") + print(f" Machine : {platform.machine()}") + print(f" Python : {platform.python_version()} ({platform.python_implementation()})") + + # numpy BLAS / LAPACK backend. + try: + blas_info = np.__config__.blas_ilp64_opt_info # type: ignore[attr-defined] + backend = getattr(blas_info, "get", lambda *_: None)("libraries", None) or "unknown" + except Exception: + # Newer numpy (>= 1.26) reports via show_config(); fall back to a stringified dump. + try: + import io + buf = io.StringIO() + np.show_config(mode="dicts") if hasattr(np, "show_config") else None + # `show_config(mode='dicts')` returns a dict in numpy >= 1.26. + cfg = np.show_config(mode="dicts") if hasattr(np, "show_config") else {} + if isinstance(cfg, dict): + blas = cfg.get("Build Dependencies", {}).get("blas", {}) + backend = blas.get("name", "unknown") + else: + backend = "unknown" + except Exception: + backend = "unknown" + print(f" numpy BLAS : {backend}") + print(" (FFT/BLAS backend affects the hash -- see #560 if MISMATCH on") + print(" macOS arm64 / Accelerate. Reference platforms: linux-x86_64,") + print(" windows-x86_64 with OpenBLAS; see expected_features.sha256.)") + print() + + def load_reference_signal(data_path): """Load the reference CSI signal from JSON. 
@@ -417,6 +459,7 @@ def main(): # --------------------------------------------------------------- print("[0/4] SOURCE PROVENANCE") print_source_provenance() + print_runtime_environment() # --------------------------------------------------------------- # Step 1: Load and describe reference signal @@ -518,13 +561,23 @@ def main(): print() print(" The pipeline output does NOT match the expected hash.") print() - print(" Possible causes:") - print(" - Numpy/scipy version mismatch (check requirements)") - print(" - Code change in CSI processor that alters numerical output") - print(" - Platform floating-point differences (unlikely for IEEE 754)") + print(" Likely causes, in order of probability:") + print(" 1. Platform BLAS/FFT backend differs from the reference.") + print(" The expected hash was generated on linux-x86_64 +") + print(" windows-x86_64 with OpenBLAS. macOS arm64 ships with") + print(" Accelerate.framework, which produces bit-different FFT") + print(" output on identical inputs (issue #560). Inspect the") + print(" RUNTIME ENVIRONMENT block printed at the top of this run.") + print(" 2. Numpy/scipy version mismatch.") + print(" Install pinned versions: pip install -r archive/v1/requirements-lock.txt") + print(" 3. 
Real code change in the CSI processor that alters output.") + print(" Investigate the diff against the reference commit.") print() - print(" To update the expected hash after intentional changes:") + print(" To regenerate the expected hash on a NEW reference platform:") print(" python verify.py --generate-hash") + print(" (Only do this if you intend to publish a new reference; the") + print(" single-platform contract of expected_features.sha256 is") + print(" documented at the top of that file.)") print("=" * 72) sys.exit(1) diff --git a/firmware/esp32-csi-node/README.md b/firmware/esp32-csi-node/README.md index a3cfe28d7e..9b05d4b6e5 100644 --- a/firmware/esp32-csi-node/README.md +++ b/firmware/esp32-csi-node/README.md @@ -40,15 +40,21 @@ MSYS_NO_PATHCONV=1 docker run --rm \ ```bash python -m esptool --chip esp32s3 --port COM7 --baud 460800 \ write_flash --flash_mode dio --flash_size 8MB \ - 0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \ - 0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \ - 0x10000 firmware/esp32-csi-node/build/esp32-csi-node.bin + 0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \ + 0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \ + 0xf000 firmware/esp32-csi-node/build/ota_data_initial.bin \ + 0x20000 firmware/esp32-csi-node/build/esp32-csi-node.bin ``` +> The app slot (`ota_0`) starts at `0x20000` per `partitions_display.csv` / +> `partitions_4mb.csv`. `ota_data_initial.bin` at `0xf000` initialises the OTA +> slot pointer; without it the bootloader can refuse to boot the app after a +> factory wipe. + ### 3. 
Provision WiFi credentials (no reflash needed) ```bash -python scripts/provision.py --port COM7 \ +python firmware/esp32-csi-node/provision.py --port COM7 \ --ssid "YourSSID" --password "YourPass" --target-ip 192.168.1.20 ``` @@ -254,9 +260,10 @@ Find your serial port: `COM7` on Windows, `/dev/ttyUSB0` on Linux, `/dev/cu.SLAB ```bash python -m esptool --chip esp32s3 --port COM7 --baud 460800 \ write_flash --flash_mode dio --flash_size 8MB \ - 0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \ - 0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \ - 0x10000 firmware/esp32-csi-node/build/esp32-csi-node.bin + 0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \ + 0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \ + 0xf000 firmware/esp32-csi-node/build/ota_data_initial.bin \ + 0x20000 firmware/esp32-csi-node/build/esp32-csi-node.bin ``` ### Serial Monitor @@ -285,7 +292,7 @@ All settings can be changed at runtime via Non-Volatile Storage (NVS) without re The easiest way to write NVS settings: ```bash -python scripts/provision.py --port COM7 \ +python firmware/esp32-csi-node/provision.py --port COM7 \ --ssid "MyWiFi" \ --password "MyPassword" \ --target-ip 192.168.1.20 diff --git a/verify b/verify index dd7eab57d6..02c50115f8 100755 --- a/verify +++ b/verify @@ -19,9 +19,9 @@ set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -PROOF_DIR="${SCRIPT_DIR}/v1/data/proof" +PROOF_DIR="${SCRIPT_DIR}/archive/v1/data/proof" VERIFY_PY="${PROOF_DIR}/verify.py" -V1_SRC="${SCRIPT_DIR}/v1/src" +V1_SRC="${SCRIPT_DIR}/archive/v1/src" # Colors (disabled if not a terminal) if [ -t 1 ]; then @@ -136,7 +136,7 @@ echo "" echo -e "${CYAN}[PHASE 3] PRODUCTION CODE INTEGRITY SCAN${RESET}" echo "" echo " Scanning ${V1_SRC} for np.random.rand / np.random.randn calls..." 
-echo " (Excluding v1/src/testing/ -- test helpers are allowed to use random.)" +echo " (Excluding archive/v1/src/testing/ -- test helpers are allowed to use random.)" echo "" MOCK_FINDINGS=0 @@ -204,7 +204,7 @@ elif [ $PIPELINE_EXIT -eq 2 ]; then echo -e " ${YELLOW}${BOLD}RESULT: SKIP${RESET}" echo "" echo " No expected hash file to compare against." - echo " Run: python v1/data/proof/verify.py --generate-hash" + echo " Run: python archive/v1/data/proof/verify.py --generate-hash" echo "" echo -e "${BOLD}======================================================================${RESET}" exit 2 From f396c447515cf7f47e31b1ec9de138b37be9f256 Mon Sep 17 00:00:00 2001 From: ruv Date: Thu, 14 May 2026 11:45:03 -0400 Subject: [PATCH 2/3] ci(verify-pipeline): fix stale v1/ working-directory + SECRET_KEY env MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Same drift as #559 but in CI: the workflow ran `working-directory: v1` on the two verify steps, but the Python codebase moved to `archive/v1/` ages ago. The job failed with: An error occurred trying to start process '/usr/bin/bash' with working directory '/home/runner/work/RuView/RuView/v1'. No such file or directory Fixed both occurrences (working-directory: v1 -> working-directory: archive/v1). Also added `SECRET_KEY` env var to both steps — `verify.py` transitively imports `src.app` -> `src.config.settings` (since PR #547 introduced pydantic-settings with a required `secret_key` field). The value is never used for any auth path in the proof pipeline; it just needs to satisfy the import chain. Same env-var workaround used locally to make `./verify` pass. After this commit, "Verify Pipeline Determinism (3.11)" should go green on this PR. 
Co-Authored-By: claude-flow --- .github/workflows/verify-pipeline.yml | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/.github/workflows/verify-pipeline.yml b/.github/workflows/verify-pipeline.yml index 0ba4dbf7be..2bb0d2848c 100644 --- a/.github/workflows/verify-pipeline.yml +++ b/.github/workflows/verify-pipeline.yml @@ -57,7 +57,13 @@ jobs: " - name: Run pipeline verification - working-directory: v1 + working-directory: archive/v1 + env: + # verify.py transitively imports src.app -> src.config.settings, which + # uses pydantic-settings with a required `secret_key` field. The proof + # only needs the import chain to resolve; the value is never used for + # any auth path in the proof pipeline. + SECRET_KEY: ci-proof-replay-only-not-a-real-secret run: | echo "=== Running pipeline verification ===" python data/proof/verify.py @@ -65,7 +71,9 @@ jobs: echo "Pipeline verification PASSED." - name: Run verification twice to confirm determinism - working-directory: v1 + working-directory: archive/v1 + env: + SECRET_KEY: ci-proof-replay-only-not-a-real-secret run: | echo "=== Second run for determinism confirmation ===" python data/proof/verify.py From 84638314a4a264282dae67670a7877c663cea465 Mon Sep 17 00:00:00 2001 From: ruv Date: Thu, 14 May 2026 13:38:58 -0400 Subject: [PATCH 3/3] fix(docker): bump rust 1.85 -> 1.90 + enforce LF on shell scripts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two real bugs found while pushing the v0.8.0 image to Docker Hub: ## Rust 1.85 -> 1.90 `hnsw_rs 0.3.4` (transitive via wifi-densepose-ruvector -> ruvector-attn-mincut -> hnsw_rs) calls `nbp.is_multiple_of(500_000)`. `is_multiple_of` on unsigned integers was stabilised in Rust 1.87 (rust-lang/rust#128101 — RFC 3565). 
On 1.85 the compile fails with:

    error[E0658]: use of unstable library feature `unsigned_is_multiple_of`
      --> hnsw_rs-0.3.4/src/hnswio.rs:736:20

Pinned to 1.90 for reproducibility. A comment in the Dockerfile flags the
1.87 MSRV requirement so a future downgrade can't quietly break it.

## .gitattributes: force LF on shell scripts + Dockerfile

Without a `.gitattributes`, the `core.autocrlf=true` setting that Git for
Windows typically installs with converts shell scripts to CRLF on
checkout. `COPY`ing `docker/docker-entrypoint.sh` into a Linux image then
preserves CRLF. The shebang line `#!/bin/sh\r\n` causes
`exec /app/docker-entrypoint.sh` to fail with:

    exec /app/docker-entrypoint.sh: no such file or directory

The kernel tries to look up an interpreter literally named `/bin/sh\r`,
which doesn't exist. Container exits immediately. The first v0.8.0 image
push (digest sha256:7957…44fa) suffered exactly this; the re-pushed image
(digest sha256:e9f4…d38315) was built on a renormalised tree.

The .gitattributes rule forces LF for:

- *.sh / *.bash
- Dockerfile*
- docker/* (covers docker-entrypoint.sh + docker-compose.yml)
- scripts/*
- `verify` (the proof-replay wrapper, which would fail the same way if it
  landed as CRLF in someone's clone)

Binary file globs (*.bin, *.wasm, *.rvf, *.pcap, etc.) are explicitly
marked binary so text-normalisation never touches them.

## CHANGELOG: drop the false `--introspection` flag claim

The CHANGELOG entry for v0.8.0 said the introspection endpoints were "off
by default, enabled via `--introspection`". That isn't true:
`sensing-server --help` has no such flag. The routes are mounted
unconditionally in `main.rs`. The per-frame `update()` p99 of 0.041 ms
(~24× under D4's 1 ms budget) makes always-on viable; the "off by default"
framing came from an earlier draft of ADR-099 that the implementation
outgrew. Corrected.
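
As a hedged aside on the `.gitattributes` section above: the CRLF shebang failure can be caught before an image is built with a small check like the following sketch. The helper name `has_crlf_shebang` and the two throwaway file names are hypothetical and not part of this repository; the script only inspects the first line of a file in binary mode.

```python
from pathlib import Path
import tempfile


def has_crlf_shebang(path):
    """Return True if the file starts with '#!' and its first line ends in CRLF."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")


# Demonstration on two throwaway files (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "entrypoint-lf.sh"
    bad = Path(tmp) / "entrypoint-crlf.sh"
    good.write_bytes(b"#!/bin/sh\nexec \"$@\"\n")
    bad.write_bytes(b"#!/bin/sh\r\nexec \"$@\"\r\n")
    lf_flagged = has_crlf_shebang(good)
    crlf_flagged = has_crlf_shebang(bad)

print(f"LF script flagged  : {lf_flagged}")
print(f"CRLF script flagged: {crlf_flagged}")
```

A check like this could run over the files that get copied into the image, though the `.gitattributes` rule remains the actual fix because it stops the CRLF from being checked out in the first place.
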
## Verification End-to-end smoke test of the pushed image: docker run -d -p 13000:3000 -e CSI_SOURCE=simulated -e SENSING_BIND_ADDR=0.0.0.0 ruvnet/wifi-densepose:v0.8.0 /health -> {"status":"ok","source":"simulated",...} /api/v1/info -> {"backend":"rust","features":{"ruvector":true,"signal_processing":true,...}} /api/v1/introspection/snapshot -> {"regime":"unknown", "regime_changed":false,"top_k_similarity":[]} (ADR-099 shape exact) /ui/observatory.html -> HTTP 200, 15 KB Published manifest digests: ruvnet/wifi-densepose:v0.8.0 -> sha256:e9f4c5af…d38315 ruvnet/wifi-densepose:latest -> sha256:e9f4c5af…d38315 Co-Authored-By: claude-flow --- .gitattributes | 35 +++++++++++++++++++++++++++++++++++ CHANGELOG.md | 3 ++- docker/Dockerfile.rust | 6 +++++- 3 files changed, 42 insertions(+), 2 deletions(-) create mode 100644 .gitattributes diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000000..ecfa67fc73 --- /dev/null +++ b/.gitattributes @@ -0,0 +1,35 @@ +# Line-ending policy. +# +# `* text=auto` lets git normalise text files to LF in the repository and convert +# to the platform's native line endings on checkout. That default is fine for +# .md / .rs / .toml / .py — broken for shell scripts and Dockerfiles, where +# CRLF on the shebang line causes Linux exec to look for an interpreter named +# `/bin/sh\r` (or similar) and fail with "no such file or directory". +# +# Force LF for anything that ends up executed inside a Linux container or a +# POSIX shell. This is what prevented the v0.8.0 image from booting at first +# build until the entrypoint was renormalised. +* text=auto +*.sh text eol=lf +*.bash text eol=lf +verify text eol=lf +Dockerfile* text eol=lf +docker/* text eol=lf +scripts/* text eol=lf + +# Binary blobs that should never be touched by text-normalisation. 
+*.bin binary +*.png binary +*.jpg binary +*.jpeg binary +*.gif binary +*.ico binary +*.zip binary +*.tar binary +*.tgz binary +*.gz binary +*.wasm binary +*.rvf binary +*.task binary +*.csi.jsonl binary +*.pcap binary diff --git a/CHANGELOG.md b/CHANGELOG.md index 197b6f7c01..7fcc3fc3e8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,7 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 regime classification) and `temporal-compare` (DTW pattern matching) as a **parallel tap** alongside RuView's existing event pipeline — no replacement, no behaviour change to the existing `/ws/sensing` fan-out or `wifi-densepose-signal` - DSP. Two new endpoints (off by default, enabled via `--introspection`): + DSP. Two new endpoints (always mounted — the tap is cheap enough at 0.041 ms p99 + per-frame `update()` to ship hot by default): - `GET /ws/introspection` — newline-delimited JSON snapshots streamed at the CSI frame rate. Each snapshot carries `frame_count`, `regime` (Idle / Periodic / Transient / Chaotic / Unknown), `lyapunov_exponent`, `attractor_dim`, diff --git a/docker/Dockerfile.rust b/docker/Dockerfile.rust index 018e8dadda..cb04626562 100644 --- a/docker/Dockerfile.rust +++ b/docker/Dockerfile.rust @@ -3,7 +3,11 @@ # Multi-stage build for minimal final image # Stage 1: Build -FROM rust:1.85-bookworm AS builder +# Rust 1.87+ is required: `hnsw_rs 0.3.4` (transitive via wifi-densepose-ruvector -> +# ruvector-attn-mincut) uses `u*::is_multiple_of`, stabilised in 1.87. Pinning to a +# recent stable (1.90) for reproducibility — bump cautiously since reproducible +# builds rely on this. +FROM rust:1.90-bookworm AS builder WORKDIR /build