Research writeup: ordvec ordinal-routing analysis, characterized mechanism + clean negative (not a crate change) #211
Code Review by Qodo
1. Unsafe RankQuant bucketing
PR Summary by Qodo
Add ordinal-routing research probes, CRT vernier oracle, and writeups (examples/ + benchmarks/)
Walkthroughs
Description
• Add research-grade routing probes and generators under examples/ (no ordvec API changes)
• Document CRT vernier seam theorem, density-collapse tau signal, and rigidity bounds
• Include real-corpus runbook, citation audit, and Lean skeleton for CRT formalization
Diagram
graph TD
U["Researcher"] --> E["embed_ollama.py"] --> N[(".npy embeddings")]
U --> G["gen_corpus.rs"] --> N
N --> P["Rust probes (examples)"] --> R["benchmarks/*.md findings"]
P --> L["Lean skeleton"] --> R
High-Level Assessment
The following are alternative approaches to this PR:
1. Factor shared .npy reader/writer into a single helper module
2. Use an existing npy/ndarray crate for IO
3. Promote key probes into a dedicated research harness crate/workspace
Recommendation: Current approach (keeping everything in examples/ and benchmarks/ with explicit tiering/withdrawals) is appropriate for exploratory research and avoids impacting ordvec’s API. The main improvement worth considering is consolidating the repeated .npy IO + row-normalization code into a small internal helper to reduce duplication-driven bugs (similar to the RNG-desync and geometry-sharing issues already called out and fixed).
File Changes
Enhancement (8)
Documentation (12)
e0fd8d4 to c5b4098
Code Review
This pull request introduces a comprehensive suite of benchmarks, mathematical proofs, and experimental probes to investigate the 'prime mile-marker / spectral index' conjecture for training-free routing in ordvec. It includes a Lean formalization of the vernier CRT seam-density theorem, Python scripts for generating real embeddings, and several Rust examples analyzing intrinsic dimension, number variance, density collapse, and shard recall. The review feedback highlights several opportunities to improve performance and robustness: precomputing knot values and removing redundant sorting in spectral_probe.rs, avoiding unnecessary HashSet allocations and preventing out-of-bounds panics in shard_recall.rs, handling potential NaN values during sorting in density_collapse.rs and twonn_id.rs, and adding a missing NeZero instance argument in the Lean proof to ensure successful compilation.
54c5a8d to d1c9bd1
Reframed as a research writeup — no The decisive bake-off (tau_rerank_bakeoff_results.md) What this branch is, then: a characterized mechanism + a clean negative + verified The Lean theorem work has moved to its proper home: Suggested disposition: merge as a labeled research/analysis artifact if the |
Codecov Report
✅ All modified and coverable lines are covered by tests.
7202976
8de43ce to 5610799
5610799 to ade6995
…er formalization
Investigates whether prime/spectral structure can improve training-free vector
routing over ordvec codes. Net: the genuinely defensible results are the CRT
vernier seam structure and the rigidity theory; the empirical rigidity/routing
probes are exploratory with identified confounds (documented inline).
Deliverables:
- examples/crt_seam_oracle.rs: exhaustive finite proof of the CRT seam-density
structure (capped form, lcm spacing, single coincidence, honest no-floor result)
- examples/{spectral_probe,shard_recall,twonn_id,gen_corpus}.rs: routing-key
probes + corpus-zoo generator (npy I/O matching bench_rank)
- benchmarks/rigidity_impossibility_proofs.md: Thm 2/3 (key is not
number-variance-rigid; Theorem 2 binomial value L(1-L/n); over-broad
optimality claim retracted per review)
- benchmarks/crt_seam_oracle_results.md: capped density form min(2t+1,m_i)/m_i
- benchmarks/lean/VernierSeamDensity.lean: Lean 4 skeleton for ordvec-formalization
(crtEquiv crux carries required NeZero hypothesis)
- benchmarks/ADVERSARIAL_REVIEW.md: three hostile-reviewer findings, with each
contested claim tiered SOUND / CORRECTED / EXPLORATORY / RETRACTED
Adversarial-reviewed before commit; correctness fixes applied, empirical
overclaims demoted to exploratory with salvage paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
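The capped density form min(2t+1,m_i)/m_i named in this commit can be checked by brute force on small moduli. The sketch below is not the crt_seam_oracle.rs probe. It assumes that a "seam" residue is one within circular distance t of 0 modulo m_i and that the moduli are pairwise coprime; the function names are hypothetical.

```rust
/// Residues r in [0, m) whose circular distance to 0 is at most t.
/// The cap matters: once 2t+1 >= m every residue qualifies.
fn near_zero_count(m: u64, t: u64) -> u64 {
    (2 * t + 1).min(m)
}

/// Closed form: product over moduli of min(2t+1, m_i) / m_i.
fn capped_density(moduli: &[u64], t: u64) -> f64 {
    moduli
        .iter()
        .map(|&m| near_zero_count(m, t) as f64 / m as f64)
        .product()
}

/// Exhaustive check over one full period M = m_1 * ... * m_k.
/// Assumes pairwise-coprime moduli and a period small enough to enumerate.
fn brute_force_density(moduli: &[u64], t: u64) -> f64 {
    let period: u64 = moduli.iter().product();
    let hits = (0..period)
        .filter(|&x| {
            moduli.iter().all(|&m| {
                let r = x % m;
                // Circular distance from r to 0 modulo m.
                r.min(m - r) <= t
            })
        })
        .count();
    hits as f64 / period as f64
}
```

With moduli [3, 5, 7] and t = 2, the uncapped product 5/3 · 5/5 · 5/7 exceeds 1, while the capped form and the enumeration both give 5/7.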
…rted probe
Acting on three internal adversarial reviews:
Correctness fixes:
- gen_corpus Bug O: corpus and queries now share one latent geometry (A + prototypes from a dedicated seed), so shard_recall ground truth is valid.
- shard_recall Bug L: build_projs seeds direction/phase RNGs separately and identically across arms -> aligned vs random-offset share directions, clean ablation. Post-fix: tied (0.9095 vs 0.9080), "random offsets redundant" holds.
- CRT density: capped form min(2t+1,m_i)/m_i (uncapped exceeds 1 at t>=2).
- Lean crtEquiv: now carries required [forall i, NeZero (m i)] (ZMod 0 = Z would make the Equiv false); hcongr lemma plan corrected.
- rigidity proofs: Theorem 2 restated with binomial value L(1-L/n) (not "Poisson, exactly L"); over-broad "optimal over all partitions" claim RETRACTED as a non-sequitur.
Withdrawn:
- Number-variance "super-Poisson" finding: the salvage (smooth empirical unfold) INVERTED the isotropic/clustered ordering, proving the probe's unfold is uncalibrated. Withdrawn pending an estimator calibrated against a process with known closed-form variance. Theory is unaffected.
All findings tiered in benchmarks/ADVERSARIAL_REVIEW.md. Examples build clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
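The binomial value L(1-L/n) quoted for Theorem 2 can be reproduced numerically. The sketch below assumes the setting is n independent keys, each landing in a window of expected occupancy L with probability L/n. That reading of the theorem's setup is an assumption, and the function names are hypothetical. It sums the binomial pmf directly and compares the result with the closed form.

```rust
/// Variance of Binomial(n, p), computed by summing the pmf term by term.
/// Requires 0 <= p < 1.
fn binomial_variance_exact(n: u32, p: f64) -> f64 {
    // pmf(0) = (1-p)^n, then pmf(k+1) = pmf(k) * (n-k)/(k+1) * p/(1-p).
    let mut pmf = (1.0 - p).powi(n as i32);
    let mut mean = 0.0_f64;
    let mut second_moment = 0.0_f64;
    for k in 0..=n {
        let kf = k as f64;
        mean += kf * pmf;
        second_moment += kf * kf * pmf;
        if k < n {
            pmf *= (n - k) as f64 / (kf + 1.0) * p / (1.0 - p);
        }
    }
    second_moment - mean * mean
}

/// Closed form for a window of expected occupancy l among n keys.
fn window_variance(n: u32, l: f64) -> f64 {
    l * (1.0 - l / n as f64)
}
```

For n = 100 and L = 10 both routes give 9, strictly below the Poisson value of 10; the gap to Poisson closes only as L/n goes to 0.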
…tion signal
Reframes the prime/Sacks "dense-region structure" thread into the question it was actually reaching for, on the right substrate (the permutohedron S_D, not the integer line).
Findings (examples/density_collapse.rs, FP32 ground truth):
- Density collapse is NOT code collision (b=2 codes are length-D permutation-bucket sequences; exact collision ~0 even at noise=0.08). It is NEAR-collision: docs with Hamming-close b=2 codes the 2-bit scorer cannot rank apart.
- The lost signal IS recoverable: among b2-lookalikes, FP32-true neighbours have lower top-k Kendall-tau (coordinate order) than far lookalikes in 87-91% of probes, at both densities. That order is already in the Rank code, discarded by b=2, so a tau-rerank breaks dense-region ties with no new storage.
Conclusion: exploitable dense-region structure lives in S_D (data's own order), not on N (primes/Sacks act on the index, carry no corpus information).
Open: confirm on real embeddings; show tau-rerank beats b=4 at matched bytes on R@10.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
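The tau comparison in these findings can be sketched as a discordant-pair fraction over a chosen coordinate set. This is an illustration, not density_collapse.rs. It assumes that "Kendall-tau" here means a normalised distance (lower means more similar coordinate order) and that each document is stored as a full rank vector; the function names are hypothetical. The coordinate set is built as a per-pair union of each side's top-k coordinates, which is how the M1 fix elsewhere in this PR describes the choice.

```rust
/// Indices of the k coordinates with the largest rank values.
fn top_k_coords(ranks: &[u32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..ranks.len()).collect();
    idx.sort_by(|&a, &b| ranks[b].cmp(&ranks[a]));
    idx.truncate(k);
    idx
}

/// Union of both documents' top-k coordinates, sorted and de-duplicated.
fn pair_union(a: &[u32], b: &[u32], k: usize) -> Vec<usize> {
    let mut coords = top_k_coords(a, k);
    coords.extend(top_k_coords(b, k));
    coords.sort_unstable();
    coords.dedup();
    coords
}

/// Fraction of coordinate pairs that the two rank vectors order differently.
fn kendall_tau_distance(a: &[u32], b: &[u32], coords: &[usize]) -> f64 {
    let mut discordant = 0usize;
    let mut total = 0usize;
    for (i, &ci) in coords.iter().enumerate() {
        for &cj in &coords[i + 1..] {
            total += 1;
            if a[ci].cmp(&a[cj]) != b[ci].cmp(&b[cj]) {
                discordant += 1;
            }
        }
    }
    if total == 0 {
        0.0
    } else {
        discordant as f64 / total as f64
    }
}
```

Identical rank vectors score 0, fully reversed ones score 1, and a single adjacent swap among four coordinates scores 1/6.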
…ext, GPU)
Adds examples/embed_ollama.py (GPU embeddings via ollama -> .npy) and a --corpus-npy loader to density_collapse. Runs the intra-code Kendall-tau test on 8665 real repo sentences embedded with nomic-embed-text (768-d, RTX 5080).
Real-data results:
- TwoNN intrinsic dim of nomic-embed-text ~= 13 (ambient 768), matches the predicted low-tens range.
- Real embeddings are far more b=2-entangled than synthetic (~5083/8665 docs in each probe's lookalike shell vs ~60-120 synthetic).
- The tau-signal HOLDS and sharpens monotonically with top-k: win rate 0.667 (k=8) -> 0.683 (16) -> 0.847 (32) -> 0.930 (64). At k=64, 93% of probes: FP32-true neighbours have lower intra-code Kendall-tau than the b2-lookalikes the code conflates them with.
Confirms on real encoder geometry: the signal b=2 discards (fine permutation order, already in the Rank code) recovers dense-region true neighbours.
Open: broader/larger corpus; show a tau-rerank beats b=4 at matched bytes on R@10.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
- Lean blindspot_card_product: add required [∀ i, NeZero (m i)] (ZMod 0 has no Fintype instance; the "easy half" needs it too, not just crtEquiv).
- shard_recall: guard COPRIME_PERIODS[i] with i % len() (OOB panic if r>16); drop redundant truth_set HashSet, filter truth directly against cand.
- spectral_probe: precompute knot (pos,val) arrays once instead of recomputing knot_rank inside the per-key binary search.
- density_collapse / twonn_id: NaN-safe sort comparators (unwrap_or Equal).
All probe outputs bit-identical after the changes (behavior-preserving): real-embedding density win 0.683 @ k=16, shard envelope 0.9095/0.9080, sphere control 6.99. Examples build clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
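A NaN-safe comparator of the kind this commit mentions can be sketched as follows. This is a stricter variant than the commit's `unwrap_or(Equal)`, swapped in deliberately: treating NaN as equal to every value is not a consistent order, and the standard library documents that `sort_by` may panic when the comparator is not a total order. The sketch sends every NaN to the end instead. The function names are hypothetical and this is not the code in density_collapse.rs or twonn_id.rs.

```rust
use std::cmp::Ordering;

/// Comparator on f64 that sorts every NaN after all ordinary values.
fn nan_last_cmp(a: &f64, b: &f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither side is NaN here, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(b).unwrap_or(Ordering::Equal),
    }
}

/// Sort distances ascending without panicking on NaN input.
fn sort_distances(values: &mut [f64]) {
    values.sort_by(nan_last_cmp);
}
```

`f64::total_cmp` is a standard-library alternative when the IEEE total order (which distinguishes -0.0 from 0.0 and orders NaNs by sign) is acceptable.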
…en pipeline
Acting on a second hostile review + Gemini/qodo PR bots on the real-embedding material:
CRITICAL: density-collapse headline corrected (was an artifact):
- M2: the win-rate climb 0.667->0.930 with top-k was estimator-variance, not a sharpening effect. Replaced win rate with the tau GAP (effect size) + bootstrap 95% CI over probes.
- M1: tau was computed on the probe's OWN top-k coords, coupling it to cosine (near-tautological). Now uses the per-PAIR union of top coords, chosen independently of the cosine ranking.
- Result survives but is MODEST and FLAT: gap ~= 0.04, CI strictly > 0 at every top-k. "Sharpening / signature of a real effect" RETRACTED; small real separation stands.
qodo: bucketing bug. density_collapse used rank/(d/2^bits) (panics at d/2^bits==0; wrong for non-divisible dims). Now uses ordvec::rank::rank_to_bucket so it measures real RankQuant behavior. bits: u32 -> u8.
embed_ollama.py: abort on embedding-count mismatch (E2 row-misalignment) and empty corpus (E3). Record the repo-sentence extraction procedure in results.md (E4/E5 repro gap). Soften single-corpus generality language (G1).
Examples build clean; corrected gap result stable across topk.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
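The two failure modes of rank/(d/2^bits) are easy to reproduce in isolation. In the sketch below, `naive_bucket` mirrors the flagged expression and `scaled_bucket` is a hypothetical replacement written for illustration. It is not the body of ordvec::rank::rank_to_bucket, which this page does not show.

```rust
/// The pattern the review flagged: fixed bucket width d / 2^bits.
/// Returns None where the original would divide by zero (d < 2^bits).
fn naive_bucket(rank: usize, d: usize, bits: u8) -> Option<usize> {
    let width = d / (1usize << bits);
    rank.checked_div(width)
}

/// Hypothetical alternative: scale the rank into [0, 2^bits) directly,
/// so non-divisible dimensions never overflow the bucket range.
fn scaled_bucket(rank: usize, d: usize, bits: u8) -> usize {
    assert!(rank < d, "rank must be in 0..d");
    (rank << bits) / d
}
```

With d = 3 and bits = 2 the naive width is 0, so the division fails. With d = 10 and bits = 2 the naive form sends rank 9 to bucket 4, outside the 2-bit range 0..=3, while the scaled form yields 3.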
…hdrawn/
Makes the 11-doc benchmark set navigable for review without changing any findings:
- benchmarks/README.md: tiered index (SOUND / THEORY / WITHDRAWN), a 3-minute reviewer path leading with the headline density-collapse result, and the open follow-up. No entry point existed before.
- Move the WITHDRAWN number-variance docs (spectral_probe_results, corpus_zoo_results) to benchmarks/withdrawn/ with a README marking them record-only; fix the two cross-references. Top level now shows only what survived review.
Pure documentation reorg: no code, no result, no API change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
The deployment question for the density-collapse finding, answered.
Ceiling experiment (tau uses full stored ranks): does a Kendall-tau rerank of b=2 survivors beat spending the bits on b=4 at R@10 vs FP32?
Result: NO, decisively, on synthetic AND real (nomic-embed-text) embeddings:
b2 0.898 | b4 0.942 | tau-rerank 0.597 | fp32-rerank 1.000 (real, topk128/M50)
- fp32-rerank=1.0 proves the b=2 candidate pool contains every true neighbour; candidate generation is fine.
- tau-rerank (0.597) scores BELOW b=2's own ordering (0.898): the ~0.04 tau gap is a real binary discriminator but far too weak to ORDER candidates.
- Even at the ceiling (full ranks, best params) tau only ties b=2, never reaches b=4. No compact tau codec could beat this, so codec work is not justified.
Verdict: the density-collapse tau-signal is real-but-inert. Just use b=4. No ordvec feature follows. This closes the research line honestly: a characterized mechanism + a clean negative. The prime/spectral/permutation ideas do not beat the boring baseline.
Confirms this branch is RESEARCH, not a contribution to the crate. Updates density_collapse_results.md (open question now RESOLVED) and the benchmarks README (verdict).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
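The R@10 comparison and the fp32-rerank ceiling argument can be illustrated with a small recall helper. This is a sketch under assumptions, not the bake-off harness: document ids are plain u32 values, the score closure stands in for whichever scorer is being tested, and the function names are hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of the true top-k that appears in the first k retrieved ids.
fn recall_at_k(truth: &[u32], retrieved: &[u32], k: usize) -> f64 {
    let want: HashSet<u32> = truth.iter().take(k).copied().collect();
    if want.is_empty() {
        return 0.0;
    }
    let got = retrieved
        .iter()
        .take(k)
        .filter(|id| want.contains(*id))
        .count();
    got as f64 / want.len() as f64
}

/// Re-order a candidate pool by a score (higher is better) and return ids.
fn rerank(pool: &[u32], score: impl Fn(u32) -> f64) -> Vec<u32> {
    let mut ids = pool.to_vec();
    // Stable sort, descending by score; total_cmp keeps NaN scores harmless.
    ids.sort_by(|&a, &b| score(b).total_cmp(&score(a)));
    ids
}
```

If reranking the pool with an exact scorer reaches recall 1.0, every true neighbour was already in the pool, so any shortfall from another reranker comes from ordering rather than from candidate generation.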
…writeup
The vernier CRT seam-density Lean work belongs in the companion proof repo, not here. Removed benchmarks/lean/ from this branch and opened it as a draft there: ordvec-formalization#17 (kept out of that repo's sorry-free audited tree).
Cross-links updated: benchmarks/README and crt_seam_oracle_results.md now point at the formalization PR.
This branch is now purely the research writeup (probes, findings, proofs-on-paper, and the decisive bake-off negative) with no code destined for the ordvec crate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <523330+toadkicker@users.noreply.github.com>
…ucibility + honesty nits
Review of this research writeup before merge (no ordvec crate / public-API change: research, not a feature):
- Relocate all 21 files (13 docs + 8 probe scripts) from benchmarks/ + examples/ into experiments/ordinal-routing-research/, and add experiments/ to [package].exclude. Keeps the research out of the published crate and off the compiled-examples / clippy path, and stops it mixing with the real BEIR harness now living in benchmarks/.
- Regenerate the stale synthetic table in density_collapse_results.md from the current de-circularized density_collapse.rs (tau 0.3345/0.3656 and 0.1423/0.1602; gap + 95% CI) and drop the retracted win-rate column the doc itself disowns 30 lines later.
- Honesty fixes: rigidity_impossibility_proofs.md drops a non-reproducing Σ²/L scalar (keeps the quotient argument, which is correct); README citation row lists only audited names (Ethayarajh, Broughan-Barnett); crt_seam doc updated to reflect ordvec-formalization#17 is an open, sorry-free PR (not a draft with proof debt); de-fragilized the doc-count phrasing.
- README reframed for the new layout + a note on running the probes (copy a .rs into examples/ and cargo run --example).
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Codex stop-gate: the TwoNN intrinsic-dimension result was over-evidenced.
twonn_id_results.md is self-labeled "PARTIAL" (OLS-through-origin estimator,
biased, "prefer MLE before quoting exact values", no real-data measurement —
only synthetic + sphere validations), yet the front-door README listed it under
SOUND ("proven or real-data confirmed") and asserted nomic-embed-text "ID ≈ 13".
- Move the twonn_id row from SOUND to THEORY, marked PARTIAL: the chord-metric
fix is sound (sphere-validated), but the ID is an indicative low-tens LOWER
BOUND, not a confirmed value.
- Soften the "≈ 13" assertions in README + density_collapse_results.md to "low
tens (lower bound)".
- Fix the README front-door crt_seam row that still said "(draft)" — the Lean PR
is open + sorry-free (matches the crt_seam doc).
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
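The MLE that twonn_id_results.md recommends over the OLS-through-origin fit has a closed form under the TwoNN model, where the ratio mu = r2/r1 of second to first nearest-neighbour distance follows a Pareto law with shape equal to the intrinsic dimension. The sketch below is not twonn_id.rs. It takes the ratios as given, the function names are hypothetical, and `pareto_quantiles` is a deterministic stand-in for sampled data.

```rust
/// TwoNN maximum-likelihood estimate from ratios mu = r2 / r1.
/// For a Pareto law with shape d the MLE is N / sum(ln mu).
fn twonn_mle(mu: &[f64]) -> Option<f64> {
    // Drop degenerate ratios (ties or non-finite values) before taking logs.
    let logs: Vec<f64> = mu
        .iter()
        .filter(|m| m.is_finite() && **m > 1.0)
        .map(|m| m.ln())
        .collect();
    let total: f64 = logs.iter().sum();
    if logs.is_empty() || total <= 0.0 {
        None
    } else {
        Some(logs.len() as f64 / total)
    }
}

/// Deterministic stand-in for sampled ratios: the Pareto(d) quantile
/// function evaluated at evenly spaced probabilities.
fn pareto_quantiles(d: f64, n: usize) -> Vec<f64> {
    (0..n)
        .map(|i| {
            let u = (i as f64 + 0.5) / n as f64;
            (1.0 - u).powf(-1.0 / d)
        })
        .collect()
}
```

Feeding the estimator quantiles generated with d = 13 recovers a value close to 13. On real data the ratios would come from a nearest-neighbour search in the chord metric the writeup validates on the sphere.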
ade6995 to 0f73071
…across 5 encoders
Tests whether data-oblivious low-discrepancy structure (golden-angle, Fibonacci,
Sobol, Kronecker) improves training-free routing. Eight pre-registered probes
(thresholds fixed in source headers before running), real embeddings, GPU-measured.
Lives in experiments/ordinal-routing-research/ alongside the round-1 probes.
Verdict: CLASS-DEAD, replicated. Oblivious low-discrepancy directions do not beat
iid-random for routing across nomic / bge-m3 / bge-large / snowflake-arctic-v2 /
harrier-oss at real intrinsic dim 18-24. The one mid-ladder flicker (bge-m3 sobol
+0.024) failed to replicate at near-identical ID — caught by the >=2-corpora rule.
Probes + findings (oblivious_directions_results.md):
- uniformity_lemma: rank transform whitens marginals -> equal-width bucketing is
entropy-optimal; golden boundaries strictly waste bits. Marginal-only — blind to
the joint structure the directions probes target.
- overlap_decomp: ordvec's hypergeometric null is ~100% cone (hubness) on real
data; per-coord centering removes it and amplifies true-nbr overlap 2-5x.
- centering_recall: centering helps b=1 prefilter (+0.03..0.07 R@10) but FAILS at
b=4 (-0.07..-0.15, penalty grows with encoder capacity). Replicated x4 corpora.
- subspace_directions: center -> project to k-dim PCA subspace -> {random, sobol,
kronecker, pca-axes}. Oblivious sequences tie/lose random; only data-aligned
(pca-axes) leads at higher ID. The lever is data-alignment, which training-free
forbids.
- partition_balance: centering balances coarse cells (BALANCE pass) but not their
alignment (PRUNING fail) — no sublinear win.
Resolves the twonn_id PARTIAL: real-corpus ID measured ~18-24 across 5 encoders,
and intrinsic dimension is a CORPUS property (repo~13 vs fiqa~24, same encoder),
not an encoder constant. Corrects an in-session "ID~13" anchoring error rather
than dropping it.
No crate or public-API changes — experiments/ only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <todd@baursoftware.com>
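The uniformity_lemma finding has a short numerical illustration. A rank vector is a permutation of 0..d, so each rank value occurs exactly once and bucket occupancy depends only on where the boundaries sit. The sketch below compares equal-width boundaries with an arbitrary unequal set. The unequal boundaries are illustrative, not the golden-ratio boundaries the probe tested, and the function names are hypothetical.

```rust
/// Shannon entropy in bits of a bucket-occupancy histogram.
fn entropy_bits(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Occupancy when ranks 0..d are cut at the given boundaries.
/// Assumes boundaries are ascending and each is at most d.
fn occupancy(d: usize, boundaries: &[usize]) -> Vec<usize> {
    let mut counts = Vec::with_capacity(boundaries.len() + 1);
    let mut start = 0;
    for &end in boundaries.iter().chain(std::iter::once(&d)) {
        counts.push(end - start);
        start = end;
    }
    counts
}
```

For d = 16 and four buckets, equal widths give the full 2 bits, while boundaries at 8, 12, 14 give 1.75 bits: any unequal split of a uniform marginal spends the same storage on less information.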
209065e to f9b754b
…founded
Synthetic enterprise lake = union of 3 nomic domains (fiqa+nq+quora = 281,729 docs, 3,000 queries spanning cones) + optional templated near-duplicate hub injection (make_lake.py). Three pre-registered questions, validated probes from round 2:
A. Global centering breaks on multi-cone? NO: still collapses cone_base to the uniform null (B=4: 11.2->3.25), under the 0.30 "insufficient" bar. 3 domains share a removable common mean. (Falsifies the "no single removable mean" pred.)
B. Higher union ID revives oblivious structure? Premise FALSE: union ID = 8.2 (vs ~24/domain). Far-apart clusters read as discrete blobs = LOWER intrinsic dim. Cones add separation, not dimensions. (Falsifies the "union->higher ID" pred.)
C. Brittle to templated hubs? NO: b=4 R@10 flat across 0/5/10/15% hub prevalence (0.843 -> 0.841, drop -0.002, essentially immune). Fixed-mass rank code has no global frequency/IDF term for a hub to poison.
Verdict: ordvec's training-free encoding does NOT degrade on multi-cone / hub-heavy lake geometry; if anything the geometry is easier (lower ID, globally centerable, hub-immune). Both pre-registered "messy lake = harder" intuitions were falsified by the validated probes, which is the payoff of the negative-verification discipline.
CAVEAT: synthetic union of CURATED corpora; models multi-cone + hubs, NOT the OCR/mixed-language/length heterogeneity of a true S3 dump. Established: multi-domain union is benign. Not established: raw dirty S3 sludge is benign.
No crate or public-API changes, experiments/ only.
Signed-off-by: Todd Baur <todd@baursoftware.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The last synthetic-lake pathology capturable without real dirty data. Embed the
same 57.6k fiqa docs at four chunk lengths {128,256,512,1100} and union into a
230k-doc length-mixture lake; re-run the incumbent centering_recall b=4 raw arm.
Two pre-registered findings:
- The "chunk length is a third geometry axis" claim: real but SMALL and CO-AXIAL.
R̄ (cone tightness) spreads only 0.705->0.723 over an 8.6x length range and the
four cone axes are >=0.986 aligned — not the distinct geometries the "mixture of
geometries" framing imagined. Direction (longer->tighter) holds; magnitude tiny.
- b=4 raw R@10 is IMMUNE to the mixture: 0.8230 (single-length) -> 0.8253 (lake),
Δ +0.002 flat to noise, CR@100 stays 1.0. Same mechanism: per-vector rank codes
discard the magnitude a tightness-shift moves, so global bin edges can't be
poisoned — the same training-free property that gave Phase B hub-immunity.
With Phase B (multi-domain cones + hubs) this leaves every synthetic lake pathology
benign for "spend the bits, b=4, raw ranks." Only un-run test left needs real dirty
data (OCR/multilingual S3 sludge), uncapturable from clean embeddings.
Probes: make_length_lake.py + centering_recall.rs. Reconciles README tiers, the
deployment-question section, and the Path B forward-pointer in the directions doc.
Signed-off-by: Todd Baur <todd@baursoftware.com>
Closeout: the research line is complete
Final commit Verdict: research, not a feature. Every constructed alternative to "spend the bits,
Two earlier headline claims were overturned and corrected in place rather than The only non-derivative test remaining needs real dirty data — an actual S3 Reviewer's guide: 🤖 Generated with Claude Code |
The one reusable positive from the ordinal-routing research arc: ordvec is robust-by-construction on heterogeneous corpora, and it's a direct corollary of the constant-composition mechanism already documented in RANK_MODES. Ranks discard magnitude and there's no global IDF term, so the two failure modes that corrupt learned codebooks (near-duplicate hubs, mixed chunk lengths) have nothing to grab.
Promotes it out of experiments/ (research record) into the product surface:
- docs/RANK_MODES.md: new "A consequence: robust by construction on messy corpora" section right after the mechanism, with the measured numbers (hub −0.002 through 15%; chunk-length mixture +0.002) and honest scope (synthetic-messy curated corpora; does NOT model OCR/multilingual/broken-encoding).
- README "What's different": a discoverable bullet linking to the section.
Numbers and full pre-registered record stay in experiments/ordinal-routing-research/; this is the citation, not new claims.
Signed-off-by: Todd Baur <todd@baursoftware.com>
Exploratory research into ordvec's density behavior and whether prime / spectral /
oblivious structure can improve training-free routing. All work lives in
experiments/ordinal-routing-research/ (probes *.rs next to the findings *.md that
produced them), with no changes to the ordvec crate or its public API. The directory
is listed in [package].exclude: it ships with the source tree but not the published
crate.
Self-reviewed by three adversarial agents plus the PR bots; findings are tiered
by what survived scrutiny.
Bottom line — a characterized mechanism and a clean negative
Research, not a feature. The prime / spectral / permutation / oblivious-structure
ideas for dense-region retrieval do not beat the boring baseline. The honest verdict:
spend the bits — use b=4 raw rank codes. Every constructed alternative was
pre-registered with pass/fail thresholds before running, and every one came back
negative or inert.
SOUND: proven or real-data confirmed
- … real but does not beat b=4 at matched bytes: real embeddings b4 R@10 0.942 vs b2 0.898 vs tau-rerank 0.597 (fp32-rerank 1.000). The b=2 candidate pool contains every true neighbour, but the ~0.04 tau gap is too weak to order them; it scores below b=2's own ordering. Real-but-inert. (tau_rerank_bakeoff_results.md)
- … separate; among those lookalikes true neighbours have lower intra-code Kendall-tau (gap ≈ 0.04, CI > 0). Small but real. (density_collapse_results.md)
- … directions (golden-angle / Sobol / Kronecker) do not beat iid-random for training-free routing across 5 encoders (nomic, bge-m3, bge-large, snowflake-arctic-v2, harrier-oss) at real intrinsic dim 18–24. CLASS-DEAD, pre-registered, replicated (the one mid-ladder flicker failed to replicate). The one robust positive: data-aligned (PCA) directions lead at higher ID. The lever is data-alignment, which training-free forbids. (oblivious_directions_results.md)
- … coincidence/period, capped density ∏ min(2t+1,m_i)/m_i. Lean 4 formalization is sorry-free in the companion repo, ordvec-formalization#17. (crt_seam_oracle_results.md)
- … RNG-desync fix). (shard_recall_results.md)
Deployment-robustness sub-arc: RESOLVED (negative across every synthetic pathology)
Does training-free b=4 routing degrade on messy enterprise-lake geometry? Three
pre-registered fears, all unfounded:
- … injection). Union ID is lower (8.2 vs ~24/domain), global centering still works, and b=4 R@10 is immune to templated hubs (flat −0.002 through 15% prevalence). (oblivious_directions_results.md, Phase B section)
- … lengths {128,256,512,1100} → 230k-doc lake). b=4 raw R@10 immune: 0.823 → 0.825 (+0.002, CR@100 = 1.0). Bonus: the "chunk length is a third geometry axis" claim is real but small and co-axial: R̄ spreads only 0.705→0.723 over an 8.6× length range, cone axes ≥0.986 aligned. (length_mixture_lake_results.md)
Mechanism for the robustness: per-vector rank codes discard the magnitude a hub or a
chunk-length tightness-shift moves, and there's no global IDF/frequency term to poison.
Training-free + rank-based is the property that confers the immunity. The only un-run
test left needs real dirty data (OCR / multilingual S3 sludge), uncapturable from
clean embeddings.
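The "rank codes discard magnitude" mechanism can be seen directly: a per-vector rank transform is unchanged by any strictly increasing map of the coordinates, including a positive rescaling. The sketch below is a generic rank transform written for illustration. It is not ordvec's encoder, and the function name is hypothetical.

```rust
/// Rank of each coordinate within its own vector (0 = smallest value).
fn rank_transform(v: &[f64]) -> Vec<usize> {
    // Sort coordinate indices by value, then invert the permutation.
    let mut order: Vec<usize> = (0..v.len()).collect();
    order.sort_by(|&a, &b| v[a].total_cmp(&v[b]));
    let mut ranks = vec![0usize; v.len()];
    for (rank, &coord) in order.iter().enumerate() {
        ranks[coord] = rank;
    }
    ranks
}
```

Scaling a vector by 3.5, or passing every coordinate through exp, leaves the output identical, so nothing that only moves magnitudes can shift a per-vector rank code.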
THEORY: directionally right, restated honestly
- … L(1−L/n)); the over-broad "quantile optimal over all partitions" claim is retracted as a non-sequitur. (rigidity_impossibility_proofs.md)
- … corrected. (conjecture_citation_audit.md)
- … estimator is biased. Real-corpus ID (~18–24) is established in the directions arc, not here. (twonn_id_results.md)
WITHDRAWN: see withdrawn/
Number-variance "super-Poisson" finding: its unfold is uncalibrated (a salvage attempt
inverted the result). Kept for the record, not as a claim. The theory above does not
depend on it.
Conjecture verdict
Prime / Li(x) / Sacks-spiral constructions don't help retrieval: they act on the index
(ℕ) and carry no corpus information. The exploitable structure lives on the
permutohedron S_D, the data's own order, which is the density-collapse result, and
that turned out inert against b=4. Closed: research, not a feature.
The discipline that held the line: pre-register pass/fail thresholds in each probe's
source header before running, and require replication across ≥2 corpora (this caught
a seed-1 false positive and a bge-m3 flicker). Reviewer's guide and full tiering:
experiments/ordinal-routing-research/README.md and ADVERSARIAL_REVIEW.md.