Ixon serialization rewrite with large speed and size reduction #288

johnchandlerburnham · 2026-01-14T11:57:22Z

Major rewrite of the Ixon serialization format optimized for space efficiency.
Now Ix + Lean in Lean environment compiles in 10.21 seconds with
serialized size of 1.6 GB

Key changes:

Split ixon module into separate files (constant, env, expr, univ, metadata, serialize, sharing, tag, comm, error)
New tag-based encoding (Tag0, Tag2, Tag4) for compact representation
Refs table: deduplicates constant references using indices instead of inline 32-byte addresses
Univs table: deduplicates universe terms per constant
Global name indexing: shares names across entire environment, metadata uses u64 indices instead of 32-byte addresses
Sharing analysis: identifies shared subexpressions within constants
Pre-order traversal indexing for expression metadata

The old ixon module preserved as ixon_old.rs for migration reference.

Major rewrite of the Ixon serialization format optimized for space efficiency. Now Ix + Lean in Lean environment compiles in 10.21 seconds with serialized size of 1.6 GB Key changes: - Split ixon module into separate files (constant, env, expr, univ, metadata, serialize, sharing, tag, comm, error) - New tag-based encoding (Tag0, Tag2, Tag4) for compact representation - Refs table: deduplicates constant references using indices instead of inline 32-byte addresses - Univs table: deduplicates universe terms per constant - Global name indexing: shares names across entire environment, metadata uses u64 indices instead of 32-byte addresses - Sharing analysis: identifies shared subexpressions within constants - Pre-order traversal indexing for expression metadata The old ixon module preserved as ixon_old.rs for migration reference.

…n test

Main fix: build_sharing_vec now re-sorts shared hashes in topological order (leaves first) instead of using gross-benefit order from decide_sharing. Previously, large compound expressions came first and couldn't reference their children via Share(idx), causing each entry to inline entire subtrees (30-35KB per entry instead of 3-13 bytes). This caused 1140x blow-up for pathological blocks like denote_blastDivSubtractShift_q. Also fixes hash-consed size calculation to include structural overhead for Inductive/Constructor/Recursor data structures, not just expressions. This makes the comparison fair - previously simple enums like SymbolKind showed Ixon as worse than hash-consing (1.91x) when it's actually 21x better. Results after fix: - Overall ratio: 0.071x (Ixon is 14x more efficient than hash-consing) - 0 blocks with ratio >= 1.0x (previously 4) - 11,722 blocks with 20x+ compression - Total serialized: 159MB vs 2.24GB hash-consed Other changes: - Remove early-break bug in decide_sharing that skipped profitable candidates - Add TRACK_HASH_CONSED_SIZE and ANALYZE_SHARING debug flags - Add analyze_sharing_stats() for debugging pathological cases - Add test_early_break_bug unit test - Add granular ratio distribution buckets in size analysis output

…tion, refactor FFI layer into focused modules, and migrate pipeline to Ix types Introduce a pure Lean compiler (CompileM) that produces bit-identical Ixon output to the existing Rust compiler, enabling cross-implementation verification. The compiler supports both sequential and wave-based parallel compilation with optional fail-fast mismatch detection against the Rust reference implementation. Lean-side additions: - Ix/CompileM.lean: ~2000-line pure Lean compiler with reader/state/except monad stack. Supports parallel compilation via wave-based scheduling with dedicated worker threads and channels. - Ix/DecompileV2.lean: Decompiler from Ixon format back to Lean expressions, resolving refs/univs tables and expanding sharing references. - Ix/ShardMap.lean: Concurrent hashmap with power-of-2 sharding, cache-line padding, double-checked locking, and bulk insert parallelism. - Ix/Sharing.lean: Pure Lean sharing analysis using alpha-invariant Merkle-tree hashing (blake3). Two-phase O(n) algorithm with DAG construction via post-order traversal and profitability heuristic. Lean-side migrations: - GraphM, CondenseM, Mutual: migrated from Lean.Name/Lean.Expr to Ix.Name/Ix.Expr throughout the compilation pipeline. - Ix/Ixon.lean: expanded with metadata types (ExprMetaData, ExprMetaArena, ConstantMeta, Named, Comm), full serialization for metadata, and array-based RawEnv types for FFI boundaries. - MutCtx definition moved from Common.lean to Environment.lean. Rust-side refactoring: - Split monolithic src/lean/ffi/ixon.rs (1959 lines) into ixon/ module with focused files: compare, constant, enums, env, expr, meta, serialize, sharing, univ. - Split src/lean/ffi/lean_env_builder.rs (2551 lines) into builder.rs, compile.rs, graph.rs, and ix/ submodule (address, constant, data, env, expr, level, name). - Added primitives.rs for basic Lean type FFI roundtrips. - Preserved legacy FFI in ixon_legacy.rs. - Updated compile.rs and decompile.rs for new metadata and canonicalization pipeline. Test infrastructure: - Reorganized test generators into Tests/Gen/{Basic,Ix,Ixon}.lean. - Reorganized FFI tests into Tests/FFI/{Basic,Ix,Ixon}.lean. - Added property-based roundtrip tests for all FFI types using SlimCheck. - Added cross-implementation serialization comparison tests (Lean vs Rust byte-level equality for univs, exprs, constants, and envs). - New domain tests: Tests/Ix/{CondenseM,GraphM,Sharing}.lean, Tests/ShardMap.lean. - Removed obsolete Tests/FFIConsistency.lean. Key design decisions: - Arena-based expression metadata separates semantic content (alpha-invariant constants) from non-semantic metadata (binder names, reducibility hints). - Array-based "Raw" types bridge the FFI boundary, avoiding hash function mismatches between Lean and Rust. - All serialized sections sorted by address for deterministic output. - Name indices use DFS topological sort with address-based tie-breaking.

docs/Ixon.md

gabriel-barrett · 2026-02-02T12:52:04Z

docs/Ixon.md

+The sharing vector is built incrementally:
+- Each entry can only reference earlier entries (no forward references)
+- Entries are sorted by profitability (most savings first)
+- Root expressions are rewritten using all available share indices


What happens if someone does not follow these rules? Specifically

what if the entries are not sorted by profitability? Is that really a problem?

what if early entries have forward references?

what if there's an expression in the sharing vector that is not used anywhere. Will that be type checked? Or perhaps it will just be garbage?

The second one can be mitigated by a simple check, and I see that indeed it might be a problem since it could lead to infinite expressions without using mutual blocks. However, the first one seems to be expensive to check

If entries aren't sorted, or aren't in topological order, then the constant is non-optimal or degenerate. That will result in a different hash for the expression, but might be fine semantically. Or it might result in exponential blow-up.

The practical implication is that if someone manually constructs a degenerate constant, they can roll a different hash. That doesn't affect proofs of typechecking at all, but can create confusion on proofs of evaluation, since now there can be multiple hashes for the normalized output.

That said, I don't think this is such a big deal, since the commitment feature does this already, so we can treat degenerate constants as a kind of ad-hoc commitment if the preimage is private. And if the pre-image is public its trivial to check that the constant is optimal by re-running the sharing analysis.

This suggests we might want to also be able to prove non-degeneracy for evaluation, and I think that's doable and a good idea, but not something that has to be an immediate priority

To your specific questions:

what if the entries are not sorted by profitability? Is that really a problem?

Shouldn't really be a problem other than rolling the hash

what if early entries have forward references?

could cause a blow-up on expansion, like a zip-bomb. We probably should have special detection for this

what if there's an expression in the sharing vector that is not used anywhere. Will that be type checked? Or perhaps it will just be garbage?

I think it's just garbage that rolls the hash and increases the size, otherwise neutral

Regarding my third question, I think indeed it should be treated as garbage. We can't even typecheck since there's no context to give meaning to its potential free variables. Plus, decompiling the constant will automatically remove these references

- Fix 219 clippy warnings across 21 Rust files (ptr_as_ptr, unnecessary_unsafe, unused imports, redundant closures, etc.) - Delete ixon_old.rs (2431 lines), IxonOld.lean (952 lines), CompileMOld.lean (1161 lines), DecompileMOld.lean (704 lines) - Migrate all dependents to new Ixon APIs: Store.lean, Serde.lean, IxVM.lean, Prover.lean, Verifier.lean, mutual.rs, compile.rs - Relocate ixon_legacy.rs FFI functions to proper modules: sharing analysis → ixon/sharing.rs, env serde → ixon/serialize.rs, compilation FFI → ffi/compile.rs - Remove old-format FFI section from serialize.rs (~480 lines) - Delete unused put_u32/get_u32 from metadata.rs - Delete TODO:DELETEME block from address.rs

…n README Annotate all public types, fields, functions, and constants in the Ix pipeline (env, address, graph, condense, ground, mutual, store) with rustdoc and Lean doc comments. Add a Testing section to the README covering lake test suites and cargo test. Rename DecompileV2 to DecompileM.

Rewrite DecompileM to decompile Ixon format to Ix types (not Lean types), enabling cheap Blake3 hash comparison for roundtrip verification. Add arena-based metadata restoration (binder names, binder info, mdata), Syntax/DataValue blob deserialization, and chunked parallel decompilation. Extend Rust FFI to pass the full names table (1.3M entries including binder names and level params) through RawEnv, fixing lookup failures for non-constant-name addresses. Add standalone `lake test -- decompile` suite: 187K constants roundtrip with 0 errors and 0 mismatches.

arthurpaulino

I'm still reviewing the code, learning about the new Ixon. But I've already left some comments.
There's also #297, which avoids several unnecessary copies.

docs/Ixon.md

Tests/Ix/FFI.lean

Tests/Main.lean

…e endpoints Unify CompileError, DecompileError, and SerializeError across Lean and Rust so constructor tag order matches exactly, enabling direct FFI roundtripping. Replace catch-all MissingName variant with specific error types (BadConstantFormat, BlobNotFound, BadBlobFormat). Change CompileError fields from Name/Expr to String for FFI compatibility and add UnknownUnivParam variant. Add new Rust FFI endpoints: rs_ser_env/rs_des_env for Ixon.Env serialization, and rs_decompile_env for full decompilation. Consolidate duplicated RawEnv→Env conversion into RawEnv.toEnv, include full names table for round-trip fidelity. Add roundtrip tests for all three error types (unit + property-based) and integration tests for Rust serialization and decompilation pipelines.

…acon

…nerators The staged diff covers 11 instances improved with real shrinking logic (sub-term extraction, index halving, array popping, variant simplification), plus the RawEnv fix for the missing names field. The Shrinkable Ix.Name instance was also moved earlier in Tests/Gen/Ix.lean so SyntaxPreresolved can delegate to it

The Lean compiler's `storeNameStrings` only stored string blobs but never populated a names map during compilation. Names were derived solely from `named` entries at aggregation time, producing ~239K names vs Rust's ~1.3M (which tracks every name encountered during compilation). Replace `storeNameStrings` with `compileName` that both stores string blobs and records each name component in a new `BlockState.blockNames` map. Add missing `compileName` calls for binder names, level params, MData keys, DataValue names, Syntax node kinds, ident values, and preresolved namespace/decl names. Merge per-block names during aggregation in both sequential and parallel compilation paths. Also fix `SerializeError.toString` to use `String.ofList` instead of the removed `List.asString`.

…ce checks Introduce Ix.Claim (claim types + serde) and Ix.Commit (compilation, commitment, and claim construction). The commitment pipeline compiles a definition, creates a blinded commitment via Comm, and includes a runtime alpha-invariance guard that recompiles under the commit name to verify the content address is stable. - Claim types: eval, check, reveal with tagged binary encoding (0xE4/E3/E6) - RevealConstantInfo: selective field revelation for all constant variants - compileDef: single-definition compilation with optional name parameter - commitDef: full pipeline with alpha-invariance recompilation check - openConstantInfo/openCommitment: extract revealed fields from compiled constants - Comm serde support in Ixon (tag 0xE5) on both Lean and Rust sides - Property-based and unit test suites for claims and commits - Migrate ZKVoting Prover/Verifier to new Commit API - Format byte counts with units (B/kB/MB) in compile test output

* cargo fmt * Clippy and fix Nix build * Move slow tests to separate binaries * chore: cargo-deny

- Add unified Tag4 reference table to docs/Ixon.md showing all flags 0x0-0xF - Fix ExprMetaData serialization table (was completely wrong: Leaf/App missing, tag numbers off by 2) and update to arena-based representation - Add ConstantMeta serialization table (was missing entirely) - Add module headers to CondenseM.lean (Tarjan's SCC algorithm), Claim.lean (ZK claim types), and CanonM.lean (canonicalization) - Document CondenseState fields and visit function with algorithm explanation - Remove commented-out debug traces and dead counters in CanonM.lean - Add doc comments to 14 public comparison/sorting functions in compile.rs - Add module-level docs to src/ix.rs, nat.rs, sarray.rs, ctor.rs, lean_env.rs, compile.rs (FFI), and proof.rs - Fix duplicate comment and wrong range in expr.rs flag constants - Replace panic!() with unreachable!() in FFI match arms (compile.rs) - Delete empty todo.md - Rename all mod.rs to modern x.rs convention (8 files)

…ming Test graph (9), ground (12), mutual (12), ixon::env (10), and strong_ordering (17) — 60 new tests covering the reference graph construction, groundedness checking, mutual block context indexing, Ixon environment CRUD, and comparison algebra. Also add compile phase timing to rs_tmp_decode_const_map.

Tests/Main.lean

Correctness: - Fix mutual block projection indexing for alpha-equivalent constants. Only push one representative per equivalence class into Muts array, matching the class-based indexing used by MutConst::ctx and projections. Previously, all constants were pushed flat, causing index misalignment when an equivalence class had >1 member. Fixed on both Rust and Lean sides. - Fix compare_const cross-kind ordering to use MutConstKind discriminant instead of always returning Lt. - Fix compare_const/compare_ctor cache key inversion by tracking swap state. - Fix check_decompile to return Err on mismatches instead of always Ok. - Fix CString::new().unwrap() panics by adding safe_cstring helper that filters null bytes instead of panicking. - Fix Box::leak memory leaks in decode_serialize_error by changing SerializeError fields from &'static str to String. - Fix buildSharingVec call sites in Tests/Ix/Sharing.lean after removing unused _topoOrder parameter. Performance: - Replace busy-wait polling (sleep 100us) with Condvar in compile_env work-stealing loop. - Eliminate double serialization in Lean compilation by adding blockBytes and blockAddr to BlockResult, computed once via BlockResult.mk'. - Deduplicate DefnInfo/ThmInfo/OpaqueInfo compilation into compile_single_def helper. Cleanup: - Remove debug leftovers: BLE' dbg_trace filter, ~90 lines of commented-out code in Common.lean, unused progress tracking variables. - Remove unused _topoOrder parameter from buildSharingVec (Lean). - Deduplicate u64ByteCount by referencing Ixon.u64ByteCount in CompileM. - Guard division by zero in SharingStats::Display. - Improve put_idx panic message with index size context. - Fix doc comment typo: Bytes1 -> Bytes2 in bytes2.rs. Tests: - Add test_compile_mutual_alpha_equivalent_defs (2 alpha-equiv defs → 1 Muts entry). - Add test_compile_mutual_alpha_equiv_with_different_third (2 classes → 2 Muts entries).

…acon

Address redundant field names, unnecessary let binding, needless pass-by-value, and rustfmt formatting across the compilation pipeline.

- Fix compare_const antisymmetry bug: cross-kind MutConst pairs always returned Lt regardless of order, violating total ordering. Now uses kind ordinals. Also fix cache key normalization that didn't flip results. - Rewrite unshare_expr from recursive to iterative stack-based traversal to prevent stack overflow on deep expressions. - Validate QuotKind deserialization tags instead of silently mapping invalid values to .ind. - Return CheckResult from check_decompile with match/mismatch/missing counts instead of always Ok(()). - Fix println to eprintln for diagnostic output in FFI compile. - Remove dead code: Tests/Ix.lean (empty suite), Tests/Ix/Common.lean, Tests/Ix/IR.lean, c_lean_cstr_to_nat C wrapper. - Clean up Tests/Main.lean commented-out entries, re-enable sharing-io. - Fix garbled comment in src/lean/ffi.rs. - Update README test suite documentation to match actual suites.

- Add ffi_io_guard to catch panics across extern "C" boundary and return IO errors instead of triggering undefined behavior - Return proper IO errors from compilation FFI instead of empty results - Add null-pointer guards on RustCompiledEnv FFI functions - Cap Vec pre-allocation during deserialization to prevent OOM from malformed input (capped_capacity helper) - Sort blobs, consts, named, comms, and names by address before serialization for deterministic output matching Lean - Serialize ReducibilityHints.regular as u32 instead of u64 to match its actual Lean type - Throw errors on missing name/parent addresses during deserialization instead of silently falling back to anonymous names - Rename RustCompiledEnv → RustBlockEnv to avoid confusion with the full compilation environment - Remove dead commented-out code and unused imports

Variable-length Tag0 encoding is 1 byte for typical heights (< 128) vs the previous fixed 4-byte u32 LE encoding.

- Remove all eprintln!/println! debug output from FFI functions (compile.rs, lean_env.rs) — errors propagated via return values - Make serialization fallible: Env::put, put_idx, metadata serialization now return Result instead of panicking on missing name indices - Fix unsafe memory access: use read_unaligned for u64 casts in ctor.rs and primitives.rs, add bounds check in rs_bytearray_to_u64_le - Replace get! with safe get? via Option.bind in CanonM.lean lookups - Add SAFETY comments to get! sites in CondenseM.lean - Strengthen test equality: ConstantInfo uses BEq (all fields), add BEq deriving; document size-only comparison limitations - Tighten test assertions: sharing tests, decompile success checks - Remove HashMap.find? shim, rename genString to genIxString, deduplicate genAddress, set panic=abort in Cargo profiles - Fix CompileM: revert incorrect find? → get? on Batteries.RBMap - Use correct error variant (UnknownUnivParam) for level comparison - Normalize compare_const cache to use ordered key pairs

johnchandlerburnham changed the title ~~WIP: Ixon serialization rewrite with 85% size reduction~~ WIP: Ixon serialization rewrite with large speed and size reduction Jan 14, 2026

johnchandlerburnham added 9 commits January 14, 2026 07:34

cargo clippy, fmt, plus a per-constant size utility in the integratio…

1868bbb

…n test

WIP: Docs and serialization improvements

f776074

WIP: Docs

1727e5b

fix docs

0c733ef

tmp

d32d667

Merge branch 'main' of github.com:argumentcomputer/ix into jcb/lacon

818acc3

canonicalization

6854e0f

gabriel-barrett requested review from arthurpaulino, gabriel-barrett and samuelburnham February 1, 2026 13:25

gabriel-barrett reviewed Feb 2, 2026

View reviewed changes

docs/Ixon.md Show resolved Hide resolved

gabriel-barrett reviewed Feb 2, 2026

View reviewed changes

johnchandlerburnham added 3 commits February 2, 2026 11:46

arthurpaulino reviewed Feb 2, 2026

View reviewed changes

docs/Ixon.md Outdated Show resolved Hide resolved

Tests/Ix/FFI.lean Outdated Show resolved Hide resolved

Tests/Main.lean Outdated Show resolved Hide resolved

arthurpaulino and others added 10 commits February 2, 2026 23:35

chore: Name::get_hash can return a reference instead of copying (#297)

faa2cce

Merge branch 'jcb/lacon' of github.com:argumentcomputer/ix into jcb/l…

afa4e6c

…acon

Remove dead Tests/Ix/FFI.lean, all content migrated to Tests/FFI/*

77fbe6e

fix: Nix build and tests (#299)

03ac8cd

* cargo fmt * Clippy and fix Nix build * Move slow tests to separate binaries * chore: cargo-deny

fix breaking module doc comments

3169839

johnchandlerburnham added 2 commits February 5, 2026 18:03

remove Toy.lean file

575e030

johnchandlerburnham changed the title ~~WIP: Ixon serialization rewrite with large speed and size reduction~~ Ixon serialization rewrite with large speed and size reduction Feb 5, 2026

fix: Nix tests

1013beb

arthurpaulino requested changes Feb 6, 2026

View reviewed changes

Tests/Main.lean Show resolved Hide resolved

johnchandlerburnham added 3 commits February 5, 2026 20:13

Merge branch 'jcb/lacon' of github.com:argumentcomputer/ix into jcb/l…

8796804

…acon

fix: Resolve clippy warnings and formatting issues

ef54770

Address redundant field names, unnecessary let binding, needless pass-by-value, and rustfmt formatting across the compilation pipeline.

arthurpaulino previously approved these changes Feb 6, 2026

View reviewed changes

johnchandlerburnham dismissed arthurpaulino’s stale review via 43cfc22 February 6, 2026 02:53

johnchandlerburnham added 5 commits February 5, 2026 23:06

Use Tag0 for ReducibilityHints.regular height serialization

9236fea

Variable-length Tag0 encoding is 1 byte for typical heights (< 128) vs the previous fixed 4-byte u32 LE encoding.

formatting

5368d72

xclippy warnings

364231d

johnchandlerburnham merged commit d8d7af5 into main Feb 6, 2026
19 checks passed

johnchandlerburnham deleted the jcb/lacon branch February 6, 2026 06:25

arthurpaulino mentioned this pull request Feb 6, 2026

Use reverse FFI to improve data coming from Rust #302

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ixon serialization rewrite with large speed and size reduction #288

Ixon serialization rewrite with large speed and size reduction #288

Uh oh!

johnchandlerburnham commented Jan 14, 2026

Uh oh!

Uh oh!

gabriel-barrett Feb 2, 2026 •

edited

Loading

Uh oh!

johnchandlerburnham Feb 2, 2026

Uh oh!

gabriel-barrett Feb 4, 2026

Uh oh!

arthurpaulino left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Ixon serialization rewrite with large speed and size reduction #288

Ixon serialization rewrite with large speed and size reduction #288

Uh oh!

Conversation

johnchandlerburnham commented Jan 14, 2026

Uh oh!

Uh oh!

gabriel-barrett Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

johnchandlerburnham Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gabriel-barrett Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

arthurpaulino left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gabriel-barrett Feb 2, 2026 •

edited

Loading