Add offline decoding from Stim detector-sample files by kvmto · Pull Request #83 · NVIDIA/Ising-Decoding

kvmto · 2026-05-28T15:31:04Z

Summary

Adds a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, plus a generator workflow for reproducible reference inputs.

What's new

qec/surface_code/stim_sample_io.py — on-disk Stim sample contract, with structural and optional noise-fingerprint validation against a rebuilt memory circuit before decoding.
data/predecoder_transform.py — canonical Stim detectors → (trainX, x_syn_diff, z_syn_diff) transform; shared by the file-based datapipe and the buffer-fused GPU module. A parity test in tests/test_offline_stim_decoding.py enforces algorithmic equality.
workflows/run.py — new workflow.task=generate_stim_data and PREDECODER_DECODE_MODE controls: pymatching_only (baseline; loads torch.nn.Identity(), no checkpoint required) and ising_decoding_pymatching (Ising pre-decoder + PyMatching).
scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py — end-to-end smoke and IO/transform/decoder coverage.
README.md + cookbook/predecoder.ipynb — file contract docs, BYO-samples instructions, generator usage, strict-validation semantics.

Plumbing through data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1 for downstream tooling.

Validation behaviour

Structural mismatches (distance, rounds, basis, orientation, detector count, observable presence) are always fatal.
Noise mismatches (p_error, noise_model_sha256) are fatal by default; downgrade to warnings with PREDECODER_STIM_STRICT_NOISE=0 (for p_error sweeps / calibration studies).
Files written before the noise-fingerprint fields existed continue to load unchanged.

Smoke test

Ran end-to-end against the shipped checkpoints at both supported windows. Numbers line up with the reference example in README.md.

d=7, n_rounds=7, O1, Fast (R=9, model_id=1), 262,144 shots/basis

	No pre-decoder	After pre-decoder
LER X	0.002682	0.002254
LER Z	0.002678	0.002281
LER Avg	0.002680	0.002268
PyMatching latency Avg (µs/round)	0.807	0.445
PyMatching speedup	—	1.813×

d=13, n_rounds=13, O1, Accurate (R=13, model_id=4), 262,144 shots/basis

	No pre-decoder	After pre-decoder
LER X	0.000278	0.000263
LER Z	0.000244	0.000172
LER Avg	0.000261	0.000217
PyMatching latency Avg (µs/round)	2.183	0.976
PyMatching speedup	—	2.237×

Test plan

bash code/scripts/offline_smoketest.sh (d=7, Fast)
Manual run at d=13 with Accurate model
[Inference Summary] JSON marker parses for both decode modes
Metadata fingerprint cross-check fires on rebuild (no false negatives on the matched circuits)
CI: unit-tests (covers new test_offline_stim_decoding.py)
CI: spdx-header-check on all 4 new files

Introduces a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, and a generator workflow for reproducible reference inputs. New surface area: - qec/surface_code/stim_sample_io.py: on-disk Stim sample contract, structural + optional noise-fingerprint validation against a rebuilt memory circuit. - data/predecoder_transform.py: canonical Stim detectors -> (trainX, x_syn_diff, z_syn_diff); shared by file-based datapipe and the buffer-fused GPU module (parity test enforces this). - workflows/run.py: workflow.task=generate_stim_data and PREDECODER_DECODE_MODE={pymatching_only, ising_decoding_pymatching}. - scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py: end-to-end smoke + IO/transform/decoder coverage. - README + cookbook: BYO-samples contract, generator usage, strict-validation semantics. Plumbing in data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1. Smoke-tested at d=7/n_rounds=7 (Fast, R=9) and d=13/n_rounds=13 (Accurate, R=13); LERs and PyMatching speedup match the reference example in README.md. Signed-off-by: kvmto <kmato@nvidia.com>

Run yapf --style=.style.yapf (based_on_style=google, column_limit=100, dedent_closing_brackets=true, split_before_closing_bracket=true) over the Python files touched by the previous commit. No semantic changes. Signed-off-by: kvmto <kmato@nvidia.com>

bmhowe23

Thanks, Kevin. Most of my comments are about code duplication.

bmhowe23 · 2026-05-28T16:06:38Z

+# limitations under the License.
+"""Canonical transformation from Stim detector samples to pre-decoder inputs.
+
+This is the single source of truth for converting Stim detector bits into the


The next paragraph about a separate implementation seems to contradict this "single source of truth" claim. Why can't we use the code that already exists for this? (E.g. PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes)

Solved in 9059d78.

bmhowe23 · 2026-05-28T16:10:25Z

+        return str(value).rjust(width)
+
+
+print("\nOffline Stim inference summary")


This whole block reproduces the table that code/evaluation/inference.py:196-212 already prints — same fields, same X/Z/Avg layout, slightly different formatting. The two will silently drift the next time someone updates one side. The structured [Inference Summary] JSON marker is the contract worth preserving; the human-readable table doesn't need a second canonical copy. Could the smoketest just tail / grep the inference.py output that's already in the same log, or print only the fields that the smoketest specifically wants to highlight (e.g. just the avg LER and speedup) so it's clearly an additional line and not a replacement?

Solved in 369f56f.

bmhowe23 · 2026-05-28T16:12:18Z

+                        self.assertTrue(torch.equal(z_syn, expected_z))
+                        self.assertTrue(torch.equal(train_x, expected_train_x))
+
+    def test_canonical_helper_matches_eval_module_batch_transform(self):


I believe the need for this test would go away if you addressed my previous code duplication comment.

Solved in 9059d78.

bmhowe23 · 2026-05-28T16:12:42Z

+                self.assertTrue(torch.equal(z_syn_m.to(torch.int32), expected_z_syn))
+                self.assertTrue(torch.allclose(train_x_m, expected_train_x, atol=0.0, rtol=0.0))
+
+    def test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets(self):


I believe the need for this test would go away if you addressed my previous code duplication comment.

Solved in 9059d78.

bmhowe23 · 2026-05-28T16:13:58Z

+    """Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw
+    Stim measurements via XOR-differencing.
+
+    This is the *fourth* implementation by design: it is the slowest, simplest


Reducing code duplication would make this the second, not the fourth (I think).

Suggested change

This is the *fourth* implementation by design: it is the slowest, simplest

This is the *second* implementation by design: it is the slowest, simplest

Solved in 9059d78.

ivanbasov · 2026-05-28T17:36:30Z

+_SUPPORTED_FORMATS = {"dets"}
+
+#: Current Stim sample metadata schema version.
+SCHEMA_VERSION = 2


schema_version is written here (and in build_stim_sample_metadata) but read nowhere — validate_stim_sample_metadata does not inspect it, and a metadata file claiming "schema_version": 1 or 99 would load identically. Two options that would make this field load-bearing:

Drop it — fingerprinted noise + structural checks already cover compatibility, so the version field is just visual noise.

Actually validate: reject schema_version > SCHEMA_VERSION, and document what (if anything) v1 looked like so older files can be handled deliberately.

As written, it looks like a check that exists but isn't enforced.

Solved in 369f56f.

Verified in 369f56f — schema_version is load-bearing now: missing/legacy defaults to v1, and non-int / bool / > SCHEMA_VERSION values are rejected. Thanks.

ivanbasov · 2026-05-28T17:36:34Z

+        idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel]) if T > 1 \
+            else torch.cat([sentinel, sentinel])
+
+    if T == 1:


The block at lines 121-130 unconditionally assigns idx_x/idx_z (including a torch.cat([sentinel, sentinel]) fallback for T==1, which has length 2). Then this block overwrites them for T==1. So the prior branch is not just dead for T==1 — it produces wrong-length tensors that would fail torch.index_select if they ever leaked through.

Cleaner to short-circuit:

if T == 1: if basis_upper == 'X': idx_x, idx_z = zero_idx, sentinel else: idx_z, idx_x = zero_idx, sentinel else: if basis_upper == 'X': idx_x = torch.cat([zero_idx, x_bulk_idx]) idx_z = torch.cat([sentinel, z_bulk_idx[:-1], sentinel]) else: idx_z = torch.cat([zero_idx, z_bulk_idx]) idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel])

Same semantics, no throwaway tensors, no malformed intermediates.

Solved in 369f56f.

Verified in 369f56f — short-circuited as suggested, no more wrong-length intermediates on the T==1 path. Thanks.

@ivanbasov

- predecoder_transform.py: short-circuit the T==1 / T>=2 detector-index selection so the T==1 path no longer materializes a wrong-length intermediate. Per @ivanbasov. - stim_sample_io.py: make schema_version load-bearing — reject values newer than SCHEMA_VERSION (and non-int garbage); default missing key to v1 so legacy files keep loading. Per @ivanbasov. - offline_smoketest.sh: stop reprinting the per-basis table that code/evaluation/inference.py already emits. Keep the [Inference Summary] JSON-marker parse and print one headline line for CI scrapers. README example updated. Per @bmhowe23. Signed-off-by: kvmto <kmato@nvidia.com>

Extract the buffer-fused dets -> (trainX, x_syn_diff, z_syn_diff) transform into `_predecoder_transform_core` in `code/data/predecoder_transform.py`. Both the file-datapipe entry point (`dets_to_predecoder_inputs`) and the GPU/ONNX export path (`PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes`) now delegate to it. - The helper builds the perms/grids ad-hoc per call from `compute_stab*_to_data_index_map` and `normalized_weight_mapping_*`, then casts syndromes to int32 at the return boundary to preserve its public contract. - The eval module forwards its pre-registered buffers straight through; `__init__` is unchanged. The ONNX export graph is byte-identical before and after (383 nodes / 34697 bytes for X, 388 nodes / 35074 bytes for Z). Drops the two parity tests that locked the previous duplicates together and corrects the cross-check oracle's docstring count ("fourth" -> "second"). The `predecoder_transform.py` module docstring no longer claims a separate GPU implementation; it points at the shared core. Signed-off-by: kvmto <kmato@nvidia.com>

Append ++model_id=${MODEL_ID} to the ising_decoding_pymatching phase's EXTRA_PARAMS when MODEL_ID is set, so running the smoketest with the Accurate checkpoint (R=13) no longer requires manually overriding hydra params. Signed-off-by: kvmto <kmato@nvidia.com>

ivanbasov

Re-reviewed after the latest push. The schema_version and T==1 fixes are both clean (replied on those threads), and the eval-module dedup in 9059d78 is the right call — nicely done verifying the ONNX graph is byte-identical.

One concern remains about the second deleted test — details inline.

ivanbasov · 2026-05-29T17:53:18Z

+    """Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw
+    Stim measurements via XOR-differencing.
+
+    This is the *second* implementation by design: it is the slowest, simplest


This count is now off, and it points at a test I don't think should have been dropped.

The dedup in 9059d78 correctly merged PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes into _predecoder_transform_core, so dropping test_canonical_helper_matches_eval_module_batch_transform is fine — those two paths share code now.

But the in-memory inference datapipe QCDataPipePreDecoder_Memory_inference._precompute_transformations (code/data/datapipe_stim.py:197-256, plus the _X/_Z variants) is a separate implementation of this same dets → (trainX, x_syn_diff, z_syn_diff) transform — raw-measurement XOR-differencing. It predates this PR, was never deduplicated, and is a production path (code/data/factory.py:137 instantiates it for inference).

So there are still three independent implementations (canonical core, in-memory XOR-diff datapipe, this oracle), which makes the oracle the third, not the second. More importantly, test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets was the only thing asserting the new file/canonical path agrees with the in-memory path — it wasn't a duplicate-locking test in @bmhowe23's sense, it locked two genuinely independent implementations together. With it gone, drift between in-memory and file inference would go uncaught.

Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.

Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.

This sounds like a good plan to me. Fixing pre-existing duplicated code should be handled be a future PR, but if one of the tests that I proposed deleting (test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets) is actually still needed, then restoring it sounds good to me.

kvmto requested review from bmhowe23 and ivanbasov May 28, 2026 15:32

bmhowe23 reviewed May 28, 2026

View reviewed changes

ivanbasov reviewed May 28, 2026

View reviewed changes

kvmto added 3 commits May 29, 2026 10:45

ivanbasov reviewed May 29, 2026

View reviewed changes

		return str(value).rjust(width)


		print("\nOffline Stim inference summary")

	This is the fourth implementation by design: it is the slowest, simplest
	This is the second implementation by design: it is the slowest, simplest

Conversation

kvmto commented May 28, 2026

Summary

What's new

Validation behaviour

Smoke test

Test plan

Uh oh!

bmhowe23 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ivanbasov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants