Add offline decoding from Stim detector-sample files#83
Conversation
Introduces a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, and a generator workflow for reproducible reference inputs.
New surface area:
- qec/surface_code/stim_sample_io.py: on-disk Stim sample contract, structural + optional noise-fingerprint validation against a rebuilt memory circuit.
- data/predecoder_transform.py: canonical Stim detectors -> (trainX, x_syn_diff, z_syn_diff); shared by file-based datapipe and the buffer-fused GPU module (parity test enforces this).
- workflows/run.py: workflow.task=generate_stim_data and PREDECODER_DECODE_MODE={pymatching_only, ising_decoding_pymatching}.
- scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py: end-to-end smoke + IO/transform/decoder coverage.
- README + cookbook: BYO-samples contract, generator usage, strict-validation semantics.
Plumbing in data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1.
Smoke-tested at d=7/n_rounds=7 (Fast, R=9) and d=13/n_rounds=13 (Accurate, R=13); LERs and PyMatching speedup match the reference example in README.md.
Signed-off-by: kvmto <kmato@nvidia.com>
Run yapf --style=.style.yapf (based_on_style=google, column_limit=100, dedent_closing_brackets=true, split_before_closing_bracket=true) over the Python files touched by the previous commit. No semantic changes. Signed-off-by: kvmto <kmato@nvidia.com>
bmhowe23
left a comment
There was a problem hiding this comment.
Thanks, Kevin. Most of my comments are about code duplication.
| # limitations under the License. | ||
| """Canonical transformation from Stim detector samples to pre-decoder inputs. | ||
|
|
||
| This is the single source of truth for converting Stim detector bits into the |
There was a problem hiding this comment.
The next paragraph about a separate implementation seems to contradict this "single source of truth" claim. Why can't we use the code that already exists for this? (E.g. PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes)
| return str(value).rjust(width) | ||
|
|
||
|
|
||
| print("\nOffline Stim inference summary") |
There was a problem hiding this comment.
This whole block reproduces the table that code/evaluation/inference.py:196-212 already prints — same fields, same X/Z/Avg layout, slightly different formatting. The two will silently drift the next time someone updates one side. The structured [Inference Summary] JSON marker is the contract worth preserving; the human-readable table doesn't need a second canonical copy. Could the smoketest just tail / grep the inference.py output that's already in the same log, or print only the fields that the smoketest specifically wants to highlight (e.g. just the avg LER and speedup) so it's clearly an additional line and not a replacement?
| self.assertTrue(torch.equal(z_syn, expected_z)) | ||
| self.assertTrue(torch.equal(train_x, expected_train_x)) | ||
|
|
||
| def test_canonical_helper_matches_eval_module_batch_transform(self): |
There was a problem hiding this comment.
I believe the need for this test would go away if you addressed my previous code duplication comment.
| self.assertTrue(torch.equal(z_syn_m.to(torch.int32), expected_z_syn)) | ||
| self.assertTrue(torch.allclose(train_x_m, expected_train_x, atol=0.0, rtol=0.0)) | ||
|
|
||
| def test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets(self): |
There was a problem hiding this comment.
I believe the need for this test would go away if you addressed my previous code duplication comment.
| """Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw | ||
| Stim measurements via XOR-differencing. | ||
|
|
||
| This is the *fourth* implementation by design: it is the slowest, simplest |
There was a problem hiding this comment.
Reducing code duplication would make this the second, not the fourth (I think).
| This is the *fourth* implementation by design: it is the slowest, simplest | |
| This is the *second* implementation by design: it is the slowest, simplest |
| _SUPPORTED_FORMATS = {"dets"} | ||
|
|
||
| #: Current Stim sample metadata schema version. | ||
| SCHEMA_VERSION = 2 |
There was a problem hiding this comment.
schema_version is written here (and in build_stim_sample_metadata) but read nowhere — validate_stim_sample_metadata does not inspect it, and a metadata file claiming "schema_version": 1 or 99 would load identically. Two options that would make this field load-bearing:
- Drop it — fingerprinted noise + structural checks already cover compatibility, so the version field is just visual noise.
- Actually validate: reject
schema_version > SCHEMA_VERSION, and document what (if anything) v1 looked like so older files can be handled deliberately.
As written, it looks like a check that exists but isn't enforced.
There was a problem hiding this comment.
Verified in 369f56f — schema_version is load-bearing now: missing/legacy defaults to v1, and non-int / bool / > SCHEMA_VERSION values are rejected. Thanks.
| idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel]) if T > 1 \ | ||
| else torch.cat([sentinel, sentinel]) | ||
|
|
||
| if T == 1: |
There was a problem hiding this comment.
The block at lines 121-130 unconditionally assigns idx_x/idx_z (including a torch.cat([sentinel, sentinel]) fallback for T==1, which has length 2). Then this block overwrites them for T==1. So the prior branch is not just dead for T==1 — it produces wrong-length tensors that would fail torch.index_select if they ever leaked through.
Cleaner to short-circuit:
if T == 1:
if basis_upper == 'X':
idx_x, idx_z = zero_idx, sentinel
else:
idx_z, idx_x = zero_idx, sentinel
else:
if basis_upper == 'X':
idx_x = torch.cat([zero_idx, x_bulk_idx])
idx_z = torch.cat([sentinel, z_bulk_idx[:-1], sentinel])
else:
idx_z = torch.cat([zero_idx, z_bulk_idx])
idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel])Same semantics, no throwaway tensors, no malformed intermediates.
There was a problem hiding this comment.
Verified in 369f56f — short-circuited as suggested, no more wrong-length intermediates on the T==1 path. Thanks.
- predecoder_transform.py: short-circuit the T==1 / T>=2 detector-index selection so the T==1 path no longer materializes a wrong-length intermediate. Per @ivanbasov. - stim_sample_io.py: make schema_version load-bearing — reject values newer than SCHEMA_VERSION (and non-int garbage); default missing key to v1 so legacy files keep loading. Per @ivanbasov. - offline_smoketest.sh: stop reprinting the per-basis table that code/evaluation/inference.py already emits. Keep the [Inference Summary] JSON-marker parse and print one headline line for CI scrapers. README example updated. Per @bmhowe23. Signed-off-by: kvmto <kmato@nvidia.com>
Extract the buffer-fused dets -> (trainX, x_syn_diff, z_syn_diff)
transform into `_predecoder_transform_core` in
`code/data/predecoder_transform.py`. Both the file-datapipe entry point
(`dets_to_predecoder_inputs`) and the GPU/ONNX export path
(`PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes`) now
delegate to it.
- The helper builds the perms/grids ad-hoc per call from
`compute_stab*_to_data_index_map` and `normalized_weight_mapping_*`,
then casts syndromes to int32 at the return boundary to preserve its
public contract.
- The eval module forwards its pre-registered buffers straight through;
`__init__` is unchanged. The ONNX export graph is byte-identical
before and after (383 nodes / 34697 bytes for X, 388 nodes / 35074
bytes for Z).
Drops the two parity tests that locked the previous duplicates
together and corrects the cross-check oracle's docstring count
("fourth" -> "second"). The `predecoder_transform.py` module docstring
no longer claims a separate GPU implementation; it points at the
shared core.
Signed-off-by: kvmto <kmato@nvidia.com>
Append ++model_id=${MODEL_ID} to the ising_decoding_pymatching phase's
EXTRA_PARAMS when MODEL_ID is set, so running the smoketest with the
Accurate checkpoint (R=13) no longer requires manually overriding
hydra params.
Signed-off-by: kvmto <kmato@nvidia.com>
ivanbasov
left a comment
There was a problem hiding this comment.
Re-reviewed after the latest push. The schema_version and T==1 fixes are both clean (replied on those threads), and the eval-module dedup in 9059d78 is the right call — nicely done verifying the ONNX graph is byte-identical.
One concern remains about the second deleted test — details inline.
| """Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw | ||
| Stim measurements via XOR-differencing. | ||
|
|
||
| This is the *second* implementation by design: it is the slowest, simplest |
There was a problem hiding this comment.
This count is now off, and it points at a test I don't think should have been dropped.
The dedup in 9059d78 correctly merged PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes into _predecoder_transform_core, so dropping test_canonical_helper_matches_eval_module_batch_transform is fine — those two paths share code now.
But the in-memory inference datapipe QCDataPipePreDecoder_Memory_inference._precompute_transformations (code/data/datapipe_stim.py:197-256, plus the _X/_Z variants) is a separate implementation of this same dets → (trainX, x_syn_diff, z_syn_diff) transform — raw-measurement XOR-differencing. It predates this PR, was never deduplicated, and is a production path (code/data/factory.py:137 instantiates it for inference).
So there are still three independent implementations (canonical core, in-memory XOR-diff datapipe, this oracle), which makes the oracle the third, not the second. More importantly, test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets was the only thing asserting the new file/canonical path agrees with the in-memory path — it wasn't a duplicate-locking test in @bmhowe23's sense, it locked two genuinely independent implementations together. With it gone, drift between in-memory and file inference would go uncaught.
Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.
There was a problem hiding this comment.
Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.
This sounds like a good plan to me. Fixing pre-existing duplicated code should be handled be a future PR, but if one of the tests that I proposed deleting (test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets) is actually still needed, then restoring it sounds good to me.
Summary
Adds a file-based path that decodes Stim detector samples (
samples_{X,Z}.dets+metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, plus a generator workflow for reproducible reference inputs.What's new
qec/surface_code/stim_sample_io.py— on-disk Stim sample contract, with structural and optional noise-fingerprint validation against a rebuilt memory circuit before decoding.data/predecoder_transform.py— canonical Stim detectors →(trainX, x_syn_diff, z_syn_diff)transform; shared by the file-based datapipe and the buffer-fused GPU module. A parity test intests/test_offline_stim_decoding.pyenforces algorithmic equality.workflows/run.py— newworkflow.task=generate_stim_dataandPREDECODER_DECODE_MODEcontrols:pymatching_only(baseline; loadstorch.nn.Identity(), no checkpoint required) andising_decoding_pymatching(Ising pre-decoder + PyMatching).scripts/offline_smoketest.sh+tests/test_offline_stim_decoding.py— end-to-end smoke and IO/transform/decoder coverage.README.md+cookbook/predecoder.ipynb— file contract docs, BYO-samples instructions, generator usage, strict-validation semantics.Plumbing through
data/datapipe_stim.py,data/factory.py,evaluation/{inference,logical_error_rate}.py,export/generate_test_data.py,scripts/local_run.sh; opt-in[Inference Summary]JSON marker viaPREDECODER_EMIT_INFERENCE_SUMMARY=1for downstream tooling.Validation behaviour
p_error,noise_model_sha256) are fatal by default; downgrade to warnings withPREDECODER_STIM_STRICT_NOISE=0(for p_error sweeps / calibration studies).Smoke test
Ran end-to-end against the shipped checkpoints at both supported windows. Numbers line up with the reference example in
README.md.d=7, n_rounds=7, O1, Fast (R=9, model_id=1), 262,144 shots/basis
d=13, n_rounds=13, O1, Accurate (R=13, model_id=4), 262,144 shots/basis
Test plan
bash code/scripts/offline_smoketest.sh(d=7, Fast)[Inference Summary]JSON marker parses for both decode modesunit-tests(covers newtest_offline_stim_decoding.py)spdx-header-checkon all 4 new files