Skip to content

Add offline decoding from Stim detector-sample files#83

Open
kvmto wants to merge 5 commits into
NVIDIA:mainfrom
kvmto:feat/offline-stim-decoding
Open

Add offline decoding from Stim detector-sample files#83
kvmto wants to merge 5 commits into
NVIDIA:mainfrom
kvmto:feat/offline-stim-decoding

Conversation

@kvmto
Copy link
Copy Markdown
Collaborator

@kvmto kvmto commented May 28, 2026

Summary

Adds a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, plus a generator workflow for reproducible reference inputs.

What's new

  • qec/surface_code/stim_sample_io.py — on-disk Stim sample contract, with structural and optional noise-fingerprint validation against a rebuilt memory circuit before decoding.
  • data/predecoder_transform.py — canonical Stim detectors → (trainX, x_syn_diff, z_syn_diff) transform; shared by the file-based datapipe and the buffer-fused GPU module. A parity test in tests/test_offline_stim_decoding.py enforces algorithmic equality.
  • workflows/run.py — new workflow.task=generate_stim_data and PREDECODER_DECODE_MODE controls: pymatching_only (baseline; loads torch.nn.Identity(), no checkpoint required) and ising_decoding_pymatching (Ising pre-decoder + PyMatching).
  • scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py — end-to-end smoke and IO/transform/decoder coverage.
  • README.md + cookbook/predecoder.ipynb — file contract docs, BYO-samples instructions, generator usage, strict-validation semantics.

Plumbing through data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1 for downstream tooling.

Validation behaviour

  • Structural mismatches (distance, rounds, basis, orientation, detector count, observable presence) are always fatal.
  • Noise mismatches (p_error, noise_model_sha256) are fatal by default; downgrade to warnings with PREDECODER_STIM_STRICT_NOISE=0 (for p_error sweeps / calibration studies).
  • Files written before the noise-fingerprint fields existed continue to load unchanged.

Smoke test

Ran end-to-end against the shipped checkpoints at both supported windows. Numbers line up with the reference example in README.md.

d=7, n_rounds=7, O1, Fast (R=9, model_id=1), 262,144 shots/basis

No pre-decoder After pre-decoder
LER X 0.002682 0.002254
LER Z 0.002678 0.002281
LER Avg 0.002680 0.002268
PyMatching latency Avg (µs/round) 0.807 0.445
PyMatching speedup 1.813×

d=13, n_rounds=13, O1, Accurate (R=13, model_id=4), 262,144 shots/basis

No pre-decoder After pre-decoder
LER X 0.000278 0.000263
LER Z 0.000244 0.000172
LER Avg 0.000261 0.000217
PyMatching latency Avg (µs/round) 2.183 0.976
PyMatching speedup 2.237×

Test plan

  • bash code/scripts/offline_smoketest.sh (d=7, Fast)
  • Manual run at d=13 with Accurate model
  • [Inference Summary] JSON marker parses for both decode modes
  • Metadata fingerprint cross-check fires on rebuild (no false negatives on the matched circuits)
  • CI: unit-tests (covers new test_offline_stim_decoding.py)
  • CI: spdx-header-check on all 4 new files

Introduces a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, and a generator workflow for reproducible reference inputs.

New surface area:
- qec/surface_code/stim_sample_io.py: on-disk Stim sample contract, structural + optional noise-fingerprint validation against a rebuilt memory circuit.
- data/predecoder_transform.py: canonical Stim detectors -> (trainX, x_syn_diff, z_syn_diff); shared by file-based datapipe and the buffer-fused GPU module (parity test enforces this).
- workflows/run.py: workflow.task=generate_stim_data and PREDECODER_DECODE_MODE={pymatching_only, ising_decoding_pymatching}.
- scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py: end-to-end smoke + IO/transform/decoder coverage.
- README + cookbook: BYO-samples contract, generator usage, strict-validation semantics.

Plumbing in data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1.

Smoke-tested at d=7/n_rounds=7 (Fast, R=9) and d=13/n_rounds=13 (Accurate, R=13); LERs and PyMatching speedup match the reference example in README.md.

Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto requested review from bmhowe23 and ivanbasov May 28, 2026 15:32
Run yapf --style=.style.yapf (based_on_style=google, column_limit=100, dedent_closing_brackets=true, split_before_closing_bracket=true) over the Python files touched by the previous commit. No semantic changes.

Signed-off-by: kvmto <kmato@nvidia.com>
Copy link
Copy Markdown
Collaborator

@bmhowe23 bmhowe23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Kevin. Most of my comments are about code duplication.

Comment thread code/data/predecoder_transform.py Outdated
# limitations under the License.
"""Canonical transformation from Stim detector samples to pre-decoder inputs.

This is the single source of truth for converting Stim detector bits into the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next paragraph about a separate implementation seems to contradict this "single source of truth" claim. Why can't we use the code that already exists for this? (E.g. PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 9059d78.

Comment thread code/scripts/offline_smoketest.sh Outdated
return str(value).rjust(width)


print("\nOffline Stim inference summary")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole block reproduces the table that code/evaluation/inference.py:196-212 already prints — same fields, same X/Z/Avg layout, slightly different formatting. The two will silently drift the next time someone updates one side. The structured [Inference Summary] JSON marker is the contract worth preserving; the human-readable table doesn't need a second canonical copy. Could the smoketest just tail / grep the inference.py output that's already in the same log, or print only the fields that the smoketest specifically wants to highlight (e.g. just the avg LER and speedup) so it's clearly an additional line and not a replacement?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 369f56f.

self.assertTrue(torch.equal(z_syn, expected_z))
self.assertTrue(torch.equal(train_x, expected_train_x))

def test_canonical_helper_matches_eval_module_batch_transform(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the need for this test would go away if you addressed my previous code duplication comment.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 9059d78.

self.assertTrue(torch.equal(z_syn_m.to(torch.int32), expected_z_syn))
self.assertTrue(torch.allclose(train_x_m, expected_train_x, atol=0.0, rtol=0.0))

def test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the need for this test would go away if you addressed my previous code duplication comment.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 9059d78.

"""Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw
Stim measurements via XOR-differencing.

This is the *fourth* implementation by design: it is the slowest, simplest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reducing code duplication would make this the second, not the fourth (I think).

Suggested change
This is the *fourth* implementation by design: it is the slowest, simplest
This is the *second* implementation by design: it is the slowest, simplest

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 9059d78.

_SUPPORTED_FORMATS = {"dets"}

#: Current Stim sample metadata schema version.
SCHEMA_VERSION = 2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_version is written here (and in build_stim_sample_metadata) but read nowhere — validate_stim_sample_metadata does not inspect it, and a metadata file claiming "schema_version": 1 or 99 would load identically. Two options that would make this field load-bearing:

  1. Drop it — fingerprinted noise + structural checks already cover compatibility, so the version field is just visual noise.
  2. Actually validate: reject schema_version > SCHEMA_VERSION, and document what (if anything) v1 looked like so older files can be handled deliberately.

As written, it looks like a check that exists but isn't enforced.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 369f56f.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified in 369f56fschema_version is load-bearing now: missing/legacy defaults to v1, and non-int / bool / > SCHEMA_VERSION values are rejected. Thanks.

Comment thread code/data/predecoder_transform.py Outdated
idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel]) if T > 1 \
else torch.cat([sentinel, sentinel])

if T == 1:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block at lines 121-130 unconditionally assigns idx_x/idx_z (including a torch.cat([sentinel, sentinel]) fallback for T==1, which has length 2). Then this block overwrites them for T==1. So the prior branch is not just dead for T==1 — it produces wrong-length tensors that would fail torch.index_select if they ever leaked through.

Cleaner to short-circuit:

if T == 1:
    if basis_upper == 'X':
        idx_x, idx_z = zero_idx, sentinel
    else:
        idx_z, idx_x = zero_idx, sentinel
else:
    if basis_upper == 'X':
        idx_x = torch.cat([zero_idx, x_bulk_idx])
        idx_z = torch.cat([sentinel, z_bulk_idx[:-1], sentinel])
    else:
        idx_z = torch.cat([zero_idx, z_bulk_idx])
        idx_x = torch.cat([sentinel, x_bulk_idx[:-1], sentinel])

Same semantics, no throwaway tensors, no malformed intermediates.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 369f56f.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified in 369f56f — short-circuited as suggested, no more wrong-length intermediates on the T==1 path. Thanks.

kvmto added 3 commits May 29, 2026 10:45
- predecoder_transform.py: short-circuit the T==1 / T>=2 detector-index
  selection so the T==1 path no longer materializes a wrong-length
  intermediate. Per @ivanbasov.

- stim_sample_io.py: make schema_version load-bearing — reject values
  newer than SCHEMA_VERSION (and non-int garbage); default missing key
  to v1 so legacy files keep loading. Per @ivanbasov.

- offline_smoketest.sh: stop reprinting the per-basis table that
  code/evaluation/inference.py already emits. Keep the [Inference
  Summary] JSON-marker parse and print one headline line for CI
  scrapers. README example updated. Per @bmhowe23.

Signed-off-by: kvmto <kmato@nvidia.com>
Extract the buffer-fused dets -> (trainX, x_syn_diff, z_syn_diff)
transform into `_predecoder_transform_core` in
`code/data/predecoder_transform.py`. Both the file-datapipe entry point
(`dets_to_predecoder_inputs`) and the GPU/ONNX export path
(`PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes`) now
delegate to it.

- The helper builds the perms/grids ad-hoc per call from
  `compute_stab*_to_data_index_map` and `normalized_weight_mapping_*`,
  then casts syndromes to int32 at the return boundary to preserve its
  public contract.
- The eval module forwards its pre-registered buffers straight through;
  `__init__` is unchanged. The ONNX export graph is byte-identical
  before and after (383 nodes / 34697 bytes for X, 388 nodes / 35074
  bytes for Z).

Drops the two parity tests that locked the previous duplicates
together and corrects the cross-check oracle's docstring count
("fourth" -> "second"). The `predecoder_transform.py` module docstring
no longer claims a separate GPU implementation; it points at the
shared core.

Signed-off-by: kvmto <kmato@nvidia.com>
Append ++model_id=${MODEL_ID} to the ising_decoding_pymatching phase's
EXTRA_PARAMS when MODEL_ID is set, so running the smoketest with the
Accurate checkpoint (R=13) no longer requires manually overriding
hydra params.

Signed-off-by: kvmto <kmato@nvidia.com>
Copy link
Copy Markdown
Collaborator

@ivanbasov ivanbasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after the latest push. The schema_version and T==1 fixes are both clean (replied on those threads), and the eval-module dedup in 9059d78 is the right call — nicely done verifying the ONNX graph is byte-identical.

One concern remains about the second deleted test — details inline.

"""Independent oracle: build (x_syn_diff, z_syn_diff, trainX) from raw
Stim measurements via XOR-differencing.

This is the *second* implementation by design: it is the slowest, simplest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This count is now off, and it points at a test I don't think should have been dropped.

The dedup in 9059d78 correctly merged PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes into _predecoder_transform_core, so dropping test_canonical_helper_matches_eval_module_batch_transform is fine — those two paths share code now.

But the in-memory inference datapipe QCDataPipePreDecoder_Memory_inference._precompute_transformations (code/data/datapipe_stim.py:197-256, plus the _X/_Z variants) is a separate implementation of this same dets → (trainX, x_syn_diff, z_syn_diff) transform — raw-measurement XOR-differencing. It predates this PR, was never deduplicated, and is a production path (code/data/factory.py:137 instantiates it for inference).

So there are still three independent implementations (canonical core, in-memory XOR-diff datapipe, this oracle), which makes the oracle the third, not the second. More importantly, test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets was the only thing asserting the new file/canonical path agrees with the in-memory path — it wasn't a duplicate-locking test in @bmhowe23's sense, it locked two genuinely independent implementations together. With it gone, drift between in-memory and file inference would go uncaught.

Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we restore that test (it's CPU-only and fast) and bump this docstring back to third? Folding the in-memory datapipe into the shared core too would be cleaner, but that's arguably a separate PR.

This sounds like a good plan to me. Fixing pre-existing duplicated code should be handled be a future PR, but if one of the tests that I proposed deleting (test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets) is actually still needed, then restoring it sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants