ds4_test: make metal-tensor-equivalence test deterministic on CUDA/ROCm #308

Open
alantsev wants to merge 1 commit into antirez:main from alantsev:ds4_test_metal

Contributor

@alantsev alantsev commented May 31, 2026

Make the metal-tensor-equivalence test deterministic to avoid flaky test runs.

Keep ./ds4_test --metal-tensor-equivalence execution deterministic

```
$ ./ds4_test --metal-tensor-equivalence

metal-tensor-equivalence:
ds4: CUDA backend initialized on AMD Radeon 8060S Graphics (sm_115)
ds4: CUDA registered 80.76 GiB model mapping for device access
ds4: CUDA preparing model tensor mappings: 80.24 GiB
ds4: CUDA q8 fp16 cache limit reached; using q8 kernels (request=64.00 MiB cached=7.94 GiB limit=8.00 GiB)
ds4: CUDA startup model preparation covered 80.76 GiB of tensor spans in 0.314s
ds4: cuda backend initialized for graph diagnostics
ds4-test: Tensor equivalence candidate route=auto
ds4: CUDA backend initialized on AMD Radeon 8060S Graphics (sm_115)
ds4: CUDA registered 80.76 GiB model mapping for device access
ds4: CUDA preparing model tensor mappings: 80.24 GiB
ds4: CUDA q8 fp16 cache limit reached; using q8 kernels (request=64.00 MiB cached=7.94 GiB limit=8.00 GiB)
ds4: CUDA startup model preparation covered 80.76 GiB of tensor spans in 0.346s
ds4: cuda backend initialized for graph diagnostics
ds4-test: Tensor equivalence short_italian_fact top1 ref=108149 cand=108149 top5_overlap=5/5 overlap=20/20 max_rank_delta=0 rms=0 max_abs=0 top20_max_abs=0
ds4-test: Tensor equivalence short_italian_fact largest deltas: id=0 ref=-15.0155 cand=-15.0155 abs=0 id=1 ref=19.9358 cand=19.9358 abs=0 id=2 ref=-55.9084 cand=-55.9084 abs=0 id=3 ref=17.8982 cand=17.8982 abs=0 id=4 ref=26.0747 cand=26.0747 abs=0
ds4-test: Tensor equivalence short_code_completion top1 ref=9854 cand=9854 top5_overlap=5/5 overlap=20/20 max_rank_delta=0 rms=0 max_abs=0 top20_max_abs=0
ds4-test: Tensor equivalence short_code_completion largest deltas: id=0 ref=-2.66161 cand=-2.66161 abs=0 id=1 ref=21.3162 cand=21.3162 abs=0 id=2 ref=-45.7824 cand=-45.7824 abs=0 id=3 ref=10.9651 cand=10.9651 abs=0 id=4 ref=25.8229 cand=25.8229 abs=0
ds4-test: Tensor equivalence short_reasoning_plain top1 ref=926 cand=926 top5_overlap=5/5 overlap=20/20 max_rank_delta=0 rms=0 max_abs=0 top20_max_abs=0
ds4-test: Tensor equivalence short_reasoning_plain largest deltas: id=0 ref=-3.03494 cand=-3.03494 abs=0 id=1 ref=23.3849 cand=23.3849 abs=0 id=2 ref=-42.7991 cand=-42.7991 abs=0 id=3 ref=16.0927 cand=16.0927 abs=0 id=4 ref=18.5051 cand=18.5051 abs=0
ds4-test: Tensor equivalence long_memory_archive top1 ref=32111 cand=32111 top5_overlap=5/5 overlap=20/20 max_rank_delta=0 rms=0 max_abs=0 top20_max_abs=0
ds4-test: Tensor equivalence long_memory_archive largest deltas: id=0 ref=-8.42831 cand=-8.42831 abs=0 id=1 ref=19.284 cand=19.284 abs=0 id=2 ref=-50.653 cand=-50.653 abs=0 id=3 ref=10.6968 cand=10.6968 abs=0 id=4 ref=21.0302 cand=21.0302 abs=0
ds4-test: Tensor equivalence long_code_audit top1 ref=671 cand=671 top5_overlap=5/5 overlap=20/20 max_rank_delta=0 rms=0 max_abs=0 top20_max_abs=0
ds4-test: Tensor equivalence long_code_audit largest deltas: id=0 ref=-4.50487 cand=-4.50487 abs=0 id=1 ref=19.7669 cand=19.7669 abs=0 id=2 ref=-47.0626 cand=-47.0626 abs=0 id=3 ref=16.7405 cand=16.7405 abs=0 id=4 ref=23.0197 cand=23.0197 abs=0
ds4-test: Tensor summary route=auto cases=5 capture_fail=0 logits_fail=0 greedy_fail=0 top1_mismatch=0 min_top5_overlap=5/5 min_overlap=20/20 worst_rank_delta=0 worst_rms=0 worst_max_abs=0 worst_top20_max_abs=0
metal-tensor-equivalence: OK
ds4 tests: ok
```

With this change the test is 100% reproducible for me (I use the ROCm version).

Without this change I get ~6% failures in this test on a long context, because non-deterministic atomics are enabled.
If the current behaviour is expected, i.e. the test is meant to cover non-determinism on CUDA/ROCm, we could increase the thresholds instead.
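
For readers unfamiliar with why atomics can make results vary between runs, here is a minimal sketch. It is not code from ds4 and does not use the GPU. It only assumes that concurrent atomic adds may reach an accumulator in a different order on each run, and shows that floating-point addition gives different totals for different orders. The `accumulate` helper and the `partials` values are hypothetical, chosen so the rounding is easy to follow; Python floats are 64-bit, but the same effect applies to 32-bit and 16-bit accumulation.

```python
def accumulate(values):
    """Add values one at a time, in the order given, like a serial reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical partial results that several threads might add into one
# accumulator. The mathematically exact sum is 2.0.
partials = [1e17, 1.0, -1e17, 1.0]

# Two of the orders in which the adds could arrive.
in_order = accumulate(partials)
reordered = accumulate(reversed(partials))

print(in_order, reordered)    # 1.0 0.0
print(in_order == reordered)  # False
```

A reduction that always adds in the same order returns the same value on every run, even though that value still carries rounding error. This is why a fixed-order path can give bit-identical logits while an order-dependent one needs a tolerance.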

@alantsev alantsev changed the title ds4_test: make metal-tensor-equivalence test determenistic - to avoid… ds4_test: make metal-tensor-equivalence test determenistic May 31, 2026
@alantsev alantsev changed the title ds4_test: make metal-tensor-equivalence test determenistic ds4_test: make metal-tensor-equivalence test deterministic May 31, 2026
@alantsev alantsev changed the title ds4_test: make metal-tensor-equivalence test deterministic ds4_test: make metal-tensor-equivalence test deterministic on CUDA/ROCm May 31, 2026