test(code-image): refresh fixtures and assertions to md5-tree-v2 #554

Open: FileSystemGuy wants to merge 1 commit into main from fix/group-C-md5-tree-v2-fixtures

@FileSystemGuy

Contributor

Summary

Commit `d8b8f0c` (issue #505 fix) bumped the code-tree hash algorithm identifier from `md5-tree-v1` to `md5-tree-v2` in `submission_checker.tools.code_image._ALGORITHM` so stale pre-#512 captures surface as `MalformedHashFile`. Production rejects any other value as "Unknown algorithm" before checking other fields (code_image.py:470).
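
The check ordering described above is what made the stale tests misleading. Below is a minimal sketch of that ordering, not the real `code_image.py`: `load_code_image_sketch`, `error_for`, and the `tree_md5` field name are hypothetical, and the "Invalid MD5 hash format" message is assumed from the test parameter id.

```python
import re

_ALGORITHM = "md5-tree-v2"  # value named in this PR description


class MalformedHashFile(Exception):
    """Stand-in for the production exception of the same name."""


def load_code_image_sketch(payload: dict) -> dict:
    # Hypothetical simplification: the algorithm check runs first, so a
    # stale identifier hides every later field-validation error.
    if payload.get("algorithm") != _ALGORITHM:
        raise MalformedHashFile("Unknown algorithm")
    # "tree_md5" is an assumed field name, used for illustration only.
    if not re.fullmatch(r"[0-9a-f]{32}", str(payload.get("tree_md5", ""))):
        raise MalformedHashFile("Invalid MD5 hash format")
    return payload


def error_for(payload: dict) -> str:
    """Return the error message raised for a payload, or "ok"."""
    try:
        load_code_image_sketch(payload)
    except MalformedHashFile as exc:
        return str(exc)
    return "ok"


bad_hash_v1 = {"algorithm": "md5-tree-v1", "tree_md5": "not-a-hash"}
bad_hash_v2 = {"algorithm": "md5-tree-v2", "tree_md5": "not-a-hash"}

print(error_for(bad_hash_v1))  # Unknown algorithm
print(error_for(bad_hash_v2))  # Invalid MD5 hash format
```

With the v1 identifier, the bad hash is never inspected; only the v2 payload reaches the hash-format branch.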

Two test files still encoded v1, producing 20 failures across them:

  • `test_code_image.py`

    • `test_load_happy_path` asserted `img.algorithm == "md5-tree-v1"`
    • `test_schema_invariants` asserted `payload["algorithm"] == "md5-tree-v1"`
    • `test_load_malformed_json_raises[Invalid MD5 hash format]` payload's algorithm was `md5-tree-v1`, so production raised "Unknown algorithm" first and the test never reached the hash-format check it was meant to exercise. Same for `[Invalid captured_at]` and `[Invalid git_sha]`. All flipped to `md5-tree-v2` so the intended field-validation branch fires.
    • The `[Unknown algorithm]` parameter (algorithm=`"v2"`) is preserved — it IS testing the unknown-algorithm branch.
  • `test_submission_checker_structure.py`

    • `_write_valid_hash_json` fixture wrote algorithm `md5-tree-v1`; every layered self-consistency test downstream depends on that fixture being valid. Flipping the one fixture string to `md5-tree-v2` unblocks all 15 failures in that file.
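
For illustration, a fixture helper of this kind might look roughly like the sketch below. It is not the actual `_write_valid_hash_json`: the function name, the `tree_md5` field, and the placeholder values for `captured_at` and `git_sha` are assumptions. The point is only that a single algorithm string decides whether every downstream test starts from a valid file.

```python
import json
import tempfile
from pathlib import Path

_ALGORITHM = "md5-tree-v2"  # stand-in for code_image._ALGORITHM


def write_valid_hash_json_sketch(directory: Path) -> Path:
    # Only "algorithm", "captured_at" and "git_sha" are named in this PR;
    # "tree_md5" and all values here are illustrative placeholders.
    payload = {
        "algorithm": _ALGORITHM,
        "captured_at": "2026-01-01T00:00:00Z",
        "git_sha": "0" * 40,
        "tree_md5": "d41d8cd98f00b204e9800998ecf8427e",
    }
    path = directory / ".code-hash.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    hash_file = write_valid_hash_json_sketch(Path(tmp))
    written = json.loads(hash_file.read_text(encoding="utf-8"))

print(written["algorithm"])
```

Whether a fixture should reference a shared constant or hardcode the string is a judgment call: hardcoding catches accidental changes to the identifier, while a constant avoids this kind of drift after an intentional bump.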

Test plan

  • `uv run python -m pytest mlpstorage_py/tests/test_code_image.py mlpstorage_py/tests/test_submission_checker_structure.py -q` → 131 passed (was 116 passed + 20 failed)

Commit d8b8f0c (PR for issue #505) bumped the code-tree hash algorithm
identifier from md5-tree-v1 to md5-tree-v2 so stale pre-#512 captures
surface as MalformedHashFile. Production reads from
submission_checker.tools.code_image._ALGORITHM = "md5-tree-v2" and
load_code_image() rejects any other value with "Unknown algorithm"
*before* checking other fields (see code_image.py:470).

Two test files still encoded v1:

* test_code_image.py
  - test_load_happy_path: asserted img.algorithm == "md5-tree-v1"
  - test_schema_invariants: asserted payload["algorithm"] == "md5-tree-v1"
  - test_load_malformed_json_raises[Invalid MD5 hash format]: payload's
    algorithm was "md5-tree-v1", so production raised "Unknown algorithm"
    first and the test never reached the hash-format check it was meant
    to exercise. Same for the [Invalid captured_at] and [Invalid git_sha]
    parameter sets — all need the current algorithm so the *specific*
    field-validation path under test fires.
  - The [Unknown algorithm] case (algorithm="v2") is preserved as-is —
    that one IS testing the unknown-algorithm branch.

* test_submission_checker_structure.py
  - _write_valid_hash_json fixture wrote algorithm "md5-tree-v1", which
    every layered self-consistency test depends on for a "valid"
    .code-hash.json. All 15 downstream failures resolve once this single
    fixture string flips to v2.

Result: 116 passed + 20 failed → 131 passed across both files.
@FileSystemGuy FileSystemGuy requested a review from a team June 26, 2026 22:50
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅
