Skip to content

fix(backend): thumbnail collisions, cancellable indexing, assert hardening#265

Merged
lstein merged 2 commits into
masterfrom
lstein/fix/backend-fragility
May 22, 2026
Merged

fix(backend): thumbnail collisions, cancellable indexing, assert hardening#265
lstein merged 2 commits into
masterfrom
lstein/fix/backend-fragility

Conversation

@lstein
Copy link
Copy Markdown
Owner

@lstein lstein commented May 22, 2026

Three fragility items, batched

Each is small but they share router / helper touch points.

1. Thumbnail cache filename collisions

The cache filename was `Path(relative_path.replace("/", "")).stem + ".png"`. Two real collision modes:

  • `/a/b.jpg` and `/a_b.jpg` both mangled to `_a_b` → same cache file.
  • `cat.png` and `cat.jpg` both had stem `cat` → same cache file.

In either case the thumbnail of one image could be served for another. Switched to `hashlib.blake2b(relative_path.encode(), digest_size=16).hexdigest()` — 128-bit digest, collision probability effectively zero, and the full path including extension is hashed so same-stem files don't alias. Existing cache files become orphaned on first request and are regenerated.

2. `cancel_index_operation` actually cancels now

The old implementation called `progress_tracker.set_error(...)` — which flipped the status the frontend polls but didn't stop the background indexing loop. The encoder kept burning GPU through the rest of the dataset and (in the create-new-index path) wrote the index file at the end.

New cooperative-cancel scheme:

  • `ProgressTracker.request_cancel` / `is_cancel_requested` — set of pending album_keys, cleared by the next `start_operation` for the same key.
  • `IndexingCancelled` exception in `progress.py`.
  • `_process_images_batch_async`'s per-image `progress_cb` polls the flag and raises at the next batch boundary — cancel-to-stop latency well under a second on a typical GPU.
  • `_update_index_background_async` catches `IndexingCancelled` cleanly (logs at INFO, not ERROR) so the background-task summary doesn't look like a crash.
  • The cancel endpoint flips the status text immediately so the frontend's next poll sees the cancel even before the next batch.

The status text is now "Indexing cancelled by user" instead of "Operation cancelled by user" — matches the language the inner layer uses on the exception path.

3. Assert hardening

Five `assert`s converted to explicit raises so they survive `python -O`:

Site Before After
`embeddings.py:1011` (`update_index`) `assert exists` `raise FileNotFoundError`
`embeddings.py:1087` (`update_index_async`) `assert exists` `raise FileNotFoundError`
`embeddings.py:1406` (`find_duplicate_clusters`) `assert isinstance(ndarray)` `raise TypeError`
`invoke_metadata_view.py:137` (`_v5_control_layer_to_tuple`) `assert ca is not None` `raise ValueError`
`filetree.py:96` (`get_directories`) `assert ROOT_DIR is not None` `raise RuntimeError`
`search.py:221` (`serve_thumbnail`) `assert relative_path is not None` `raise HTTPException(500, ...)`

Test plan

  • `ruff check photomap tests` — clean
  • `pytest tests/backend` — 256 passed
  • Manual: create two files `a/b.jpg` and `a_b.jpg` in the same album with different content; request both thumbnails; confirm they look different (was previously a cache-poisoning bug returning the same image).
  • Manual: start indexing a large album; press Cancel mid-way; confirm `nvidia-smi` shows GPU utilization drop within ~1 second and that no `embeddings.npz` is written for that album.
  • Manual: run with `PYTHONOPTIMIZE=1` and exercise the assert paths (e.g. delete the `embeddings.npz` while a re-index is queued); confirm errors surface as proper exceptions instead of silently passing.

Net +80 lines across 6 files.

🤖 Generated with Claude Code

lstein and others added 2 commits May 21, 2026 21:30
…ening

Three fragility items from the code review, batched because each is
small but they share a router-or-helper touch point.

## Thumbnail cache filename collisions

Previously the cache filename was derived from
``relative_path.replace("/", "_").replace("\\", "_")`` plus ``Path(...).stem``.
Two real collision modes:

* ``/a/b.jpg`` and ``/a_b.jpg`` mangled to the same ``_a_b`` stem.
* ``cat.png`` and ``cat.jpg`` shared the same stem ``cat``.

Either case meant the thumbnail of one image could be served for another.
Switch to ``hashlib.blake2b(relative_path.encode(), digest_size=16).hexdigest()`` —
collision probability for 128 bits of digest is effectively zero, and the
full original path (including extension) is hashed so same-stem files
don't alias. Existing cache files become orphaned on first request and
are regenerated; no data loss.

## ``cancel_index_operation`` actually cancels now

The old implementation called ``progress_tracker.set_error(..., "Operation
cancelled by user")`` — which flipped the status the frontend polls but
didn't stop the background indexing loop. The encoder kept burning GPU
through the rest of the dataset and (in the create-new-index path) wrote
the index file at the end.

This change introduces cooperative cancellation:

* ``ProgressTracker.request_cancel`` / ``is_cancel_requested`` — a set
  of pending album_keys, cleared by the next ``start_operation`` for
  the same key.
* New ``IndexingCancelled`` exception in ``progress.py``.
* ``_process_images_batch_async``'s per-image ``progress_cb`` polls
  the flag and raises ``IndexingCancelled`` at the next batch boundary
  — cancel-to-stop latency well under a second on any sane GPU.
* ``_update_index_background_async`` catches ``IndexingCancelled``
  cleanly (logs at INFO, not ERROR) so the background task summary
  doesn't look like a crash.
* The cancel endpoint flips the status text immediately so the
  frontend's next poll sees the cancel even before the next batch.

The status text the user sees is now "Indexing cancelled by user"
instead of "Operation cancelled by user"; matches the language the
inner layer uses on the exception path.

## Assert hardening

Five ``assert``s converted to explicit raises so they survive
``python -O`` optimization:

* ``embeddings.py:1011`` / ``:1087`` — preconditions on
  ``update_index`` / ``update_index_async`` that ``self.embeddings_path``
  exists. Now ``raise FileNotFoundError`` with the same helpful message.
* ``embeddings.py:1406`` — defensive ``isinstance(np.ndarray)`` check
  after ``_l2_normalize``. Now ``raise TypeError`` with the actual type.
* ``invoke_metadata_view.py:137`` — ``_v5_control_layer_to_tuple``'s
  trust-the-caller assert that ``layer.control_adapter`` isn't None.
  Now ``raise ValueError`` — the caller filter (in ``_V5Strategy``)
  still drops None layers first, so the raise is only reachable if
  that contract is broken.
* ``filetree.py:96`` — type-narrowing assert that ``ROOT_DIR`` is set
  on non-Windows. Now ``raise RuntimeError`` with an explicit message
  about ``PHOTOMAP_ALBUM_ROOT``.
* ``search.py:221`` — ``relative_path is not None`` from
  ``get_relative_path``. Now ``raise HTTPException(500, ...)`` since
  it's a server-side mis-configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lstein lstein merged commit ba3412d into master May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant