Feature/cluster representative images #42
Open
NetZissou wants to merge 4 commits into
Surfaces the members closest to each cluster centroid as a representative-image panel across both apps. This feature already existed in the embed&explore app; it is now also available in the precalculated embeddings app. Adds a shared compute/render core and a reusable, thread-safe image-fetching layer.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sentative-images

# Conflicts:
#   apps/precalculated/app.py
#   shared/components/summary.py
[project].version (static, 1.0.0) is now the sole source of truth. Drop the dynamic [tool.hatch.version] and read `__version__` from installed metadata via `importlib.metadata`. Also remove [tool.hatch.metadata] allow-direct-references, which is obsolete now that the hpc-inference git dependency is gone.
Pull request overview
Adds shared cluster-representative image selection + rendering so both embed_explore and the precalculated app can display “closest-to-centroid” representative images per cluster, with robust fallback when images can’t be loaded.
Changes:
- Introduces `find_cluster_representatives()` (pure ranking) and a shared Streamlit renderer for representative images.
- Adds shared, app-agnostic URL image fetching + process cache utilities and wires representative rendering into the precalculated app.
- Declares `requests` as a direct dependency and single-sources `shared.__version__` from installed package metadata.
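The "pure ranking" step orders each cluster's members by distance to the cluster centroid. The real `find_cluster_representatives()` signature is not shown here, so the following is a dependency-free sketch of the idea; the `rank_representatives` name, the `oversample` parameter, and the `n_per_cluster * oversample` cap are hypothetical:

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rank_representatives(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    n_per_cluster: int = 3,
    oversample: int = 3,
) -> Dict[Hashable, List[int]]:
    """Map each cluster label to member row indices, nearest-to-centroid first.

    Keeps n_per_cluster * oversample candidates so a renderer has spares
    when some images fail to load (the cap rule is an assumption).
    """
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    ranked: Dict[Hashable, List[int]] = {}
    for label, idxs in members.items():
        dim = len(embeddings[idxs[0]])
        # Centroid = per-dimension mean over the cluster's full-dimensional embeddings.
        centroid = [sum(embeddings[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Euclidean distance to the centroid; sorted() is stable, so ties keep row order.
        order = sorted(idxs, key=lambda i: math.dist(embeddings[i], centroid))
        ranked[label] = order[: n_per_cluster * oversample]
    return ranked
```

Returning row indices rather than images keeps the ranking testable without any network or UI code, which matches the split between a pure ranking utility and a separate renderer.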
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_representatives.py | Adds unit tests for representative ranking logic (centroid distance ordering, index correctness, oversample cap, label typing). |
| shared/utils/representatives.py | New shared utility to compute ranked representative candidate indices per cluster. |
| shared/utils/images.py | New shared URL image resolution/downloading/concurrent prefetch + process-level cache utilities. |
| shared/services/clustering_service.py | Switches clustering summary representative selection to the shared representative-ranking utility. |
| shared/components/summary.py | Uses shared representative renderer for embed_explore summary panel instead of inline image rendering logic. |
| shared/components/representatives.py | New shared Streamlit renderer for per-cluster representative images with fallback behavior. |
| `shared/__init__.py` | Reads `__version__` from installed package metadata (fallback for source-tree runs). |
| pyproject.toml | Adds requests dependency; removes Hatch version-from-file config in favor of [project].version. |
| apps/precalculated/components/data_preview.py | Replaces per-component URL image fetching with shared image utils; adds representative images panel for KMeans runs. |
| apps/precalculated/app.py | Renders the new representative images section in the precalculated app layout. |
Comment on lines +86 to +91
```python
try:
    resp = _get_session().get(url, timeout=timeout, stream=True)
    resp.raise_for_status()
    if not resp.headers.get('content-type', '').lower().startswith('image/'):
        return None
    return resp.content
```
```python
    representatives: Dict[object, List[int]],
    resolve_image: Callable[[int], Optional[Any]],
    n_per_cluster: int = 3,
    caption_fn: Optional[Callable[[int], str]] = None,
```
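The fetch snippet quoted in the first review comment passes `stream=True` and can return `None` before reading the body. The requests documentation notes that with `stream=True` the connection is not released back to the pool until the body is fully consumed or `Response.close()` is called. A minimal sketch, assuming a `requests.Session`-like object is passed in and using a hypothetical `fetch_image_bytes` helper, that closes the response on every path:

```python
from typing import Any, Optional


def fetch_image_bytes(session: Any, url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the body of `url` only if the server reports an image content type.

    `session` is assumed to be a requests.Session or anything with the same
    .get(...) -> Response interface. Returns None on failure so the caller
    can move on to the next candidate.
    """
    try:
        # Response supports `with`; exiting the block closes it, so an early
        # return on a non-image type does not leave the connection checked out.
        with session.get(url, timeout=timeout, stream=True) as resp:
            resp.raise_for_status()
            content_type = resp.headers.get("content-type", "").lower()
            if not content_type.startswith("image/"):
                return None
            return resp.content
    except Exception:
        # Broad catch keeps this sketch dependency-free; real code would
        # more likely catch requests.RequestException.
        return None
```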
Comment on lines +276 to +284
```python
def _resolve(idx):
    url = resolve_record_image_url(df_plot.iloc[idx])
    if not url:
        return None
    # Prefetched URLs hit the process cache; anything deeper falls back to
    # a single synchronous fetch (also cached).
    if url in _IMAGE_CACHE:
        return _IMAGE_CACHE[url]
    return get_image_from_url(url)
```
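`_resolve` checks membership in the module-private `_IMAGE_CACHE` and then indexes it, which couples the component to the cache's internals. One possible alternative, sketched with a hypothetical `ImageCache` class, is a single public get-or-fetch method that takes a lock for each dictionary access and downloads outside the lock:

```python
import threading
from typing import Any, Callable, Dict, Optional


class ImageCache:
    """Process-level URL -> image cache guarded by a lock (hypothetical class)."""

    def __init__(self) -> None:
        self._data: Dict[str, Optional[Any]] = {}
        self._lock = threading.Lock()

    def get_or_fetch(self, url: str, fetch: Callable[[str], Optional[Any]]) -> Optional[Any]:
        # Check under the lock; a cached None marks a URL that already failed.
        with self._lock:
            if url in self._data:
                return self._data[url]
        # Download outside the lock so slow fetches don't block other threads.
        # Two threads may occasionally fetch the same URL; that is harmless here.
        image = fetch(url)
        with self._lock:
            # Keep whichever result was stored first.
            return self._data.setdefault(url, image)
```

Caching failures as `None` means a broken URL is tried once per process rather than on every rerun, which suits the "skip and show the next-closest" behavior.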
Comment on lines +12 to +18
```python
from shared.utils.images import (
    IMAGE_URL_COLUMNS,
    fetch_images_concurrent,
    get_image_from_url,
    resolve_record_image_url,
    _IMAGE_CACHE,
)
```
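The component imports `fetch_images_concurrent` to warm the cache before rendering. Its real signature isn't shown on this page; a minimal sketch of the same idea with the standard library's `ThreadPoolExecutor`, under a hypothetical `prefetch_concurrent` name:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Optional


def prefetch_concurrent(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[Any]],
    max_workers: int = 8,
) -> Dict[str, Optional[Any]]:
    """Fetch each distinct, non-empty URL once on a thread pool; results keyed by URL.

    Hypothetical stand-in for fetch_images_concurrent(). `fetch` is assumed to
    return None on failure instead of raising, since pool.map re-raises errors.
    """
    unique = list(dict.fromkeys(u for u in urls if u))  # de-dup, keep order, drop empties
    if not unique:
        return {}
    # Threads suit this I/O-bound work: blocking network reads release the GIL.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its result.
        return dict(zip(unique, pool.map(fetch, unique)))
```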
The embed_explore app already shows representative images per cluster; the precalculated app didn't. This adds that capability to the precalculated app and factors the logic into shared components so both apps render representatives through one code path.
Representative Images panel per KMeans run in the precalculated app: the members closest to each cluster centroid (computed on the full-dimensional embeddings). A selector picks which KMeans (k=N) run to view (multi-run aware).
Images are fetched from each record's URL column; unreachable/broken images are skipped and the next-closest candidate is shown.
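The skip-and-fall-through behavior can be sketched without Streamlit. This assumes the parameter names from the renderer signature quoted in the review (`representatives`, `resolve_image`, `n_per_cluster`, `caption_fn`); the `pick_displayable` name, the `#idx` default caption, and the return shape are hypothetical:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def pick_displayable(
    representatives: Dict[object, List[int]],
    resolve_image: Callable[[int], Optional[Any]],
    n_per_cluster: int = 3,
    caption_fn: Optional[Callable[[int], str]] = None,
) -> Dict[object, List[Tuple[Any, str]]]:
    """Per cluster, walk candidates nearest-first and keep the first
    n_per_cluster that resolve to an image (None means "skip, try next")."""
    shown: Dict[object, List[Tuple[Any, str]]] = {}
    for label, candidates in representatives.items():
        picked: List[Tuple[Any, str]] = []
        for idx in candidates:
            if len(picked) >= n_per_cluster:
                break
            image = resolve_image(idx)
            if image is None:
                continue  # unreachable/broken image: fall through to the next-closest member
            caption = caption_fn(idx) if caption_fn else f"#{idx}"
            picked.append((image, caption))
        shown[label] = picked
    return shown
```

Because candidates arrive already ranked, a cluster shows fewer than `n_per_cluster` images only when every oversampled candidate fails.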
The package now declares `requests` explicitly; previously it was only a transitive dependency.
Single-sources the package version to `[project].version` in `pyproject.toml`. `shared.__version__` reads it via `importlib.metadata`, and the image-fetch User-Agent reports it as `emb-explorer/1.0.0 (+https://github.com/Imageomics/emb-explorer)`.
Closes #39