Feature/cluster representative images#42

Open
NetZissou wants to merge 4 commits into main from feature/cluster-representative-images
@NetZissou

Collaborator

The embed_explore app already shows representative images per cluster; the precalculated app did not. This PR adds that capability to the precalculated app and factors the logic into shared components so both apps render representatives through one code path.

Adds a Representative Images panel per KMeans run to the precalculated app, showing the members closest to each cluster centroid (distances computed on the full-dimensional embeddings). When several runs exist, a selector picks which KMeans (k=N) run to view.
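The closest-to-centroid selection described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the code in this PR: the name `rank_by_centroid_distance` and its signature are hypothetical, the centroid is assumed to be the mean of each cluster's members, and the real utility presumably works on NumPy arrays.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rank_by_centroid_distance(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    n_candidates: int = 3,
) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: per cluster, row indices ordered closest-to-centroid first."""
    # Group row indices by cluster label.
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    ranked: Dict[Hashable, List[int]] = {}
    for label, idxs in members.items():
        dim = len(embeddings[idxs[0]])
        # Assumption: centroid = mean of the cluster's full-dimensional vectors.
        centroid = [sum(embeddings[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # sorted() is stable, so ties keep their original row order.
        by_distance = sorted(idxs, key=lambda i: math.dist(embeddings[i], centroid))
        ranked[label] = by_distance[:n_candidates]
    return ranked


points = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [10.0, 10.0], [10.0, 12.0], [10.0, 11.0]]
labels = [0, 0, 0, 1, 1, 1]
reps = rank_by_centroid_distance(points, labels, n_candidates=2)
```

Returning more candidates than are displayed (oversampling) leaves room to skip records whose images cannot be loaded.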

Images are fetched from each record's URL column; unreachable/broken images are skipped and the next-closest candidate is shown.
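The skip-and-fall-through behaviour can be illustrated with a short sketch. The helper name `pick_displayable` is hypothetical, and the resolver is assumed to return `None` for any image that cannot be loaded; the actual renderer in this PR may be structured differently.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def pick_displayable(
    candidates: Sequence[int],
    resolve_image: Callable[[int], Optional[bytes]],
    n_per_cluster: int = 3,
) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: walk ranked candidates and keep the first n that load."""
    shown: List[Tuple[int, bytes]] = []
    for idx in candidates:
        if len(shown) >= n_per_cluster:
            break
        image = resolve_image(idx)
        if image is None:
            # Unreachable or broken image: move on to the next-closest candidate.
            continue
        shown.append((idx, image))
    return shown


# Fake resolver standing in for a URL fetch; record 7 is "broken".
fake_images = {4: b"img-4", 7: None, 2: b"img-2", 9: b"img-9"}
picked = pick_displayable([4, 7, 2, 9], fake_images.get, n_per_cluster=2)
```

Because the candidate list is already ordered by centroid distance, the images that end up displayed are still the closest ones that actually load.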

The package now declares requests as a direct dependency; previously it was only available transitively.

Single-sources the package version in pyproject.toml: shared.__version__ reads it via importlib.metadata, and the image-fetch User-Agent reports it as emb-explorer/1.0.0 (+https://github.com/Imageomics/emb-explorer).
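A sketch of how the version lookup and User-Agent string could fit together. The distribution name `emb-explorer` is inferred from the User-Agent above, and the function names and the `0+unknown` fallback are assumptions made for illustration, not taken from the diff.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the distribution name matches the User-Agent prefix in this PR.
DIST_NAME = "emb-explorer"
REPO_URL = "https://github.com/Imageomics/emb-explorer"


def read_version(dist_name: str = DIST_NAME, fallback: str = "0+unknown") -> str:
    """Return the installed version, or a placeholder for source-tree runs."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The package is not installed, e.g. when running from a checkout.
        return fallback


def build_user_agent(pkg_version: str) -> str:
    """Format the User-Agent header value sent with image requests."""
    return f"{DIST_NAME}/{pkg_version} (+{REPO_URL})"


__version__ = read_version()
user_agent = build_user_agent(__version__)
```

With this arrangement, bumping `[project].version` in pyproject.toml is the only change needed for both `__version__` and the User-Agent to update after reinstalling.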

Closes #39

NetZissou and others added 4 commits June 15, 2026 15:53
Surfaces the members closest to each cluster centroid as a
representative-image panel across both apps. This feature already
exists in the embed_explore app; it is now available in the
precalculated embeddings app.

Adds a shared compute/render core and a reusable, thread-safe
image-fetching layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sentative-images

# Conflicts:
#	apps/precalculated/app.py
#	shared/components/summary.py
[project].version (static, 1.0.0) is now the sole source of truth.

Drop the dynamic [tool.hatch.version], and read __version__ from
installed metadata via `importlib.metadata`.

Also remove [tool.hatch.metadata] allow-direct-references, obsolete now that the hpc-inference git dependency is gone.

Copilot AI left a comment

Pull request overview

Adds shared cluster-representative image selection + rendering so both embed_explore and the precalculated app can display “closest-to-centroid” representative images per cluster, with robust fallback when images can’t be loaded.

Changes:

  • Introduces find_cluster_representatives() (pure ranking) and a shared Streamlit renderer for representative images.
  • Adds shared, app-agnostic URL image fetching + process cache utilities and wires representative rendering into the precalculated app.
  • Declares requests as a direct dependency and single-sources shared.__version__ from installed package metadata.
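The concurrent prefetch and process-level cache mentioned in the list above might look roughly like this. It is a sketch under stated assumptions: `cached_fetch`, `prefetch`, and the injected `fetch` callable are hypothetical names, failed fetches are assumed to be cached as `None`, and the PR's own `fetch_images_concurrent` may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

Fetcher = Callable[[str], Optional[bytes]]

# Process-level cache shared by all threads; guarded by a lock.
_CACHE: Dict[str, Optional[bytes]] = {}
_CACHE_LOCK = threading.Lock()


def cached_fetch(url: str, fetch: Fetcher) -> Optional[bytes]:
    """Return the cached result for url, calling fetch only on a cache miss."""
    with _CACHE_LOCK:
        if url in _CACHE:
            return _CACHE[url]
    result = fetch(url)  # the slow call happens outside the lock
    with _CACHE_LOCK:
        # setdefault keeps the first stored result if two threads raced.
        return _CACHE.setdefault(url, result)


def prefetch(urls: Iterable[str], fetch: Fetcher, max_workers: int = 8) -> Dict[str, Optional[bytes]]:
    """Fetch unique URLs in a thread pool and return url -> bytes (or None)."""
    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda u: cached_fetch(u, fetch), unique))
    return dict(zip(unique, results))


calls = []


def fake_fetch(url: str) -> Optional[bytes]:
    """Stand-in for a network fetch; URLs ending in 'broken' fail."""
    calls.append(url)
    return None if url.endswith("broken") else url.encode()


out = prefetch(["a", "b", "a", "broken"], fake_fetch)
again = cached_fetch("a", fake_fetch)  # served from the cache, no new call
```

Passing the fetcher in as a parameter keeps the cache logic testable without network access.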

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_representatives.py | Adds unit tests for representative ranking logic (centroid distance ordering, index correctness, oversample cap, label typing). |
| shared/utils/representatives.py | New shared utility to compute ranked representative candidate indices per cluster. |
| shared/utils/images.py | New shared URL image resolution/downloading/concurrent prefetch + process-level cache utilities. |
| shared/services/clustering_service.py | Switches clustering summary representative selection to the shared representative-ranking utility. |
| shared/components/summary.py | Uses shared representative renderer for embed_explore summary panel instead of inline image rendering logic. |
| shared/components/representatives.py | New shared Streamlit renderer for per-cluster representative images with fallback behavior. |
| shared/__init__.py | Reads __version__ from installed package metadata (fallback for source-tree runs). |
| pyproject.toml | Adds requests dependency; removes Hatch version-from-file config in favor of [project].version. |
| apps/precalculated/components/data_preview.py | Replaces per-component URL image fetching with shared image utils; adds representative images panel for KMeans runs. |
| apps/precalculated/app.py | Renders the new representative images section in the precalculated app layout. |

Comment thread shared/utils/images.py
Comment on lines +86 to +91
    try:
        resp = _get_session().get(url, timeout=timeout, stream=True)
        resp.raise_for_status()
        if not resp.headers.get('content-type', '').lower().startswith('image/'):
            return None
        return resp.content

    representatives: Dict[object, List[int]],
    resolve_image: Callable[[int], Optional[Any]],
    n_per_cluster: int = 3,
    caption_fn: Optional[Callable[[int], str]] = None,
Comment on lines +276 to +284
    def _resolve(idx):
        url = resolve_record_image_url(df_plot.iloc[idx])
        if not url:
            return None
        # Prefetched URLs hit the process cache; anything deeper falls back to
        # a single synchronous fetch (also cached).
        if url in _IMAGE_CACHE:
            return _IMAGE_CACHE[url]
        return get_image_from_url(url)
Comment on lines +12 to +18
    from shared.utils.images import (
        IMAGE_URL_COLUMNS,
        fetch_images_concurrent,
        get_image_from_url,
        resolve_record_image_url,
        _IMAGE_CACHE,
    )
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Add per-cluster representative images to the precalculated app