Whole Slide Image (WSI) dataset quality control and analytics platform for computational pathology. Provides comprehensive tile-level QC, class imbalance analysis, and optional integration with AtlasPatch for SAM2-based tissue detection.
- Triple pipeline architecture: NativeQC (built-in Otsu/OD masking + grid tiling), StampQC (STAMP-style brightness + Canny edge filtering), or AtlasPatchQC (SAM2-based tissue detection via external tool)
- Per-tile QC scoring: tissue fraction, brightness, blur (Laplacian variance), white/background fraction, pen marks, folds
- Multi-level statistics: tile, slide, patient, and label-level summaries with bag ratio tracking
- Class imbalance analysis: entropy, effective N, imbalance ratio, before/after QC comparison
- Interactive UI: Streamlit app with 3-tab workflow (Input, Parameters, Run & Results)
- Headless CLI: batch processing with YAML config, presets, and CLI overrides
- Exportable reports: HTML reports with embedded Plotly plots, CSV/Parquet exports
- Multi-backend WSI reading: OpenSlide, tifffile, pyvips (uses whichever is installed)
pip install -r requirements.txt
Install at least one WSI backend:
# OpenSlide (recommended)
pip install openslide-python
# macOS: brew install openslide
# Linux: apt install openslide-tools
# OR tifffile
pip install tifffile imagecodecs
# OR pyvips
pip install pyvips
# macOS: brew install vips
# Linux: apt install libvips-dev
# Verify the installation (self-validation test suite)
python selfcheck.py
# Launch the Streamlit UI
streamlit run app.py
# Preview run (30 slides)
python cli.py --csv /path/to/slides.csv
# Full run
python cli.py --csv /path/to/slides.csv --full --output runs/
# With AtlasPatch pipeline
python cli.py --csv /path/to/slides.csv --pipeline AtlasPatchQC --full
# Apply a preset
python cli.py --csv /path/to/slides.csv --preset conservative_tissue --full
# Custom config
python cli.py --csv /path/to/slides.csv --config configs/default.yaml --full
# Generate HTML report
python cli.py --csv /path/to/slides.csv --full --report
# Check available backends
python cli.py --backends
# Validate config
python cli.py --csv /path/to/slides.csv --validate

The input CSV must have these columns:

| Column | Description |
|---|---|
| `patient_id` | Patient identifier |
| `slide_id` | Unique slide identifier |
| `slide_path` | Absolute path to WSI file (.svs, .ndpi, .tif, .mrxs, etc.) |
| `label` | Class label (e.g., tumor, normal) |
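The column requirements above can be checked before a run. The following is a minimal sketch using only the standard library; `check_slide_csv` and its messages are hypothetical names for illustration, not part of the package, and the duplicate `slide_id` check is an assumption drawn from the "unique slide identifier" description.

```python
import csv
import io

REQUIRED_COLUMNS = ("patient_id", "slide_id", "slide_path", "label")


def check_slide_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        slide_id = row["slide_id"]
        if slide_id in seen_ids:
            problems.append(f"line {line_no}: duplicate slide_id {slide_id}")
        seen_ids.add(slide_id)
    return problems


sample = """patient_id,slide_id,slide_path,label
P001,S001,/data/slides/slide_001.svs,tumor
P001,S001,/data/slides/slide_002.svs,tumor
P002,S003,/data/slides/slide_003.svs,
"""
for problem in check_slide_csv(sample):
    print(problem)
```

The tool's own `--validate` flag remains the authoritative check; this sketch only illustrates the shape of the expected input.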
Example:
patient_id,slide_id,slide_path,label
P001,S001,/data/slides/slide_001.svs,tumor
P001,S002,/data/slides/slide_002.svs,tumor
P002,S003,/data/slides/slide_003.svs,normal

WSI-Analytics/
├── app.py # Streamlit UI
├── cli.py # Headless batch CLI
├── selfcheck.py # Self-validation test suite (17 tests)
├── requirements.txt
├── configs/
│ └── default.yaml # Default configuration
├── wsi_analyticspro/ # Core package
│ ├── __init__.py
│ ├── config.py # YAML config + validation + presets
│ ├── io/
│ │ ├── csv_ingest.py # CSV loading + preview subset
│ │ └── artifacts.py # Run directory + manifest + exports
│ ├── wsi/
│ │ ├── reader.py # WSI reader + masking + tiling
│ │ └── metadata.py # MPP/mag utilities + AtlasPatch check
│ ├── pipelines/
│ │ ├── qc_native.py # NativeQC pipeline + SlideResult
│ │ ├── qc_stamp.py # StampQC pipeline (STAMP-style tile selection)
│ │ ├── atlaspatch_wrapper.py # AtlasPatch CLI wrapper + HDF5 parsing
│ │ └── orchestrator.py # Pipeline dispatcher + PipelineResult
│ ├── stats/
│ │ ├── tile_stats.py # Per-tile statistics
│ │ ├── slide_stats.py # Per-slide statistics
│ │ ├── patient_stats.py # Per-patient statistics
│ │ ├── label_stats.py # Per-label statistics
│ │ └── imbalance.py # Class imbalance analysis
│ └── viz/
│   ├── plots.py # Plotly plots + montages
│   ├── overlays.py # Mask/grid overlays
│   └── report.py # HTML report builder
├── wsi_reader.py # (legacy) Multi-backend WSI reader
├── masking.py # (legacy) Tissue masking
├── tiling.py # (legacy) Tile generation + QC scoring
├── metrics.py # (legacy) Statistics
├── viz.py # (legacy) Visualizations
├── sweeps.py # (legacy) Parameter sweep engine
└── qc_pipeline.py # (legacy) Pipeline orchestration
NativeQC is the built-in pipeline. It uses Otsu/OD thresholding for tissue masking and grid-based tiling:
- Read slide with auto-detected backend
- Generate thumbnail and tissue mask
- Generate tile coordinates on tissue regions
- Score each tile (tissue fraction, brightness, blur, white fraction)
- Accept/reject based on QC thresholds
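The per-tile scoring step can be illustrated with a small sketch. It is written in plain Python on nested lists for clarity; a real implementation would more likely use NumPy or OpenCV. The function names, the dictionary keys, and the `white_level` cutoff are assumptions made for this example and are not taken from the package.

```python
from statistics import fmean, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Sharp tiles give large values; flat or blurred tiles give small ones.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def score_tile(gray, tissue_mask, white_level=220):
    """Score one grayscale tile (values 0-255) against its binary tissue mask."""
    pixels = [v for row in gray for v in row]
    mask = [m for row in tissue_mask for m in row]
    return {
        "tissue_fraction": sum(mask) / len(mask),
        "brightness": fmean(pixels),
        # white_level is a hypothetical cutoff chosen for illustration
        "white_fraction": sum(v >= white_level for v in pixels) / len(pixels),
        "blur_score": laplacian_variance(gray),
    }


all_tissue = [[1] * 4 for _ in range(4)]
flat_white = [[255] * 4 for _ in range(4)]
checkerboard = [[200 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]

print(score_tile(flat_white, all_tissue))    # blur_score 0: no detail at all
print(score_tile(checkerboard, all_tissue))  # high blur_score: strong edges
```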
StampQC replicates the STAMP tile-selection pipeline for compatibility with STAMP-based MIL workflows:
- Generate non-overlapping tile grid (stride = tile size, no partial edge tiles)
- Coarse brightness rejection at supertile level (grayscale intensity ≥ cutoff → background)
- Fine Canny edge filter per tile (edge fraction < cutoff → discard)
- Tiles surviving both filters are accepted
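The grid and the two-stage filter described above can be sketched as follows. This is an illustration of the decision logic only: it assumes the supertile brightness and the Canny edge fraction have already been computed elsewhere, and `stamp_grid` and `stamp_accept` are hypothetical names rather than functions from `qc_stamp.py`.

```python
def stamp_grid(width, height, tile_px):
    """Top-left corners of a non-overlapping grid; partial edge tiles are dropped."""
    return [
        (x, y)
        for y in range(0, height - tile_px + 1, tile_px)
        for x in range(0, width - tile_px + 1, tile_px)
    ]


def stamp_accept(supertile_brightness, edge_fraction,
                 brightness_cutoff=240, canny_cutoff=0.02):
    """Apply the two filters in order; a cutoff of None disables that filter."""
    if brightness_cutoff is not None and supertile_brightness >= brightness_cutoff:
        return False  # bright supertile: treated as background
    if canny_cutoff is not None and edge_fraction < canny_cutoff:
        return False  # too few edges: discarded
    return True


print(len(stamp_grid(1000, 500, 224)))  # 4 columns x 2 rows
print(stamp_accept(200, 0.05))          # passes both filters
print(stamp_accept(250, 0.05))          # rejected as background
```

Because the stride equals the tile size, a 1000 x 500 px region yields 4 x 2 full tiles and the leftover strips at the right and bottom edges are ignored.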
STAMP QC defaults (applied automatically when pipeline is StampQC):
| Parameter | Default | Description |
|---|---|---|
| `tile_size_um` | 256.0 | Physical tile size in µm |
| `tile_size_px` | 224 | Output tile size in pixels |
| `brightness_cutoff` | 240 | Supertile brightness threshold (set to `null` to disable) |
| `canny_cutoff` | 0.02 | Canny edge fraction threshold (set to `null` to disable) |
Hardcoded constants (not configurable):
- Canny low/high thresholds: 40 / 100
- Overlap: 0 (non-overlapping grid)
- Read level: 0 (full resolution)
- Max supertile size: 1024 px
CLI example:
python cli.py --csv /path/to/slides.csv --pipeline StampQC --full
python cli.py --csv /path/to/slides.csv --pipeline StampQC --stamp-brightness-cutoff 230
python cli.py --csv /path/to/slides.csv --pipeline StampQC --stamp-disable-brightness

AtlasPatchQC wraps the external AtlasPatch tool (CC-BY-NC-SA-4.0):
- Run AtlasPatch CLI on slide (SAM2-based tissue detection)
- Parse HDF5 output (coords dataset: N x 5 int32)
- Optionally score tiles with native QC metrics
- Apply same accept/reject logic
Setup AtlasPatch:
pip install atlas-patch
# or clone and install from source

| Preset | Description |
|---|---|
| `fast_preview` | Low thresholds, fast iteration |
| `conservative_tissue` | High tissue fraction, strict brightness |
| `high_purity` | Maximum purity with blur + artifact detection |

| Parameter | Default | Description |
|---|---|---|
| `tile_size_um` | 256.0 | Physical tile size in micrometers |
| `tile_size_px` | 224 | Tile size in pixels |
| `min_tissue_fraction` | 0.5 | Minimum tissue to accept tile |
| `brightness_max` | 220 | Maximum mean brightness (0-255) |
| `blur_threshold` | 15.0 | Minimum Laplacian variance |
| `min_tiles_per_slide` | 5 | Below this, slide is QC_FAIL |
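To make the threshold semantics concrete, here is a hedged sketch of how the defaults above might combine into an accept/reject decision. The comparison directions follow the table descriptions; the reject reason strings, the `QC_PASS` status, the tile dictionary keys, and the behaviour exactly at a threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    min_tissue_fraction: float = 0.5
    brightness_max: float = 220
    blur_threshold: float = 15.0
    min_tiles_per_slide: int = 5


def reject_reason(tile, t=QCThresholds()):
    """Return None if the tile passes, else the first failed check."""
    if tile["tissue_fraction"] < t.min_tissue_fraction:
        return "low_tissue"   # hypothetical reason label
    if tile["brightness"] > t.brightness_max:
        return "too_bright"   # hypothetical reason label
    if tile["blur_score"] < t.blur_threshold:
        return "blurry"       # hypothetical reason label
    return None


def slide_status(tiles, t=QCThresholds()):
    """A slide with fewer accepted tiles than min_tiles_per_slide is QC_FAIL."""
    accepted = sum(reject_reason(tile, t) is None for tile in tiles)
    return "QC_FAIL" if accepted < t.min_tiles_per_slide else "QC_PASS"


good = {"tissue_fraction": 0.8, "brightness": 180, "blur_score": 40.0}
print(reject_reason(good))                        # None
print(reject_reason({**good, "brightness": 230})) # too_bright
print(slide_status([good] * 4))                   # QC_FAIL
```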
The target resolution is derived from tile sizing defaults:
- Default: `tile_size_um = 256.0`, `tile_size_px = 224`
- Derived: `target_mpp = 256.0 / 224 = 1.143 µm/px`
You can override any one of these three values and the others will be computed.
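The arithmetic behind the derived resolution is worked through below. `read_size_px` is a hypothetical helper added for illustration: it shows how many level-0 pixels one tile would span on a slide scanned at a given resolution, assuming the tile is read at full resolution and then resized to `tile_size_px`.

```python
def target_mpp(tile_size_um=256.0, tile_size_px=224):
    """Microns per pixel of the output tile."""
    return tile_size_um / tile_size_px


def read_size_px(slide_mpp, tile_size_um=256.0):
    """Level-0 pixels covering one tile on a slide scanned at slide_mpp."""
    return round(tile_size_um / slide_mpp)


print(round(target_mpp(), 3))  # 1.143
print(read_size_px(0.25))      # 1024
print(read_size_px(0.5))       # 512
```

Under these assumptions, a slide scanned at 0.25 µm/px needs a 1024 px region per tile, which is then downsampled to 224 px.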
Each run creates a directory under runs/ with:
runs/<run_id>/
├── manifest.json # Reproducibility metadata
├── config.yaml # Frozen config
├── tiles.csv # All tile records
├── tiles.parquet # Tiles in Parquet format
├── slides_summary.csv # Per-slide statistics
├── patients_summary.csv # Per-patient statistics
├── labels_summary.csv # Per-label statistics
├── report.html # HTML report (if --report)
├── run.log # Processing log
├── thumbnails/ # Per-slide thumbnails + overlays
└── debug/ # Debug artifacts
- Tissue fraction, brightness, blur score, white fraction distributions
- Reject reason breakdown with percentages
- Borderline tile counts (near QC thresholds)
- Acceptance rate, bag ratio (accepted/candidates)
- Per-slide metric distributions
- Low-acceptance slide detection
- Tiles per patient, slides per patient
- Patient outlier detection (>10% tile share)
- Before/after QC tile counts per label
- Per-label acceptance rates and bag ratios
- Shannon entropy and effective N classes
- Imbalance ratio (max/min)
- Before vs after QC comparison
- Automatic warnings for worsened or severe imbalance
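The imbalance metrics listed above can be computed from per-label counts. This sketch assumes natural-log Shannon entropy and defines effective N as the exponential of that entropy; the package may use a different base or definition. `imbalance_summary` and the example counts are hypothetical.

```python
import math


def imbalance_summary(counts):
    """counts maps label -> number of tiles (or slides)."""
    nonzero = [n for n in counts.values() if n > 0]
    if not nonzero:
        raise ValueError("at least one label must have a positive count")
    total = sum(nonzero)
    entropy = -sum((n / total) * math.log(n / total) for n in nonzero)
    return {
        "entropy": entropy,
        "effective_n": math.exp(entropy),  # assumed definition: exp(entropy)
        "imbalance_ratio": max(nonzero) / min(nonzero),
    }


# Illustrative counts only: QC removes more "normal" tiles than "tumor" tiles.
before = imbalance_summary({"tumor": 600, "normal": 400})
after = imbalance_summary({"tumor": 540, "normal": 180})

print(before["imbalance_ratio"], after["imbalance_ratio"])  # 1.5 3.0
if after["imbalance_ratio"] > before["imbalance_ratio"]:
    print("warning: QC worsened class imbalance")
```

Comparing the two summaries is what makes the before/after view useful: a QC setting that rejects one class disproportionately shows up as a rising imbalance ratio and a falling effective N.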
brew install openslide # macOS
apt install openslide-tools # Linux
pip install openslide-python

If you see Library not loaded: libopenslide.0.dylib:
export DYLD_LIBRARY_PATH=$(brew --prefix openslide)/lib:$DYLD_LIBRARY_PATH

If no WSI backend is installed, the tool will still run but mark all slides as unreadable. Install at least one backend (OpenSlide recommended).
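A quick way to see which backend packages Python can find is sketched below. It is independent of the tool's own `python cli.py --backends` check and only tests whether each package is installed; it does not prove that the underlying native library (for example libopenslide) loads. `installed_backends` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names of the three backends listed in this README.
BACKEND_MODULES = ("openslide", "tifffile", "pyvips")


def installed_backends():
    """Map each backend's import name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in BACKEND_MODULES}


for name, found in installed_backends().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```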
streamlit run app.py --server.port 8502

- Use preview mode first (default: 30 slides)
- Increase `mask_workscale_mpp` (e.g., 16.0) for faster masking
- Set `max_tiles_per_slide` to limit tile count
- Use the CLI for batch processing
