
feat: Add Error Analysis tab for semantic segmentation #550

Open
YUVAN0907 wants to merge 9 commits into JdeRobot:master from YUVAN0907:feature/error-analysis-clean

@YUVAN0907
Contributor

Summary

This PR introduces a new "Error Analysis" tab to the PerceptionMetrics GUI.

It provides an interactive way to compare ground-truth and predicted segmentation masks at a pixel level — without requiring model inference to be re-run.

The implementation is modular:

  • Two new files are added (tabs/error_analysis.py, utils/error_utils.py)
  • Only minimal integration is done in app.py (3 lines)

No existing tabs, sidebar logic, or session state are modified.


Motivation

The current GUI supports dataset exploration, inference, and evaluation metrics, but lacks a way to visually inspect where a model fails on a specific image.

In practical segmentation workflows, it is important to:

  • identify failure regions
  • understand class-wise performance
  • debug model behavior visually

This feature addresses that gap by enabling direct pixel-level inspection using saved prediction masks.


Changes

New files

| File | Description |
| --- | --- |
| utils/error_utils.py | Lightweight NumPy-based utilities for mask comparison and metrics |
| tabs/error_analysis.py | Streamlit UI for error visualization |

Modified files

| File | Change |
| --- | --- |
| app.py | Added import + tab entry + render call (no other changes) |

Feature overview

Inputs

  • Ground Truth Mask (.png) — single-channel mask with class IDs
  • Prediction Mask (.png) — same format, generated externally
  • Original Image (optional) — used for overlay visualization

All inputs are placed in the main panel to avoid interfering with existing sidebar functionality.
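
The input requirements above (single-channel masks holding class IDs, with the prediction in the same format as the ground truth) could be checked before any comparison is made. The following is a minimal sketch using NumPy only; `validate_mask_pair` is a hypothetical helper name and is not taken from the PR's code.

```python
import numpy as np


def validate_mask_pair(gt: np.ndarray, pred: np.ndarray) -> None:
    """Raise ValueError unless both masks are 2-D integer arrays of equal shape.

    Hypothetical helper for illustration; not part of the PR.
    """
    for name, mask in (("ground truth", gt), ("prediction", pred)):
        if mask.ndim != 2:
            raise ValueError(
                f"{name} mask must be single-channel (2-D), got shape {mask.shape}"
            )
        if not np.issubdtype(mask.dtype, np.integer):
            raise ValueError(
                f"{name} mask must hold integer class IDs, got dtype {mask.dtype}"
            )
    if gt.shape != pred.shape:
        raise ValueError(f"shape mismatch: {gt.shape} vs {pred.shape}")


# Toy 2x3 masks standing in for decoded PNG files.
gt = np.array([[0, 1, 1], [2, 2, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 2], [2, 2, 0]], dtype=np.uint8)
validate_mask_pair(gt, pred)  # passes silently for a well-formed pair
```

With Pillow, a mask file would typically be decoded with `np.array(Image.open(path))` before being passed to a check like this; that step is left out here to keep the sketch dependency-light.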


Outputs

  1. Summary metrics

    • Overall pixel accuracy
    • Correct / incorrect pixel counts
  2. Visual comparison

    • Ground truth
    • Prediction
    • Error map (green = correct, red = incorrect)
  3. Overlay view

    • Error map blended with original image
    • Toggle between raw and overlay views
  4. Class-wise accuracy

    • Table + bar chart
    • Computed for each class in ground truth
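
As a rough illustration of how the outputs listed above can be derived from two masks, here is a NumPy-only sketch. The function names, the colour constants, and the default blend weight `alpha=0.5` are assumptions made for this example; they are not taken from `utils/error_utils.py`.

```python
import numpy as np

GREEN = np.array([0, 255, 0], dtype=np.uint8)  # correct pixels
RED = np.array([255, 0, 0], dtype=np.uint8)    # incorrect pixels


def pixel_accuracy(gt: np.ndarray, pred: np.ndarray) -> float:
    """Fraction of pixels where prediction and ground truth agree."""
    return float((gt == pred).mean())


def error_map(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """Return an (H, W, 3) uint8 image: green where correct, red where wrong."""
    correct = gt == pred
    return np.where(correct[..., None], GREEN, RED)


def blend_overlay(image: np.ndarray, err: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the error map onto an (H, W, 3) uint8 image."""
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * err.astype(np.float32)
    return blended.round().astype(np.uint8)


def per_class_accuracy(gt: np.ndarray, pred: np.ndarray) -> dict:
    """Accuracy restricted to the pixels of each class present in the ground truth."""
    correct = gt == pred
    return {int(c): float(correct[gt == c].mean()) for c in np.unique(gt)}


# Toy example: one pixel of class 1 is mispredicted as class 2.
gt = np.array([[0, 1, 1], [2, 2, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 2], [2, 2, 0]], dtype=np.uint8)

acc = pixel_accuracy(gt, pred)
n_correct = int((gt == pred).sum())
n_incorrect = gt.size - n_correct
err = error_map(gt, pred)
class_acc = per_class_accuracy(gt, pred)
```

In this toy case five of six pixels match, and only class 1 has a per-class accuracy below 1.0, because half of its ground-truth pixels were mispredicted.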

Screenshots

Screenshot 2026-04-14 000110

Visual Comparison (GT vs Prediction vs Error Map)

Screenshot 2026-04-14 000309

Class-wise Accuracy

Screenshot 2026-04-14 000408

Demo

Demo video: https://youtu.be/L0MEw628Kko?si=mUzkr9NYOWuEMR8t


Design decisions

No deep learning dependencies

This module operates on already-generated mask files and uses only NumPy + Pillow.

Keeping it independent of PyTorch/TensorFlow:

  • avoids unnecessary overhead
  • keeps the GUI lightweight
  • ensures compatibility with any external inference pipeline

Future work

This module is designed to be easily extendable. Possible future improvements include:

  • Integration with the Evaluator tab to directly consume model outputs
  • Support for standard datasets such as Cityscapes, SemanticKITTI, and others
  • Additional metrics such as IoU / mIoU per class
  • Batch-level error analysis across multiple images

I would appreciate any feedback, and I’m happy to make further improvements based on your suggestions. 🙌

@YUVAN0907
Contributor Author

Hi @dpascualhe and @SakhinetiPraveena,

I have raised this pull request for the Error Analysis module.
I would greatly appreciate your review and any feedback or suggestions for improvement.

Looking forward to your thoughts.

Thank you!

@dpascualhe
Collaborator

Hi @YUVAN0907, thanks for raising your PR! First of all, I'll clarify that we won't merge this PR as it overlaps with our intended GSoC project. Nonetheless, if you want feedback, I'd say that the main issue with this proposal is that you are not using our metrics factory at all. Even if your implementation is independent from our usual evaluation workflow, you can still use SegmentationMetricsFactory to get some stats.

@YUVAN0907
Contributor Author

Thank you for reviewing my PR and for the clarification.

I understand the concern regarding the overlap with the planned GSoC project and the decision not to merge it. I also appreciate your feedback about not utilizing the existing metrics pipeline.

You’re right, I didn’t integrate the implementation with the "SegmentationMetricsFactory", as I initially approached it as a standalone evaluation workflow. However, I see now that aligning with the existing metrics factory would make the implementation more consistent with the project’s architecture and reusable within the framework.

I’ll take this into account and explore how I can adapt my approach to leverage "SegmentationMetricsFactory" for computing the required metrics.

Thanks again for your guidance😄.

Best regards,
Yuvansankar

@YUVAN0907
Contributor Author

Add Error Analysis tab for semantic segmentation

Summary

Adds a fourth tab — Error Analysis — to the PerceptionMetrics GUI,
providing an interactive way to visually inspect and debug semantic segmentation results using pre-generated masks.

This update aligns with the existing PerceptionMetrics evaluation pipeline by reusing the built-in SegmentationMetrics instead of introducing custom metric logic.


New files

| File | Purpose |
| --- | --- |
| utils/error_utils.py | Pure-computation utilities. Loads masks, generates error maps, and computes metrics using SegmentationMetrics. |
| tabs/error_analysis.py | Streamlit UI layer for visualization, overlays, and metric display. |

Changed files

| File | Change |
| --- | --- |
| app.py | Minimal integration (3 lines): import + tab addition + render call |

Design decisions

Aligned with existing evaluation pipeline

  • Metric computation now uses
    perceptionmetrics.segmentation.image.metrics.SegmentationMetrics
  • Ensures consistency with CLI (pm_evaluate)
  • Avoids duplicating evaluation logic

No custom metric implementation

  • Removed previous numpy-based metric calculations
  • Fully relies on the repository’s internal evaluation system

No torch dependency

  • SegmentationMetrics is numpy-based
  • This tab works without GPU or deep learning frameworks
  • Designed for post-hoc analysis of saved predictions

Minimal and non-intrusive integration

  • No changes to existing tabs or sidebar
  • Feature added as an independent tab
  • Only 3 lines modified in app.py

Modular separation

  • error_utils.py → computation only
  • error_analysis.py → UI only
  • Follows existing tab architecture pattern

Widget key isolation

  • All Streamlit keys prefixed with ea_ to avoid conflicts
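
One simple way to enforce a key prefix like this is to route every widget key through a small helper. The sketch below is an assumption about how that might look; `ea_key` and the key names are hypothetical and not taken from `tabs/error_analysis.py`.

```python
KEY_PREFIX = "ea_"  # namespace for every widget in the Error Analysis tab


def ea_key(name: str) -> str:
    """Build a namespaced Streamlit widget key (hypothetical helper)."""
    return f"{KEY_PREFIX}{name}"


# Intended use inside the tab (not executed here, since it needs Streamlit):
#   st.file_uploader("Ground truth mask", type=["png"], key=ea_key("gt_mask"))
gt_key = ea_key("gt_mask")
pred_key = ea_key("pred_mask")
```

Because Streamlit widget keys share a single namespace across the app, a common prefix keeps this tab's widgets from clashing with keys defined by other tabs.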

Screenshots

[4 images]

Features

  • Pixel-wise error visualization

    • Green → correct predictions
    • Red → incorrect predictions
  • Overlay view

    • Error map blended with original image
  • Metrics (via SegmentationMetrics)

    • Pixel Accuracy
    • Mean IoU (mIoU)
    • Mean Dice score
    • Correct / incorrect pixel counts
  • Per-class breakdown

    • Table + bar charts for IoU, Dice, and accuracy
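
For reference, the per-class IoU and Dice values listed above follow from a confusion matrix in the standard way. The sketch below implements those textbook definitions directly in NumPy; it is not the repository's SegmentationMetrics implementation, whose API is not shown in this thread, and the function names are assumptions made for this example.

```python
import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Rows index ground-truth classes, columns index predicted classes.

    Assumes every class ID in both masks is in the range [0, n_classes).
    """
    idx = gt.astype(np.int64).ravel() * n_classes + pred.astype(np.int64).ravel()
    return np.bincount(idx, minlength=n_classes**2).reshape(n_classes, n_classes)


def iou_and_dice(cm: np.ndarray):
    """Per-class IoU and Dice; classes absent from both masks yield NaN."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but ground truth differs
    fn = cm.sum(axis=1) - tp  # ground truth is the class, but prediction differs
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice


# Toy example: one pixel of class 1 is mispredicted as class 2.
gt = np.array([[0, 1, 1], [2, 2, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 2], [2, 2, 0]], dtype=np.uint8)

cm = confusion_matrix(gt, pred, n_classes=3)
iou, dice = iou_and_dice(cm)
mean_iou = float(np.nanmean(iou))    # NaN entries (absent classes) are skipped
mean_dice = float(np.nanmean(dice))
```

Using `np.nanmean` keeps classes that appear in neither mask from dragging the means toward zero; whether the repository's metrics treat absent classes the same way is not stated in this thread.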

@YUVAN0907
Contributor Author

Hi @dpascualhe,

I have updated the Error Analysis feature based on your feedback by integrating the existing SegmentationMetrics from the codebase, aligning it with the current evaluation pipeline.

Could you please take a look and share your suggestions?🙌🏻

Thanks😄.

Best regards,
Yuvansankar

YUVAN0907 force-pushed the feature/error-analysis-clean branch from 4342534 to faba9e1 (April 20, 2026 13:21)

@YUVAN0907
Contributor Author

Hi @dpascualhe @SakhinetiPraveena

I’ve updated the implementation based on your feedback by integrating the existing SegmentationMetrics and aligning it with the current evaluation pipeline.

I also resolved several CI and pytest-related issues (dependency sync and test compatibility), and all checks are now passing.

No new PR is needed; the updates are pushed to the same branch.

I would appreciate any further feedback or suggestions.

Thanks.

Best regards,
Yuvansankar
