feat: Add Error Analysis tab for semantic segmentation by YUVAN0907 · Pull Request #550 · JdeRobot/PerceptionMetrics

YUVAN0907 · 2026-04-14T19:38:09Z

Summary

This PR introduces a new "Error Analysis" tab to the PerceptionMetrics GUI.

It provides an interactive way to compare ground-truth and predicted segmentation masks at a pixel level — without requiring model inference to be re-run.

The implementation is modular:

Two new files are added (tabs/error_analysis.py, utils/error_utils.py)
Only minimal integration is done in app.py (3 lines)

No existing tabs, sidebar logic, or session state are modified.

Motivation

The current GUI supports dataset exploration, inference, and evaluation metrics, but lacks a way to visually inspect where a model fails on a specific image.

In practical segmentation workflows, it is important to:

identify failure regions
understand class-wise performance
debug model behavior visually

This feature addresses that gap by enabling direct pixel-level inspection using saved prediction masks.

Changes

New files

File	Description
`utils/error_utils.py`	Lightweight NumPy-based utilities for mask comparison and metrics
`tabs/error_analysis.py`	Streamlit UI for error visualization

Modified files

File	Change
`app.py`	Added import + tab entry + render call (no other changes)

Feature overview

Inputs

Ground Truth Mask (.png) — single-channel mask with class IDs
Prediction Mask (.png) — same format, generated externally
Original Image (optional) — used for overlay visualization

All inputs are placed in the main panel to avoid interfering with existing sidebar functionality.
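
As a rough illustration of the mask inputs described above, the sketch below loads a single-channel class-ID PNG with Pillow and checks that the two masks match in shape. The helper names `load_mask` and `check_same_shape` are hypothetical and are not taken from the PR's `utils/error_utils.py`.

```python
import numpy as np
from PIL import Image


def load_mask(source) -> np.ndarray:
    """Load a single-channel class-ID mask as a 2-D integer array.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    with Image.open(source) as img:
        mask = np.array(img)  # pixel data is read while the file is still open
    if mask.ndim != 2:
        # RGB / RGBA images decode to 3-D arrays and are rejected here
        raise ValueError(f"expected a single-channel mask, got shape {mask.shape}")
    return mask.astype(np.int64)


def check_same_shape(gt: np.ndarray, pred: np.ndarray) -> None:
    """Raise if ground truth and prediction cannot be compared pixel by pixel."""
    if gt.shape != pred.shape:
        raise ValueError(f"shape mismatch: gt {gt.shape} vs prediction {pred.shape}")
```

Rejecting multi-channel images early is one possible design choice; a colour-encoded mask would need a palette lookup before it can be compared by class ID.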

Outputs

Summary metrics
- Overall pixel accuracy
- Correct / incorrect pixel counts
Visual comparison
- Ground truth
- Prediction
- Error map (green = correct, red = incorrect)
Overlay view
- Error map blended with original image
- Toggle between raw and overlay views
Class-wise accuracy
- Table + bar chart
- Computed for each class in ground truth
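
The outputs listed above can be sketched with NumPy and Pillow alone. This is a minimal illustration under the assumption that both masks are 2-D integer arrays of equal shape; the function names are hypothetical and the code is not the PR's actual implementation.

```python
import numpy as np
from PIL import Image

GREEN = (0, 255, 0)  # colour for correctly predicted pixels
RED = (255, 0, 0)    # colour for incorrectly predicted pixels


def error_map(gt: np.ndarray, pred: np.ndarray):
    """Return an RGB error map and the boolean 'correct' mask."""
    correct = gt == pred
    rgb = np.where(correct[..., None], GREEN, RED).astype(np.uint8)
    return rgb, correct


def pixel_accuracy(correct: np.ndarray) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    return float(correct.mean())


def class_wise_accuracy(gt: np.ndarray, pred: np.ndarray) -> dict:
    """Accuracy per class, computed only for classes present in the ground truth."""
    scores = {}
    for cls in np.unique(gt):
        region = gt == cls
        scores[int(cls)] = float((pred[region] == cls).mean())
    return scores


def overlay(image_rgb: np.ndarray, error_rgb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the error map onto the original image (both H x W x 3, uint8)."""
    base = Image.fromarray(image_rgb).convert("RGB")
    err = Image.fromarray(error_rgb).convert("RGB")
    return np.array(Image.blend(base, err, alpha))


gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
err_rgb, correct = error_map(gt, pred)
print(pixel_accuracy(correct))        # 0.75
print(class_wise_accuracy(gt, pred))  # {0: 0.5, 1: 1.0}
```

The correct and incorrect pixel counts follow directly from the boolean mask, for example `int(correct.sum())` and `int((~correct).sum())`.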

Screenshots

Visual Comparison (GT vs Prediction vs Error Map)

Class-wise Accuracy

Demo

Demo video: https://youtu.be/L0MEw628Kko?si=mUzkr9NYOWuEMR8t

Design decisions

No deep learning dependencies

This module operates on already-generated mask files and uses only NumPy + Pillow.

Keeping it independent of PyTorch/TensorFlow:

avoids unnecessary overhead
keeps the GUI lightweight
ensures compatibility with any external inference pipeline

Future work

This module is designed to be easily extendable. Possible future improvements include:

Integration with the Evaluator tab to directly consume model outputs
Support for standard datasets such as Cityscapes, SemanticKITTI, and others
Additional metrics such as IoU / mIoU per class
Batch-level error analysis across multiple images

I would appreciate any feedback, and I’m happy to make further improvements based on your suggestions. 🙌

YUVAN0907 · 2026-04-14T19:39:22Z

Hi @dpascualhe and @SakhinetiPraveena,

I have raised this pull request for the Error Analysis module.
I would greatly appreciate your review and any feedback or suggestions for improvement.

Looking forward to your thoughts.

Thank you!

dpascualhe · 2026-04-17T15:52:00Z

Hi @YUVAN0907, thanks for raising your PR! First of all, I'll clarify that we won't merge this PR as it overlaps with our intended GSoC project. Nonetheless, if you want feedback, I'd say that the main issue with this proposal is that you are not using our metrics factory at all. Even if your implementation is independent from our usual evaluation workflow, you can still use SegmentationMetricsFactory to get some stats.

YUVAN0907 · 2026-04-17T17:06:03Z

Thank you for reviewing my PR and for the clarification.

I understand the concern regarding the overlap with the planned GSoC project and the decision not to merge it. I also appreciate your feedback about not utilizing the existing metrics pipeline.

You’re right, I didn’t integrate the implementation with the "SegmentationMetricsFactory", as I initially approached it as a standalone evaluation workflow. However, I see now that aligning with the existing metrics factory would make the implementation more consistent with the project’s architecture and reusable within the framework.

I’ll take this into account and explore how I can adapt my approach to leverage "SegmentationMetricsFactory" for computing the required metrics.

Thanks again for your guidance😄.

Best regards,
Yuvansankar

YUVAN0907 · 2026-04-19T09:46:27Z

Add Error Analysis tab for semantic segmentation

Summary

Adds a fourth tab — Error Analysis — to the PerceptionMetrics GUI,
providing an interactive way to visually inspect and debug semantic segmentation results using pre-generated masks.

This update aligns with the existing PerceptionMetrics evaluation pipeline by reusing the built-in SegmentationMetrics instead of introducing custom metric logic.

New files

File	Purpose
`utils/error_utils.py`	Pure-computation utilities. Loads masks, generates error maps, and computes metrics using `SegmentationMetrics`.
`tabs/error_analysis.py`	Streamlit UI layer for visualization, overlays, and metric display.

Changed files

File	Change
`app.py`	Minimal integration (3 lines): import + tab addition + render call

Design decisions

Aligned with existing evaluation pipeline

Metric computation now uses `perceptionmetrics.segmentation.image.metrics.SegmentationMetrics`
Ensures consistency with CLI (pm_evaluate)
Avoids duplicating evaluation logic

No custom metric implementation

Removed previous numpy-based metric calculations
Fully relies on the repository’s internal evaluation system

No torch dependency

SegmentationMetrics is numpy-based
This tab works without GPU or deep learning frameworks
Designed for post-hoc analysis of saved predictions

Minimal and non-intrusive integration

No changes to existing tabs or sidebar
Feature added as an independent tab
Only 3 lines modified in app.py

Modular separation

error_utils.py → computation only
error_analysis.py → UI only
Follows existing tab architecture pattern

Widget key isolation

All Streamlit keys prefixed with ea_ to avoid conflicts
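
A small sketch of what such key prefixing could look like. The helper `ea_key`, the function `render_inputs`, and the widget labels are hypothetical; only the `ea_` prefix comes from the description above. Streamlit is imported lazily so the key helper can be exercised without a running app.

```python
EA_PREFIX = "ea_"  # prefix named in the PR description


def ea_key(name: str) -> str:
    """Build a widget key that cannot collide with keys used by other tabs."""
    return f"{EA_PREFIX}{name}"


def render_inputs():
    """Hypothetical input section of the tab; labels are illustrative only."""
    import streamlit as st  # lazy import: only needed when the tab is rendered

    gt_file = st.file_uploader(
        "Ground Truth Mask (.png)", type=["png"], key=ea_key("gt_mask")
    )
    pred_file = st.file_uploader(
        "Prediction Mask (.png)", type=["png"], key=ea_key("pred_mask")
    )
    show_overlay = st.checkbox("Show overlay", key=ea_key("show_overlay"))
    return gt_file, pred_file, show_overlay
```

Streamlit stores widget state in `st.session_state` under the widget's key, so a shared prefix keeps this tab's entries apart from those of the existing tabs.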

Screenshots

Features

Pixel-wise error visualization
- Green → correct predictions
- Red → incorrect predictions
Overlay view
- Error map blended with original image
Metrics (via SegmentationMetrics)
- Pixel Accuracy
- Mean IoU (mIoU)
- Mean Dice score
- Correct / incorrect pixel counts
Per-class breakdown
- Table + bar charts for IoU, Dice, and accuracy
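
For readers unfamiliar with the metrics listed above, the following plain-NumPy sketch shows the standard definitions of per-class IoU, Dice, and accuracy from a confusion matrix. It only illustrates the formulas; it is not the API or the implementation of `SegmentationMetrics`, and the function names are hypothetical. It assumes every pixel value lies in `[0, n_classes)`, so ignore labels such as 255 would need to be masked out first.

```python
import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Rows index the ground-truth class, columns index the predicted class."""
    idx = gt.ravel().astype(np.int64) * n_classes + pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=n_classes**2).reshape(n_classes, n_classes)


def per_class_scores(cm: np.ndarray) -> dict:
    """Per-class IoU, Dice and accuracy; classes with no pixels yield NaN."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class, but predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        dice = 2 * tp / (2 * tp + fp + fn)
        acc = tp / (tp + fn)
    return {"iou": iou, "dice": dice, "accuracy": acc}


gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
scores = per_class_scores(confusion_matrix(gt, pred, n_classes=2))
mean_iou = float(np.nanmean(scores["iou"]))    # averages 1/2 and 2/3
mean_dice = float(np.nanmean(scores["dice"]))  # averages 2/3 and 4/5
```

Using `np.nanmean` for the mean scores skips classes that appear in neither mask, which is one common convention; other conventions count them as zero.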

YUVAN0907 · 2026-04-19T09:53:24Z

Hi @dpascualhe,

I have updated the Error Analysis feature based on your feedback by integrating the existing SegmentationMetrics from the codebase, aligning it with the current evaluation pipeline.

Could you please take a look and share your suggestions?🙌🏻

Thanks😄.

Best regards,
Yuvansankar

YUVAN0907 · 2026-04-20T13:47:34Z

Hi @dpascualhe @SakhinetiPraveena

I’ve updated the implementation based on your feedback by integrating the existing SegmentationMetrics and aligning it with the current evaluation pipeline.

I also resolved several CI and pytest-related issues (dependency sync and test compatibility), and all checks are now passing.

No new PR is needed; the updates are pushed to the same branch.

I would appreciate any further feedback or suggestions.

Thanks.

Best regards,
Yuvansankar

YUVAN0907 added 7 commits April 20, 2026 18:28

feat: add Error Analysis tab for segmentation mask comparison

3d8bb87

fix: align error analysis with SegmentationMetrics and existing evaluation pipeline

55dde32

fix: update poetry.lock to match pyproject.toml

5ae68f9

fix: update poetry.lock to match pyproject.toml

0fbb47c

fix: sync poetry.lock with pyproject.toml for CI

1e9c8a4

fix: regenerate poetry.lock after rebase

faba9e1

fix: remove strict Open3D type checks for CI compatibility

dc8f280

YUVAN0907 force-pushed the feature/error-analysis-clean branch from 4342534 to faba9e1 on April 20, 2026 13:21

YUVAN0907 added 2 commits April 20, 2026 18:53

fix: handle mocked Open3D objects in build_point_cloud test

d4bc9b1

fix: skip Open3D data checks for mocked objects in CI

25cf799

YUVAN0907 mentioned this pull request Apr 20, 2026

CI Failure: poetry.lock and pyproject.toml mismatch #559

Open

YUVAN0907 wants to merge 9 commits into JdeRobot:master from YUVAN0907:feature/error-analysis-clean
