fix: improve metrics reliability and edge-case handling #875

Open

AhmedAli58 wants to merge 3 commits into sunlabuiuc:master from AhmedAli58:bench/polars-streaming-loader-benchmark

AhmedAli58 commented Feb 25, 2026

Summary

This PR fixes three correctness issues in the metrics layer and adds regression tests for each.

Problem statements and fixes

regression_metrics_fn was order-dependent because kl_divergence mutated intermediate arrays used by later metrics.

Fixed by using local safe copies for KL computation so mse/mae remain unaffected by metric ordering.
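The copy-based approach can be sketched roughly as follows. This is only an illustration: the helper names (`kl_divergence_safe`, `mse`) and the `eps` clipping constant are assumptions, not the actual code in pyhealth/metrics/regression.py.

```python
import numpy as np


def kl_divergence_safe(x, x_rec, eps=1e-8):
    """KL(p || q) computed on local copies so the caller's arrays stay intact.

    Hypothetical helper for illustration; `eps` is an assumed clipping value.
    """
    # Copy first: clipping and normalization below are done in place.
    p = np.array(x, dtype=float, copy=True)
    q = np.array(x_rec, dtype=float, copy=True)
    p[p < eps] = eps
    q[q < eps] = eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def mse(x, x_rec):
    """Mean squared error on the original, unmodified inputs."""
    return float(np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2))


x = np.array([0.0, 1.0, 3.0])
x_rec = np.array([0.0, 2.0, 2.0])

mse_before = mse(x, x_rec)        # computed before KL
kl = kl_divergence_safe(x, x_rec)
mse_after = mse(x, x_rec)         # computed after KL, same inputs
```

Because the clipping and normalization only touch `p` and `q`, `mse_before` and `mse_after` agree regardless of where KL appears in the metric list.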

multilabel_metrics_fn mutated y_pred while computing ddi, which could break subsequent metrics in the same call; it also returned only ddi_score instead of the requested ddi key.

Fixed by using a local pred_labels list (no mutation), returning ddi, and preserving ddi_score for backward compatibility.
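A minimal sketch of the non-mutating pattern, under stated assumptions: `predicted_label_sets`, `attach_ddi`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the DDI value is a placeholder rather than a real DDI-rate computation.

```python
import numpy as np


def predicted_label_sets(y_prob, threshold=0.5):
    """Build per-sample predicted label indices without modifying y_prob.

    Hypothetical helper; the threshold value is an assumption.
    """
    return [np.flatnonzero(row >= threshold).tolist() for row in y_prob]


def attach_ddi(output, ddi_value):
    """Return a new dict exposing the value under both keys."""
    out = dict(output)
    out["ddi"] = ddi_value        # the key the caller requested
    out["ddi_score"] = ddi_value  # legacy key kept for backward compatibility
    return out


y_prob = np.array([[0.9, 0.2, 0.7],
                   [0.1, 0.8, 0.3]])
snapshot = y_prob.copy()

pred_labels = predicted_label_sets(y_prob)
# Placeholder value; a real implementation would compute the DDI rate
# from pred_labels.
result = attach_ddi({}, 0.0)
```

Since `pred_labels` is a separate list, any metric computed after `ddi` in the same call still sees the original probability matrix.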

ranking_metrics_fn lacked input validation and failed with unclear errors on edge cases (invalid k_values, empty score output).

Added input validation for empty args and non-positive k_values, narrowed import error handling, and added an explicit error when evaluator output is empty.
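The kind of early validation described above might look like the sketch below. The function names, the argument names `qrels` and `results`, the choice of `ValueError`, and the error messages are all assumptions for illustration; the actual checks in pyhealth/metrics/ranking.py may differ.

```python
def validate_ranking_inputs(qrels, results, k_values):
    """Fail early with a clear message on empty or invalid inputs.

    Hypothetical helper; names and exception type are assumptions.
    """
    if not qrels:
        raise ValueError("qrels must not be empty")
    if not results:
        raise ValueError("results must not be empty")
    if not k_values:
        raise ValueError("k_values must not be empty")
    # Reject non-integers, booleans, zero, and negative cutoffs.
    invalid = [
        k for k in k_values
        if isinstance(k, bool) or not isinstance(k, int) or k <= 0
    ]
    if invalid:
        raise ValueError(f"k_values must be positive integers, got {invalid}")


def require_scores(scores):
    """Raise an explicit error if the evaluator produced no scores."""
    if not scores:
        raise ValueError("evaluator returned no scores")
    return scores


# Valid inputs pass silently.
validate_ranking_inputs({"q1": {"d1": 1}}, {"q1": {"d1": 0.9}}, [1, 5, 10])
```

Checking inputs before calling the evaluator means a bad `k_values` entry surfaces as a direct error instead of a failure deep inside the scoring code.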

Impact

Eliminates order-dependent metric behavior.
Improves reliability for mixed metric requests in multilabel tasks.
Produces clearer, earlier failures for ranking edge cases.

Tests

Added tests/core/test_metrics_quality.py with regression coverage for all three issues.
Verified with:
- python -m py_compile pyhealth/metrics/regression.py pyhealth/metrics/multilabel.py pyhealth/metrics/ranking.py tests/core/test_metrics_quality.py
- HOME=/tmp python -m unittest tests.core.test_metrics_quality -v

ahmed79x7 added 3 commits

February 23, 2026 09:29


feat: add streaming vs in-memory loader benchmark (44f1d2b)
bench: add Polars streaming vs in-memory loader benchmark (c16e4aa)
fix: improve metrics reliability with regression tests (dab389f)

