fix: improve metrics reliability and edge-case handling #875

Open
AhmedAli58 wants to merge 3 commits into sunlabuiuc:master from
AhmedAli58:bench/polars-streaming-loader-benchmark

Summary

This PR addresses three meaningful quality issues in the metrics layer, with regression tests.

Problem statements and fixes

  1. regression_metrics_fn was order-dependent because kl_divergence mutated intermediate arrays used by later metrics.
     • Fixed by using local safe copies for KL computation so mse/mae remain unaffected by metric ordering.
  2. multilabel_metrics_fn mutated y_pred while computing ddi, which could break subsequent metrics in the same call; it also returned only ddi_score instead of the requested ddi key.
     • Fixed by using a local pred_labels list (no mutation), returning ddi, and preserving ddi_score for backward compatibility.
  3. ranking_metrics_fn lacked robust validation and failed unclearly on edge cases (invalid k_values, empty score output).
     • Added input validation for empty args and non-positive k_values, narrowed import error handling, and added an explicit error when evaluator output is empty.
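
To illustrate the first fix, here is a minimal sketch of the "local safe copies" pattern in plain Python. It is not the code from pyhealth/metrics/regression.py: the function regression_metrics_sketch, the eps value, and the clip-then-normalize steps are assumptions made for illustration. The point is only that the KL computation works on its own copies, so the order of the requested metrics cannot change mse or mae.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) computed on clipped, renormalized local copies."""
    # Build new lists rather than clipping or normalizing the caller's data in place.
    p_safe = [max(v, eps) for v in p]
    q_safe = [max(v, eps) for v in q]
    p_sum, q_sum = sum(p_safe), sum(q_safe)
    p_safe = [v / p_sum for v in p_safe]
    q_safe = [v / q_sum for v in q_safe]
    return sum(a * math.log(a / b) for a, b in zip(p_safe, q_safe))


def mse(x, x_rec):
    """Mean squared error over the untouched inputs."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def mae(x, x_rec):
    """Mean absolute error over the untouched inputs."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)


def regression_metrics_sketch(x, x_rec, metrics):
    """Hypothetical stand-in for a metrics dispatcher (not the PyHealth API)."""
    funcs = {"kl_divergence": kl_divergence, "mse": mse, "mae": mae}
    # Every metric receives the same unmodified x and x_rec.
    return {name: funcs[name](x, x_rec) for name in metrics}
```

With this structure, requesting ["kl_divergence", "mse"] and ["mse", "kl_divergence"] yields the same values, and the input lists are left unchanged.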

Impact

  • Eliminates order-dependent metric behavior.
  • Improves reliability for mixed metric requests in multilabel tasks.
  • Produces clearer, earlier failures for ranking edge cases.
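
As a rough sketch of what "clearer, earlier failures" can look like for the ranking case, the helpers below validate inputs before any evaluator runs and reject an empty result afterwards. The function names, the argument names qrels and results, the choice of ValueError, and the message texts are all assumptions for illustration; the PR does not state them, and this is not the code in pyhealth/metrics/ranking.py.

```python
def validate_ranking_inputs(qrels, results, k_values):
    """Hypothetical pre-check: fail fast on empty args or invalid k_values."""
    if not qrels:
        raise ValueError("qrels must not be empty")
    if not results:
        raise ValueError("results must not be empty")
    if not k_values:
        raise ValueError("k_values must not be empty")
    for k in k_values:
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(k, bool) or not isinstance(k, int) or k <= 0:
            raise ValueError(f"k_values must be positive integers, got {k!r}")


def ensure_scores_not_empty(scores):
    """Hypothetical post-check: reject an evaluator that returned nothing."""
    if not scores:
        raise ValueError("evaluator returned no scores")
    return scores
```

Checking up front keeps the error message tied to the offending argument instead of surfacing later as an unrelated failure inside the evaluator.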

Tests

  • Added tests/core/test_metrics_quality.py with regression coverage for all three issues.
  • Verified with:
    • python -m py_compile pyhealth/metrics/regression.py pyhealth/metrics/multilabel.py pyhealth/metrics/ranking.py tests/core/test_metrics_quality.py
    • HOME=/tmp python -m unittest tests.core.test_metrics_quality -v
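
For readers who want a feel for this kind of regression coverage, the sketch below shows a non-mutation test for the multilabel case. It does not reproduce tests/core/test_metrics_quality.py: multilabel_ddi_sketch is a hypothetical stand-in that computes a simple interacting-pair rate over thresholded predictions, and the threshold, data, and test names are assumptions.

```python
import copy
import unittest


def multilabel_ddi_sketch(y_pred, ddi_adj, threshold=0.5):
    """Hypothetical stand-in: DDI rate over thresholded predictions."""
    # Derive label indices into a new list; y_pred itself is never written to.
    pred_labels = [
        [j for j, score in enumerate(row) if score >= threshold] for row in y_pred
    ]
    all_pairs = 0
    ddi_pairs = 0
    for labels in pred_labels:
        for a in range(len(labels)):
            for b in range(a + 1, len(labels)):
                i, j = labels[a], labels[b]
                all_pairs += 1
                if ddi_adj[i][j] == 1 or ddi_adj[j][i] == 1:
                    ddi_pairs += 1
    rate = ddi_pairs / all_pairs if all_pairs else 0.0
    # Expose both keys, mirroring the backward-compatible return described in the PR.
    return {"ddi": rate, "ddi_score": rate}


class TestMultilabelDdiSketch(unittest.TestCase):
    def setUp(self):
        self.y_pred = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]]
        self.ddi_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]

    def test_input_is_not_mutated(self):
        before = copy.deepcopy(self.y_pred)
        multilabel_ddi_sketch(self.y_pred, self.ddi_adj)
        self.assertEqual(self.y_pred, before)

    def test_both_keys_returned(self):
        out = multilabel_ddi_sketch(self.y_pred, self.ddi_adj)
        self.assertIn("ddi", out)
        self.assertEqual(out["ddi"], out["ddi_score"])
```

Comparing the input against a deep copy taken before the call is a simple way to catch in-place mutation regardless of which metric caused it.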
