feat: Allow failing toys by lukasheinrich · Pull Request #2128 · scikit-hep/pyhf

lukasheinrich · 2023-03-03T16:49:15Z

Pull Request Description

Resolves #1427.

When running toy-based hypothesis tests with ToyCalculator, any single failed fit (raised as FailedMinimization) would abort the entire toy loop. For large ntoys runs this is extremely frustrating — all progress is lost with no information about which toys failed or why.

This PR adds two things:

ToyResult dataclass — a frozen public dataclass that captures per-hypothesis diagnostics from the toy loop, stored on the calculator as toy_results after distributions() returns:
- successful: list[float] which is a list of test statistic values for toys where the fit converged
- failed: list[tuple[int, Tensor, FailedMinimization]] which is a list of (toy_index, sample, exception) for each failed toy, giving callers full access to the pseudo-data and underlying fit result via exception.result
failure_threshold parameter on ToyCalculator which replaces the draft boolean skip_failing_toys with a float (or None):
- None (default): any FailedMinimization propagates immediately, preserving existing behaviour
- [0.0, 1.0]: fraction of toys allowed to fail before raising; e.g. failure_threshold=0.1 permits up to 10% failures
- When failures occur within the threshold, a WARNING is logged with the count and fraction
- failure_threshold passes through hypotest() kwargs to the calculator automatically

The internal toy loop is also refactored into a private _collect_toy_teststats() method to eliminate the duplicated signal/background logic.

Checklist Before Requesting Reviewer

Tests are passing
"WIP" removed from the title of the pull request
Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

Summarize commit messages into a comprehensive review of the PR

* Replace the draft skip_failing_toys boolean with a failure_threshold (float | None)
  parameter on ToyCalculator. When None (default) any FailedMinimization propagates
  immediately, preserving existing behaviour. A float value sets the maximum fraction
  of toys allowed to fail before raising.
* Add ToyResult frozen dataclass to capture per-hypothesis diagnostics: successful 
  test statistic values and failed entries as (toy_index, sample, FailedMinimization)
  tuples. Stored on the calculator as toy_results after distributions() returns,
  giving callers full access to which toys failed and the underlying fit result.
* Refactor duplicated signal/background toy loops into a single private
  _collect_toy_teststats() method. Catches FailedMinimization specifically (not bare
  except or RuntimeError), logs a WARNING with failure count and fraction, and
  checks the running failure fraction against failure_threshold after each failure.
* Add ToyResult to the public API (__all__) of pyhf.infer.calculators.
* Add tests covering: no-failure baseline (toy_results populated correctly),
  default raises on first failure, threshold allows failures within limit, threshold
  exceeded raises, warning logged on failure, and failure_threshold forwarded
  through hypotest() kwargs to ToyCalculator.

Co-Authored-By: Giordon Stark <kratsg@gmail.com>

codecov · 2023-04-01T20:24:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.29%. Comparing base (24b1013) to head (e86fbf7).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2128   +/-   ##
=======================================
  Coverage   98.28%   98.29%           
=======================================
  Files          65       65           
  Lines        4305     4328   +23     
  Branches      465      467    +2     
=======================================
+ Hits         4231     4254   +23     
  Misses         46       46           
  Partials       28       28

Flag	Coverage Δ
contrib	`98.17% <100.00%> (+<0.01%)`	⬆️
doctest	`98.29% <100.00%> (+<0.01%)`	⬆️
unittests-3.10	`96.48% <100.00%> (+0.01%)`	⬆️
unittests-3.11	`96.48% <100.00%> (+0.01%)`	⬆️
unittests-3.12	`96.48% <100.00%> (+0.01%)`	⬆️
unittests-3.13	`96.48% <100.00%> (+0.01%)`	⬆️
unittests-3.9	`96.55% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

matthewfeickert · 2026-04-07T07:39:16Z

Before force-pushing this, I backed up it existing state to my fork as backup/lukas-toy_handling_test.

Replace the draft skip_failing_toys bool with a proper failure_threshold (float | None) that controls what fraction of toys are allowed to fail before raising. Any FailedMinimization exceptions are collected per hypothesis into a frozen ToyResult dataclass (toy_index, sample, exception) stored on the calculator as toy_results after distributions() returns, giving users full visibility into which toys failed and why. When threshold=None (default) the first failure propagates immediately, preserving existing behaviour. Closes #2128 Co-Authored-By: kratsg <kratsg@gmail.com>

for more information, see https://pre-commit.ci

Replace the draft skip_failing_toys bool with a proper failure_threshold (float | None) that controls what fraction of toys are allowed to fail before raising. Any FailedMinimization exceptions are collected per hypothesis into a frozen ToyResult dataclass (toy_index, sample, exception) stored on the calculator as toy_results after distributions() returns, giving users full visibility into which toys failed and why. When threshold=None (default) the first failure propagates immediately, preserving existing behaviour. Closes #2128 Co-Authored-By: kratsg <kratsg@gmail.com>

lukasheinrich changed the title ~~[WIP] first attempt on toy handling~~ [WIP] feat: allow failing toys Mar 3, 2023

lukasheinrich assigned matthewfeickert Mar 3, 2023

matthewfeickert assigned lukasheinrich Mar 3, 2023

matthewfeickert added the feat/enhancement New feature or request label Mar 3, 2023

matthewfeickert changed the title ~~[WIP] feat: allow failing toys~~ feat: Allow failing toys Mar 3, 2023

matthewfeickert marked this pull request as draft March 3, 2023 16:57

matthewfeickert force-pushed the toy_handling_test branch from 691e807 to 64a741a Compare April 1, 2023 20:10

matthewfeickert force-pushed the toy_handling_test branch from 64a741a to f93d811 Compare April 7, 2023 06:17

matthewfeickert force-pushed the toy_handling_test branch from f93d811 to e589871 Compare July 5, 2023 15:28

matthewfeickert force-pushed the toy_handling_test branch from e589871 to 803b842 Compare December 13, 2023 12:20

matthewfeickert added the API Changes the public API label Dec 13, 2023

kratsg force-pushed the toy_handling_test branch from 803b842 to c41bb4d Compare July 5, 2024 18:28

kratsg added the experiment/atlas Relevant to ATLAS's interests label Oct 18, 2024

matthewfeickert force-pushed the toy_handling_test branch from 89a5e54 to 116f995 Compare October 14, 2025 22:26

matthewfeickert force-pushed the toy_handling_test branch from 116f995 to 0a33c28 Compare April 7, 2026 07:39

matthewfeickert mentioned this pull request Apr 7, 2026

Release v0.8.0 burn down tracker #2681

Open

kratsg self-assigned this Apr 11, 2026

kratsg requested a review from matthewfeickert April 11, 2026 04:56

kratsg marked this pull request as ready for review April 11, 2026 04:58

kratsg force-pushed the toy_handling_test branch from 10d20e2 to 649a020 Compare April 11, 2026 04:58

kratsg mentioned this pull request Apr 11, 2026

Update toys to include documentation about allowing failing toys via threshold pyhf/pyhf-tutorial#121

Open

kratsg force-pushed the toy_handling_test branch from 0ef4806 to 0713922 Compare April 11, 2026 06:23

lukasheinrich and others added 3 commits April 11, 2026 15:49

first attempt

7d6d408

[pre-commit.ci] auto fixes from pre-commit.com hooks

85a6499

for more information, see https://pre-commit.ci

Apply Ruff formatting

1b24f23

kratsg added 3 commits April 11, 2026 15:49

imports at top

a2885fe

fix typing up a little...

e86fbf7

kratsg force-pushed the toy_handling_test branch from 0713922 to e86fbf7 Compare April 11, 2026 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Allow failing toys#2128

feat: Allow failing toys#2128
lukasheinrich wants to merge 6 commits intomainfrom
toy_handling_test

lukasheinrich commented Mar 3, 2023 •

edited by kratsg

Loading

Uh oh!

codecov Bot commented Apr 1, 2023 •

edited

Loading

Uh oh!

matthewfeickert commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lukasheinrich commented Mar 3, 2023 • edited by kratsg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Description

Checklist Before Requesting Reviewer

Before Merging

Uh oh!

codecov Bot commented Apr 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

matthewfeickert commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lukasheinrich commented Mar 3, 2023 •

edited by kratsg

Loading

codecov Bot commented Apr 1, 2023 •

edited

Loading