feat: Allow failing toys #2128
Open
lukasheinrich wants to merge 6 commits into main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2128 +/- ##
=======================================
Coverage 98.28% 98.29%
=======================================
Files 65 65
Lines 4305 4328 +23
Branches 465 467 +2
=======================================
+ Hits 4231 4254 +23
Misses 46 46
Partials 28 28
Before force-pushing this, I backed up its existing state to my fork as ...
kratsg added a commit that referenced this pull request on Apr 11, 2026:

Replace the draft skip_failing_toys bool with a proper failure_threshold (float | None) that controls what fraction of toys are allowed to fail before raising.

Any FailedMinimization exceptions are collected per hypothesis into a frozen ToyResult dataclass (toy_index, sample, exception) stored on the calculator as toy_results after distributions() returns, giving users full visibility into which toys failed and why.

When failure_threshold=None (default) the first failure propagates immediately, preserving existing behaviour.

Closes #2128

Co-Authored-By: kratsg <kratsg@gmail.com>
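The behaviour described in this commit message can be illustrated with a small self-contained sketch. It does not use pyhf and is not the PR's implementation: `FailedMinimization` is a stand-in exception, `run_toys` and `demo_fit` are hypothetical names, and raising `RuntimeError` once the failed fraction exceeds the threshold is an assumption, since the commit message does not say which exception is raised.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


class FailedMinimization(Exception):
    """Stand-in for the fit-failure exception named in the commit message."""


@dataclass(frozen=True)
class ToyResult:
    """Record of one failed toy; fields follow the commit message."""

    toy_index: int
    sample: Sequence[float]
    exception: FailedMinimization


def demo_fit(sample: Sequence[float]) -> float:
    """Hypothetical fit: fails whenever the first entry is negative."""
    if sample[0] < 0:
        raise FailedMinimization("fit did not converge")
    return float(sum(sample))


def run_toys(
    samples: Sequence[Sequence[float]],
    fit: Callable[[Sequence[float]], float],
    failure_threshold: float | None = None,
) -> tuple[list[float], list[ToyResult]]:
    """Fit every toy, collecting failures instead of aborting the loop."""
    teststats: list[float] = []
    failures: list[ToyResult] = []
    for toy_index, sample in enumerate(samples):
        try:
            teststats.append(fit(sample))
        except FailedMinimization as exc:
            if failure_threshold is None:
                # Default: the first failure propagates immediately.
                raise
            failures.append(ToyResult(toy_index, sample, exc))
    # Assumption: the fraction is checked once, after all toys have run.
    if failures and len(failures) / len(samples) > failure_threshold:
        raise RuntimeError(
            f"{len(failures)} of {len(samples)} toys failed, "
            f"exceeding failure_threshold={failure_threshold}"
        )
    return teststats, failures
```

Calling `run_toys([[1.0], [-1.0], [2.0], [3.0]], demo_fit, failure_threshold=0.5)` in this sketch returns three test statistics plus one `ToyResult` for the toy at index 1, while the same call with the default `failure_threshold=None` re-raises the stand-in `FailedMinimization`.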
Pull Request Description
Resolves #1427.
When running toy-based hypothesis tests with `ToyCalculator`, any single failed fit (raised as `FailedMinimization`) would abort the entire toy loop. For large `ntoys` runs this is extremely frustrating: all progress is lost with no information about which toys failed or why.

This PR adds two things:

- `ToyResult` dataclass: a frozen public dataclass that captures per-hypothesis diagnostics from the toy loop, stored on the calculator as `toy_results` after `distributions()` returns:
  - `successful: list[float]`: the test statistic values for toys where the fit converged
  - `failed: list[tuple[int, Tensor, FailedMinimization]]`: one `(toy_index, sample, exception)` entry for each failed toy, giving callers full access to the pseudo-data and the underlying fit result via `exception.result`
- `failure_threshold` parameter on `ToyCalculator`, which replaces the draft boolean `skip_failing_toys` with a float (or `None`):
  - `None` (default): any `FailedMinimization` propagates immediately, preserving existing behaviour
  - `[0.0, 1.0]`: fraction of toys allowed to fail before raising; e.g. `failure_threshold=0.1` permits up to 10% failures
  - a `WARNING` is logged with the count and fraction of failed toys
  - `failure_threshold` passes through `hypotest()` kwargs to the calculator automatically

The internal toy loop is also refactored into a private `_collect_toy_teststats()` method to eliminate the duplicated signal/background logic.
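To make the per-hypothesis `ToyResult` layout from the description concrete, here is a minimal sketch of how a caller might inspect it. It is written without pyhf and only mirrors the field names given above: the `FailedMinimization` class is a stand-in, plain lists replace `Tensor`, and `failure_fraction` and `within_threshold` are hypothetical helpers, not functions added by this PR.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

log = logging.getLogger("toy_sketch")


class FailedMinimization(Exception):
    """Stand-in exception that keeps the fit result on `.result`."""

    def __init__(self, result):
        super().__init__(str(result))
        self.result = result


@dataclass(frozen=True)
class ToyResult:
    """Per-hypothesis diagnostics, using the field names from the description."""

    successful: list[float] = field(default_factory=list)
    # (toy_index, sample, exception); plain lists stand in for Tensor here.
    failed: list[tuple[int, list[float], FailedMinimization]] = field(
        default_factory=list
    )


def failure_fraction(result: ToyResult) -> float:
    """Fraction of toys whose fit failed (0.0 when no toys were run)."""
    n_total = len(result.successful) + len(result.failed)
    return len(result.failed) / n_total if n_total else 0.0


def within_threshold(result: ToyResult, failure_threshold: float) -> bool:
    """Log a warning about failures and report whether they are tolerated."""
    if not 0.0 <= failure_threshold <= 1.0:
        raise ValueError("failure_threshold must lie in [0.0, 1.0]")
    fraction = failure_fraction(result)
    if result.failed:
        log.warning(
            "%d toys failed (fraction %.3f)", len(result.failed), fraction
        )
    return fraction <= failure_threshold


# Example: three converged toys and one failure at toy index 1.
example = ToyResult(
    successful=[0.3, 1.2, 0.7],
    failed=[(1, [5.0, 2.0], FailedMinimization({"success": False}))],
)
for toy_index, sample, exc in example.failed:
    # The pseudo-data and the underlying fit result stay available.
    print(toy_index, sample, exc.result)
```

Here one failure out of four toys gives a fraction of 0.25, so `within_threshold(example, 0.5)` is true and `within_threshold(example, 0.1)` is false. The inclusive comparison follows the "permits up to 10% failures" wording in the description.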