phd-runner option to defer guest cleanup on failure #1088

Open
lifning wants to merge 2 commits into master from lif/phd-plot-armor

@lifning lifning commented Mar 24, 2026

When `--manual-stop-on-failure` is passed, each propolis-server in a failed test case is left running (unless the test case had already explicitly shut it down before failing), and its address is echoed to the operator so that they can, for example, connect to its serial console to investigate or debug whatever caused the failure. The test suite pauses until the instances left in this state are shut down manually, then continues running further tests (unless interrupted).

This can be materially useful compared with reproducing a test failure by hand from a transcription of a phd-test's instance spec and steps. A manual reconstruction may differ unintentionally along the path to the moment of failure, whether because of the human-scale timing of guest shell command invocations or because of errors in transcribing the instance spec. (I also believe this might be a nice convenience to have in general, even absent those factors.)
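
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CleanupAction`, `cleanup_action`, the `already_stopped` parameter, and the two-variant `TestOutcome` below are hypothetical stand-ins, and the wording of the operator message is invented for the example.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for the runner's test outcome type.
#[derive(Debug, Clone, PartialEq)]
enum TestOutcome {
    Passed,
    Failed(Option<String>),
}

/// What to do with a guest's propolis-server once its test case has finished.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum CleanupAction {
    /// Proceed with the usual immediate cleanup.
    StopNow,
    /// Leave the server running and tell the operator where to find it.
    WaitForOperator { message: String },
}

/// Defer cleanup only when the flag is set, the test failed, and the test
/// case had not already shut the server down itself.
fn cleanup_action(
    outcome: &TestOutcome,
    manual_stop_on_failure: bool,
    already_stopped: bool,
    server_addr: SocketAddr,
) -> CleanupAction {
    let failed = matches!(outcome, TestOutcome::Failed(_));
    if manual_stop_on_failure && failed && !already_stopped {
        CleanupAction::WaitForOperator {
            message: format!(
                "test failed; propolis-server left running at {server_addr}; \
                 stop it manually to continue"
            ),
        }
    } else {
        CleanupAction::StopNow
    }
}
```

In this sketch, a runner would print `message` and then block until the operator has shut the instance down before moving on to the next test.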

@lifning lifning requested a review from iximeow March 24, 2026 07:43
When `--manual-stop-on-failure` is passed, each propolis-server in failed
test cases is left running (if it hadn't been shut down by the test case
prior to failure explicitly), and its address is echoed to the operator
such that they can e.g. connect to its serial console to investigate or
debug whatever may have caused the test failure. The test suite pauses
until the instances left in this state are shut down manually, then
continues running further tests (unless interrupted).

Aside from convenience, this can be useful vs. reproducing test failures
with manually-reconstructed scenarios via a transcription of a phd-test's
instance spec and steps, which may have differences due to human-scale
timing of guest shell command invocations.

@lifning lifning force-pushed the lif/phd-plot-armor branch from 7be0c9e to d2b3f44 March 25, 2026 04:40
Member

@iximeow iximeow left a comment

whole smattering of comments. this is neat! and it's cool that the plumbing is not too difficult here, I was a little worried on your behalf at first :D

);

if let Some(tx) = success_tx {
    let succeeded = !matches!(&test_outcome, TestOutcome::Failed(_));
Member

(also if you clone the outcome and send that along instead of just failed-or-not, might be nice to have "test failed because ..." as part of the prelude to a particular VM's informational blurb? might get too wordy. also not really attached to this idea as much as it'd be nice to distinguish the receivers versus the "is Option<bool> the test status, or is that the bool, hmm")

Author

in my experience the cause of the failure is usually right above this message in the log, so i didn't feel a particular pull to pass it along, but i definitely wouldn't be opposed if you can think of a case for it
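
For reference, the suggestion in this thread (send a clone of the outcome rather than a bare failed-or-not flag) might look something like the sketch below. It assumes a `std::sync::mpsc` channel purely for illustration; the channel type actually behind `success_tx` is not shown in the excerpt, and `report_outcome`, `failure_prelude`, and the two-variant `TestOutcome` are hypothetical names.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for the runner's test outcome type.
#[derive(Debug, Clone, PartialEq)]
enum TestOutcome {
    Passed,
    Failed(Option<String>),
}

/// Sender side: forward a clone of the whole outcome instead of a bool,
/// so the receiver's type says what the value means.
fn report_outcome(tx: Option<mpsc::Sender<TestOutcome>>, outcome: &TestOutcome) {
    if let Some(tx) = tx {
        // A send error only means the receiver has gone away; ignore it here.
        let _ = tx.send(outcome.clone());
    }
}

/// Receiver side: build the prelude for a VM's informational blurb.
/// Returns `None` if the test did not fail, or if the sender was dropped
/// without reporting anything.
fn failure_prelude(rx: &mpsc::Receiver<TestOutcome>) -> Option<String> {
    match rx.recv().ok()? {
        TestOutcome::Failed(Some(reason)) => {
            Some(format!("test failed because: {reason}"))
        }
        TestOutcome::Failed(None) => Some("test failed".to_string()),
        TestOutcome::Passed => None,
    }
}
```

The trade-off raised above still applies: the extra text may be redundant when the failure cause is already logged just before the blurb.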

2 participants