Conversation
Greptile SummaryThis PR fixes a shutdown deadlock in IsaacLab by registering an Key changes:
Issue found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Main as User Script
participant AL as AppLauncher.__init__
participant SA as SimulationApp
participant AE as atexit module
participant OS as os._exit
Main->>AL: AppLauncher(fast_shutdown=True)
AL->>SA: _create_app()
SA->>AE: register(close) [LIFO position N]
AL->>AE: register(lambda: os._exit(0)) [LIFO position N+1]
Note over AL,AE: N+1 handler runs first at shutdown
Main->>Main: ... simulation runs ...
Main->>Main: sys.exit() / script ends
AE->>OS: os._exit(0) [runs first, LIFO]
Note over AE,OS: Kit's close() / shutdown_and_release_framework() is SKIPPED
Note over OS: Process terminates immediately (no deadlock)
Last reviewed commit: a931e03 |
| # Register our own atexit handler *after* SimulationApp's so it runs first | ||
| # (LIFO order) and force-exits cleanly when fast_shutdown is enabled. | ||
| if self._app.config.get("fast_shutdown", False): | ||
| atexit.register(lambda: os._exit(0)) |
There was a problem hiding this comment.
Hardcoded exit code 0 masks failures
os._exit(0) always signals success regardless of why the interpreter is shutting down. When fast_shutdown is enabled and the simulation exits due to an unhandled exception or a sys.exit(1), this atexit handler fires first (LIFO) and calls os._exit(0), returning a success code to the OS. Any CI/CD pipeline or wrapper script that checks the process exit code will see a false success.
A safer approach is to inspect the current exception state in order to preserve the intended exit code:
def _fast_exit():
import sys
exc_type, exc_val, _ = sys.exc_info()
if exc_type is SystemExit:
code = exc_val.code
os._exit(code if isinstance(code, int) else (1 if code else 0))
elif exc_type is not None:
os._exit(1)
else:
os._exit(0)
atexit.register(_fast_exit)This preserves the exit code that would have been returned without fast_shutdown, while still bypassing the deadlock-prone Kit shutdown sequence.
AntoineRichard
left a comment
There was a problem hiding this comment.
LGTM do we have a test for this?
|
@AntoineRichard this requires more investigation, the best test is to have our unit tests not to hang locally. I filed internal ticket to track this problem. |
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there