Skip to content

Fast shutdown invokes os.exit#4965

Draft
bareya wants to merge 2 commits intoisaac-sim:developfrom
bareya:pbarejko/fast-shutdown
Draft

Fast shutdown invokes os.exit#4965
bareya wants to merge 2 commits intoisaac-sim:developfrom
bareya:pbarejko/fast-shutdown

Conversation

@bareya
Copy link
Collaborator

@bareya bareya commented Mar 12, 2026

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 12, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes a shutdown deadlock in IsaacLab by registering an atexit handler (after SimulationApp creates its own) that calls os._exit(0) to forcibly terminate the process when fast_shutdown is configured, bypassing the Kit event-loop pump in shutdown_and_release_framework() that can deadlock during interpreter teardown (e.g. in omni.ui.scene).

Key changes:

  • Adds import atexit to the module imports.
  • After self._create_app(), if self._app.config.get("fast_shutdown", False) is True, registers lambda: os._exit(0) as an atexit handler. Because atexit runs handlers in LIFO order, this handler fires before SimulationApp's own close() handler.

Issue found:

  • The hardcoded os._exit(0) always reports success (exit code 0) to the OS regardless of whether the interpreter is shutting down normally or due to an unhandled exception / sys.exit(non-zero). This silently masks failures and can mislead CI pipelines or scripts that check the process exit code.

Confidence Score: 3/5

  • Safe to merge with caution — the fix correctly resolves the shutdown deadlock but silently masks non-zero exit codes when fast_shutdown is enabled.
  • The approach is sound and the LIFO atexit ordering logic is correct. The one concrete issue is that os._exit(0) always exits with code 0, which can hide failures from CI and wrapper scripts. The fix is small and isolated to the fast_shutdown path, so it poses no risk to the default code path.
  • source/isaaclab/isaaclab/app/app_launcher.py — specifically the hardcoded os._exit(0) on line 173.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/app/app_launcher.py Registers an atexit handler that calls os._exit(0) when fast_shutdown is enabled, bypassing the deadlock-prone Kit shutdown sequence; exit code is hardcoded to 0, which masks errors/failures during shutdown.

Sequence Diagram

sequenceDiagram
    participant Main as User Script
    participant AL as AppLauncher.__init__
    participant SA as SimulationApp
    participant AE as atexit module
    participant OS as os._exit

    Main->>AL: AppLauncher(fast_shutdown=True)
    AL->>SA: _create_app()
    SA->>AE: register(close)  [LIFO position N]
    AL->>AE: register(lambda: os._exit(0))  [LIFO position N+1]
    Note over AL,AE: N+1 handler runs first at shutdown

    Main->>Main: ... simulation runs ...
    Main->>Main: sys.exit() / script ends

    AE->>OS: os._exit(0)  [runs first, LIFO]
    Note over AE,OS: Kit's close() / shutdown_and_release_framework() is SKIPPED
    Note over OS: Process terminates immediately (no deadlock)
Loading

Last reviewed commit: a931e03

# Register our own atexit handler *after* SimulationApp's so it runs first
# (LIFO order) and force-exits cleanly when fast_shutdown is enabled.
if self._app.config.get("fast_shutdown", False):
atexit.register(lambda: os._exit(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded exit code 0 masks failures

os._exit(0) always signals success regardless of why the interpreter is shutting down. When fast_shutdown is enabled and the simulation exits due to an unhandled exception or a sys.exit(1), this atexit handler fires first (LIFO) and calls os._exit(0), returning a success code to the OS. Any CI/CD pipeline or wrapper script that checks the process exit code will see a false success.

A safer approach is to inspect the current exception state in order to preserve the intended exit code:

def _fast_exit():
    import sys
    exc_type, exc_val, _ = sys.exc_info()
    if exc_type is SystemExit:
        code = exc_val.code
        os._exit(code if isinstance(code, int) else (1 if code else 0))
    elif exc_type is not None:
        os._exit(1)
    else:
        os._exit(0)

atexit.register(_fast_exit)

This preserves the exit code that would have been returned without fast_shutdown, while still bypassing the deadlock-prone Kit shutdown sequence.

Copy link
Collaborator

@AntoineRichard AntoineRichard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM do we have a test for this?

@bareya
Copy link
Collaborator Author

bareya commented Mar 12, 2026

@AntoineRichard this requires more investigation, the best test is to have our unit tests not to hang locally. I filed internal ticket to track this problem.

@bareya bareya marked this pull request as draft March 12, 2026 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants