Skip to content

signal+lnd: exit with non-zero code on internal shutdown#10688

Open
Antisys wants to merge 1 commit intolightningnetwork:masterfrom
Antisys:fix/nonzero-exit-on-internal-shutdown
Open

signal+lnd: exit with non-zero code on internal shutdown#10688
Antisys wants to merge 1 commit intolightningnetwork:masterfrom
Antisys:fix/nonzero-exit-on-internal-shutdown

Conversation

@Antisys
Copy link
Copy Markdown

@Antisys Antisys commented Mar 28, 2026

Summary

  • When lnd auto-shuts down due to an internal error (e.g. chain backend health check failure), Main() previously returned nil, causing exit code 0
  • Adds a shutdownRequested atomic flag to signal.Interceptor, set only by RequestShutdown() (not by OS signals)
  • Main() now returns ErrShutdownRequested on internal shutdown, flowing to os.Exit(1) — user-initiated shutdowns (SIGINT/SIGTERM) still exit 0

Test plan

  • Unit test: TestRequestShutdownSetsFlag — verifies RequestShutdown() sets the flag
  • Unit test: TestOSSignalDoesNotSetRequestedFlag — verifies OS signals do NOT set the flag
  • go build ./cmd/lnd/ compiles cleanly

🤖 Generated with Claude Code

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the reliability of the application's exit behavior by distinguishing between graceful user-initiated shutdowns and forced shutdowns caused by internal errors. By implementing an atomic tracking mechanism, the system can now return a specific error code during internal failures, ensuring that monitoring tools and orchestrators can correctly identify and respond to non-zero exit states.

Highlights

  • Internal Shutdown Error Handling: Introduced ErrShutdownRequested to ensure the application exits with a non-zero status code when an internal error occurs.
  • Shutdown Tracking: Added an atomic shutdownRequested flag to the signal Interceptor to distinguish between internal shutdowns and OS-level signals.
  • Verification: Added comprehensive unit tests to verify that internal shutdown requests set the flag correctly while OS signals do not.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to distinguish between internal shutdown requests (e.g., health check failures) and external OS signals in lnd. It adds a shutdownRequested flag to the Interceptor, an ErrShutdownRequested error, and updates the Main function to return this error when an internal shutdown is triggered. Feedback suggests addressing a potential race condition where an internal shutdown request could override an external signal's exit code and recommends refactoring duplicated logic in Main into a helper function to improve maintainability.

Comment on lines 224 to 231
func (c *Interceptor) RequestShutdown() {
atomic.StoreInt32(&c.shutdownRequested, 1)

select {
case c.shutdownRequestChannel <- struct{}{}:
case <-c.quit:
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a potential race condition here that could lead to an incorrect exit code.

Consider this scenario:

  1. A user sends a SIGINT (Ctrl+C) to initiate a graceful shutdown (expecting exit code 0).
  2. The mainInterruptHandler receives the signal and starts the shutdown process.
  3. Concurrently, or shortly after, an internal component calls RequestShutdown().
  4. RequestShutdown() will set shutdownRequested to 1, even though the shutdown was already initiated by the user.
  5. As a result, Main() will return ErrShutdownRequested, and the process will exit with code 1, which is not what the user intended for a graceful shutdown.

The root cause is that setting the shutdownRequested flag is not atomic with the decision to shut down inside mainInterruptHandler. The first signal to initiate the shutdown should determine the cause and exit code.

A robust solution might involve more state management within the mainInterruptHandler itself to atomically determine and record the shutdown cause upon the first shutdown signal received.

lnd.go Outdated
Comment on lines 789 to 793
if interceptor.ShutdownWasRequested() {
return ErrShutdownRequested
}

return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of logic to check for an internally-requested shutdown is duplicated at the end of the Main function (lines 819-823). To improve maintainability and avoid duplication, consider extracting this logic into a helper function.

For example:

func getShutdownError(interceptor signal.Interceptor) error {
	if interceptor.ShutdownWasRequested() {
		return ErrShutdownRequested
	}
	return nil
}

Then you could replace this block with return getShutdownError(interceptor). This would make the code more DRY and easier to maintain if the shutdown logic needs to be changed in the future.

@Antisys Antisys force-pushed the fix/nonzero-exit-on-internal-shutdown branch from 5bf23f0 to bb622be Compare March 28, 2026 15:42
@Antisys
Copy link
Copy Markdown
Author

Antisys commented Mar 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to distinguish between internal shutdown requests and external OS signals by adding a shutdownRequested flag to the Interceptor and a corresponding ErrShutdownRequested error. The Main function is updated to return this error when an internal shutdown is detected, ensuring a non-zero exit code. Feedback includes a suggestion to refactor duplicated shutdown logic in lnd.go into a helper function and to use the more idiomatic atomic.Bool for the shutdown flag.

lnd.go Outdated
Comment on lines 789 to 793
if interceptor.ShutdownWasRequested() {
return ErrShutdownRequested
}

return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This shutdown handling logic is duplicated later in this file (lines 819-823). To improve maintainability and avoid code duplication, consider extracting this logic into a small helper function.

For example:

func handleShutdown(interceptor signal.Interceptor) error {
	if interceptor.ShutdownWasRequested() {
		return ErrShutdownRequested
	}
	return nil
}

This would simplify both call sites to return handleShutdown(interceptor) after waiting on interceptor.ShutdownChannel().

References
  1. Readable code is the most important requirement for any commit created. Refactoring duplicated code into a helper function improves readability and maintainability. (link)

signal/signal.go Outdated
// failure) rather than an external signal. This is a pointer so that
// it is shared between the Interceptor value returned by Intercept()
// and the goroutine running mainInterruptHandler.
shutdownRequested *int32
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For improved readability and to use more idiomatic Go (since Go 1.19), consider using *atomic.Bool instead of *int32 for this boolean flag. This would make the intent of the code clearer.

This would involve the following changes:

  • This field's type would become *atomic.Bool.
  • Initialization in Intercept() would be shutdownRequested: new(atomic.Bool).
  • Setting the flag in mainInterruptHandler() would become c.shutdownRequested.Store(true).
  • Checking the flag in ShutdownWasRequested() would become return c.shutdownRequested.Load().

The existing started global variable in this file could also be refactored to use atomic.Bool for consistency.

Suggested change
shutdownRequested *int32
shutdownRequested *atomic.Bool
References
  1. Readable code is the most important requirement for any commit created. Using atomic.Bool for a boolean flag makes the code's intent clearer and more readable. (link)

@Antisys Antisys force-pushed the fix/nonzero-exit-on-internal-shutdown branch from bb622be to 773eab9 Compare March 28, 2026 17:34
@Antisys
Copy link
Copy Markdown
Author

Antisys commented Mar 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logic to distinguish between internal and external shutdown requests, allowing lnd to exit with a non-zero code upon internal failures. It adds a new error type, updates the signal interceptor to track the shutdown source, and includes unit tests. Review feedback recommends formatting the error message more compactly and shortening a descriptive comment to better align with style guidelines.

lnd.go Outdated
Comment on lines +147 to +150
var ErrShutdownRequested = errors.New(
"lnd was shut down due to an internal request (e.g. chain backend " +
"failure)",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this formatting is not incorrect, it could be more concise to align with the style guide's preference for minimizing lines for error messages (lines 210-211, 228-229).

var ErrShutdownRequested = errors.New("lnd was shut down due to an internal " +
	"request (e.g. chain backend failure)")
References
  1. The style guide prefers minimizing lines for log and error messages. The current implementation spans four lines, which can be reduced. (link)

signal/signal.go Outdated
Comment on lines +118 to +119
// it is shared between the Interceptor value returned by Intercept()
// and the goroutine running mainInterruptHandler.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for shutdownRequested is quite long and wraps over multiple lines. While the line length check is left to the linter, the comment could be made more concise for better readability, while still conveying the necessary information.

Suggested change
// it is shared between the Interceptor value returned by Intercept()
// and the goroutine running mainInterruptHandler.
// and the goroutine running mainInterruptHandler.

@Antisys Antisys force-pushed the fix/nonzero-exit-on-internal-shutdown branch from 773eab9 to e2775c4 Compare March 28, 2026 17:58
@Antisys
Copy link
Copy Markdown
Author

Antisys commented Mar 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables lnd to distinguish between internal shutdown requests and external OS signals, allowing for a non-zero exit code upon internal failures. It introduces a shutdownRequested flag in the signal interceptor and updates the main loop to return a specific error when this flag is set. A review comment identifies a potential source of flakiness in the new tests where time.Sleep is used for synchronization, recommending a polling approach with require.Eventually instead.

Comment on lines +70 to +72
// Give mainInterruptHandler time to process the signal and set
// isShutdown=true before the internal request arrives.
time.Sleep(50 * time.Millisecond)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.Sleep to orchestrate concurrent tests can lead to flakiness, as the sleep duration is arbitrary and may not be sufficient on a slow machine or under heavy load. A more robust approach is to use a polling mechanism to wait for the desired state.

In this case, you can wait for the shutdown process to be initiated by polling interceptor.Alive(). This method returns false once the quit channel is closed, which happens inside the shutdown() function called by the interrupt handler. Using require.Eventually from the testify suite ensures the test waits just long enough for the state to change, making it both faster on average and more reliable.

Suggested change
// Give mainInterruptHandler time to process the signal and set
// isShutdown=true before the internal request arrives.
time.Sleep(50 * time.Millisecond)
// Wait for the shutdown process to start before sending the second
// request. This is more robust than a fixed sleep.
require.Eventually(t, func() bool {
return !interceptor.Alive()
}, time.Second, 10*time.Millisecond, "shutdown did not start")

@Antisys Antisys force-pushed the fix/nonzero-exit-on-internal-shutdown branch from e2775c4 to ec0ed79 Compare March 28, 2026 18:05
@Antisys
Copy link
Copy Markdown
Author

Antisys commented Mar 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a mechanism to distinguish between internal shutdown requests and external OS signals, ensuring that lnd exits with a non-zero code when triggered by internal failures. It introduces the ErrShutdownRequested error, updates the signal Interceptor to track the shutdown source, and adds comprehensive unit tests. A review comment identifies an improvement opportunity in the test suite, suggesting the replacement of a time.Sleep call with a more robust polling mechanism using require.Eventually to prevent potential flakiness.


// Allow mainInterruptHandler goroutine to fully exit and reset
// the global started flag so the next test can call Intercept().
time.Sleep(50 * time.Millisecond)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.Sleep in tests can introduce flakiness if the goroutine being waited on takes longer to execute than the sleep duration. A more robust approach is to use require.Eventually to poll for the desired state.

This pattern is already used in TestOSSignalThenRequestShutdownNoFlag, and applying it here would improve test reliability and consistency.

You'll need to add import "sync/atomic" to use atomic.LoadInt32.

	require.Eventually(t, func() bool {
		return atomic.LoadInt32(&started) == 0
	}, time.Second, 10*time.Millisecond, "interceptor did not reset")
References
  1. Unit tests must always use the require library. Either table driven unit tests or tests using the rapid library are preferred.

When lnd auto-shuts down due to an internal error such as a chain
backend health check failure, Main() previously returned nil, causing
the process to exit with code 0. This made it impossible for process
supervisors to distinguish a crash from a clean shutdown.

Add a shutdownRequested atomic.Bool to signal.Interceptor that is set
inside mainInterruptHandler only when an internal RequestShutdown() is
the first shutdown trigger. This avoids a race where a late internal
request could override an earlier OS signal shutdown. Main() now
returns ErrShutdownRequested in that case, which flows to os.Exit(1)
in cmd/lnd/main.go. User-initiated shutdowns (SIGINT, SIGTERM)
continue to exit with code 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Antisys Antisys force-pushed the fix/nonzero-exit-on-internal-shutdown branch from ec0ed79 to b2ddd4c Compare March 28, 2026 18:17
@Antisys
Copy link
Copy Markdown
Author

Antisys commented Mar 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to distinguish between internal shutdown requests (e.g., health check failures) and external OS signals in lnd. It adds an ErrShutdownRequested error and updates the Main entry point to return this error when an internal shutdown is triggered, ensuring the process exits with a non-zero code. The signal package is updated with an atomic boolean flag to track the shutdown source, and new unit tests are added to verify the logic and handle race conditions between signals and internal requests. I have no feedback to provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant