signal+lnd: exit with non-zero code on internal shutdown#10688
signal+lnd: exit with non-zero code on internal shutdown#10688Antisys wants to merge 1 commit intolightningnetwork:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the reliability of the application's exit behavior by distinguishing between graceful user-initiated shutdowns and forced shutdowns caused by internal errors. By implementing an atomic tracking mechanism, the system can now return a specific error code during internal failures, ensuring that monitoring tools and orchestrators can correctly identify and respond to non-zero exit states. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to distinguish between internal shutdown requests (e.g., health check failures) and external OS signals in lnd. It adds a shutdownRequested flag to the Interceptor, an ErrShutdownRequested error, and updates the Main function to return this error when an internal shutdown is triggered. Feedback suggests addressing a potential race condition where an internal shutdown request could override an external signal's exit code and recommends refactoring duplicated logic in Main into a helper function to improve maintainability.
| func (c *Interceptor) RequestShutdown() { | ||
| atomic.StoreInt32(&c.shutdownRequested, 1) | ||
|
|
||
| select { | ||
| case c.shutdownRequestChannel <- struct{}{}: | ||
| case <-c.quit: | ||
| } | ||
| } |
There was a problem hiding this comment.
There is a potential race condition here that could lead to an incorrect exit code.
Consider this scenario:
- A user sends a SIGINT (Ctrl+C) to initiate a graceful shutdown (expecting exit code 0).
- The
mainInterruptHandlerreceives the signal and starts the shutdown process. - Concurrently, or shortly after, an internal component calls
RequestShutdown(). RequestShutdown()will setshutdownRequestedto 1, even though the shutdown was already initiated by the user.- As a result,
Main()will returnErrShutdownRequested, and the process will exit with code 1, which is not what the user intended for a graceful shutdown.
The root cause is that setting the shutdownRequested flag is not atomic with the decision to shut down inside mainInterruptHandler. The first signal to initiate the shutdown should determine the cause and exit code.
A robust solution might involve more state management within the mainInterruptHandler itself to atomically determine and record the shutdown cause upon the first shutdown signal received.
lnd.go
Outdated
| if interceptor.ShutdownWasRequested() { | ||
| return ErrShutdownRequested | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
This block of logic to check for an internally-requested shutdown is duplicated at the end of the Main function (lines 819-823). To improve maintainability and avoid duplication, consider extracting this logic into a helper function.
For example:
func getShutdownError(interceptor signal.Interceptor) error {
if interceptor.ShutdownWasRequested() {
return ErrShutdownRequested
}
return nil
}Then you could replace this block with return getShutdownError(interceptor). This would make the code more DRY and easier to maintain if the shutdown logic needs to be changed in the future.
5bf23f0 to
bb622be
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to distinguish between internal shutdown requests and external OS signals by adding a shutdownRequested flag to the Interceptor and a corresponding ErrShutdownRequested error. The Main function is updated to return this error when an internal shutdown is detected, ensuring a non-zero exit code. Feedback includes a suggestion to refactor duplicated shutdown logic in lnd.go into a helper function and to use the more idiomatic atomic.Bool for the shutdown flag.
lnd.go
Outdated
| if interceptor.ShutdownWasRequested() { | ||
| return ErrShutdownRequested | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
This shutdown handling logic is duplicated later in this file (lines 819-823). To improve maintainability and avoid code duplication, consider extracting this logic into a small helper function.
For example:
func handleShutdown(interceptor signal.Interceptor) error {
if interceptor.ShutdownWasRequested() {
return ErrShutdownRequested
}
return nil
}This would simplify both call sites to return handleShutdown(interceptor) after waiting on interceptor.ShutdownChannel().
References
- Readable code is the most important requirement for any commit created. Refactoring duplicated code into a helper function improves readability and maintainability. (link)
signal/signal.go
Outdated
| // failure) rather than an external signal. This is a pointer so that | ||
| // it is shared between the Interceptor value returned by Intercept() | ||
| // and the goroutine running mainInterruptHandler. | ||
| shutdownRequested *int32 |
There was a problem hiding this comment.
For improved readability and to use more idiomatic Go (since Go 1.19), consider using *atomic.Bool instead of *int32 for this boolean flag. This would make the intent of the code clearer.
This would involve the following changes:
- This field's type would become
*atomic.Bool. - Initialization in
Intercept()would beshutdownRequested: new(atomic.Bool). - Setting the flag in
mainInterruptHandler()would becomec.shutdownRequested.Store(true). - Checking the flag in
ShutdownWasRequested()would becomereturn c.shutdownRequested.Load().
The existing started global variable in this file could also be refactored to use atomic.Bool for consistency.
| shutdownRequested *int32 | |
| shutdownRequested *atomic.Bool |
References
- Readable code is the most important requirement for any commit created. Using
atomic.Boolfor a boolean flag makes the code's intent clearer and more readable. (link)
bb622be to
773eab9
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces logic to distinguish between internal and external shutdown requests, allowing lnd to exit with a non-zero code upon internal failures. It adds a new error type, updates the signal interceptor to track the shutdown source, and includes unit tests. Review feedback recommends formatting the error message more compactly and shortening a descriptive comment to better align with style guidelines.
lnd.go
Outdated
| var ErrShutdownRequested = errors.New( | ||
| "lnd was shut down due to an internal request (e.g. chain backend " + | ||
| "failure)", | ||
| ) |
There was a problem hiding this comment.
While this formatting is not incorrect, it could be more concise to align with the style guide's preference for minimizing lines for error messages (lines 210-211, 228-229).
var ErrShutdownRequested = errors.New("lnd was shut down due to an internal " +
"request (e.g. chain backend failure)")References
- The style guide prefers minimizing lines for log and error messages. The current implementation spans four lines, which can be reduced. (link)
signal/signal.go
Outdated
| // it is shared between the Interceptor value returned by Intercept() | ||
| // and the goroutine running mainInterruptHandler. |
There was a problem hiding this comment.
The comment for shutdownRequested is quite long and wraps over multiple lines. While the line length check is left to the linter, the comment could be made more concise for better readability, while still conveying the necessary information.
| // it is shared between the Interceptor value returned by Intercept() | |
| // and the goroutine running mainInterruptHandler. | |
| // and the goroutine running mainInterruptHandler. |
773eab9 to
e2775c4
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enables lnd to distinguish between internal shutdown requests and external OS signals, allowing for a non-zero exit code upon internal failures. It introduces a shutdownRequested flag in the signal interceptor and updates the main loop to return a specific error when this flag is set. A review comment identifies a potential source of flakiness in the new tests where time.Sleep is used for synchronization, recommending a polling approach with require.Eventually instead.
signal/signal_test.go
Outdated
| // Give mainInterruptHandler time to process the signal and set | ||
| // isShutdown=true before the internal request arrives. | ||
| time.Sleep(50 * time.Millisecond) |
There was a problem hiding this comment.
Using time.Sleep to orchestrate concurrent tests can lead to flakiness, as the sleep duration is arbitrary and may not be sufficient on a slow machine or under heavy load. A more robust approach is to use a polling mechanism to wait for the desired state.
In this case, you can wait for the shutdown process to be initiated by polling interceptor.Alive(). This method returns false once the quit channel is closed, which happens inside the shutdown() function called by the interrupt handler. Using require.Eventually from the testify suite ensures the test waits just long enough for the state to change, making it both faster on average and more reliable.
| // Give mainInterruptHandler time to process the signal and set | |
| // isShutdown=true before the internal request arrives. | |
| time.Sleep(50 * time.Millisecond) | |
| // Wait for the shutdown process to start before sending the second | |
| // request. This is more robust than a fixed sleep. | |
| require.Eventually(t, func() bool { | |
| return !interceptor.Alive() | |
| }, time.Second, 10*time.Millisecond, "shutdown did not start") |
e2775c4 to
ec0ed79
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a mechanism to distinguish between internal shutdown requests and external OS signals, ensuring that lnd exits with a non-zero code when triggered by internal failures. It introduces the ErrShutdownRequested error, updates the signal Interceptor to track the shutdown source, and adds comprehensive unit tests. A review comment identifies an improvement opportunity in the test suite, suggesting the replacement of a time.Sleep call with a more robust polling mechanism using require.Eventually to prevent potential flakiness.
signal/signal_test.go
Outdated
|
|
||
| // Allow mainInterruptHandler goroutine to fully exit and reset | ||
| // the global started flag so the next test can call Intercept(). | ||
| time.Sleep(50 * time.Millisecond) |
There was a problem hiding this comment.
Using time.Sleep in tests can introduce flakiness if the goroutine being waited on takes longer to execute than the sleep duration. A more robust approach is to use require.Eventually to poll for the desired state.
This pattern is already used in TestOSSignalThenRequestShutdownNoFlag, and applying it here would improve test reliability and consistency.
You'll need to add import "sync/atomic" to use atomic.LoadInt32.
require.Eventually(t, func() bool {
return atomic.LoadInt32(&started) == 0
}, time.Second, 10*time.Millisecond, "interceptor did not reset")References
- Unit tests must always use the
requirelibrary. Either table driven unit tests or tests using therapidlibrary are preferred.
When lnd auto-shuts down due to an internal error such as a chain backend health check failure, Main() previously returned nil, causing the process to exit with code 0. This made it impossible for process supervisors to distinguish a crash from a clean shutdown. Add a shutdownRequested atomic.Bool to signal.Interceptor that is set inside mainInterruptHandler only when an internal RequestShutdown() is the first shutdown trigger. This avoids a race where a late internal request could override an earlier OS signal shutdown. Main() now returns ErrShutdownRequested in that case, which flows to os.Exit(1) in cmd/lnd/main.go. User-initiated shutdowns (SIGINT, SIGTERM) continue to exit with code 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ec0ed79 to
b2ddd4c
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to distinguish between internal shutdown requests (e.g., health check failures) and external OS signals in lnd. It adds an ErrShutdownRequested error and updates the Main entry point to return this error when an internal shutdown is triggered, ensuring the process exits with a non-zero code. The signal package is updated with an atomic boolean flag to track the shutdown source, and new unit tests are added to verify the logic and handle race conditions between signals and internal requests. I have no feedback to provide.
Summary
Main()previously returnednil, causing exit code 0shutdownRequestedatomic flag tosignal.Interceptor, set only byRequestShutdown()(not by OS signals)Main()now returnsErrShutdownRequestedon internal shutdown, flowing toos.Exit(1)— user-initiated shutdowns (SIGINT/SIGTERM) still exit 0Test plan
TestRequestShutdownSetsFlag— verifiesRequestShutdown()sets the flagTestOSSignalDoesNotSetRequestedFlag— verifies OS signals do NOT set the flaggo build ./cmd/lnd/compiles cleanly🤖 Generated with Claude Code