Skip to content

fix: speed up node shutdown#5408

Open
sbackend123 wants to merge 3 commits intomasterfrom
fix/shutdown-node
Open

fix: speed up node shutdown#5408
sbackend123 wants to merge 3 commits intomasterfrom
fix/shutdown-node

Conversation

@sbackend123
Copy link
Copy Markdown
Contributor

Fixed order of closing components. Removed unnecessary logging. Made Close() methods safer. Fixed context in pull sync component.

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Context: see the linked issue

What was done:

  • Fixed the shutdown order of storages. Previously, StateStore was closed before LocalStore. LocalStore uses BatchStore, which depends on StateStore. During shutdown or in-flight work, BatchStore could access StateStore, causing failed Put/Get operations, with a risk of incomplete flush and inconsistent local state.

  • Fixed a context handling issue in the pull sync component. Upon receiving a shutdown signal, the component waited 10 seconds to allow its sub-goroutines to terminate gracefully. However, the sub-goroutines did not receive the context cancellation signal.

  • Added logs to track how long each component takes to shut down.

  • Removed warning logs in libp2p for context cancellation when the node is shutting down, to avoid log noise.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

#5397

Screenshots (if appropriate):

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

@sbackend123 sbackend123 changed the title fix: speed up node shutdown. fix: speed up node shutdown Mar 23, 2026
Fixed order of closing components. Removed unnecessary logging. Made Close() methods safer. Fixed context in pull sync component.
@sbackend123 sbackend123 marked this pull request as ready for review March 25, 2026 15:26
pkg/api/api.go Outdated
wsWg sync.WaitGroup // wait for all websockets to close on exit
quit chan struct{}
wsWg sync.WaitGroup // wait for all websockets to close on exit
closeOnce sync.Once
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? how come that a quit signal can happen twice?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is the protection of the double closed channel, which is panicing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that is clear. the question is why would the channel be closed twice in the first place? it suggests a problem in the calling code no?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that it is just a common protection, as there is no way to restrict how many times an exported method is called. I did not notice that the Close method is called twice anywhere, but I suppose that a layer of safety is a good thing here.

quit chan struct{}
newPeer chan struct{}
quit chan struct{}
closeOnce sync.Once
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good hardening of the close functions. I did not like to write so many similar code for service shutdown or closing and wanted to have even the blocking mechanism, to wait until all the goroutines are closed before the Close() function returns (for consistency and reliability), and I have wrote something like this https://pkg.go.dev/resenje.org/x/shutdown#Graceful. If you think that something could be stolen from there, or to make something better and unified for bee, feel free to investigate, or use.

pkg/api/api.go Outdated
wsWg sync.WaitGroup // wait for all websockets to close on exit
quit chan struct{}
wsWg sync.WaitGroup // wait for all websockets to close on exit
closeOnce sync.Once
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is the protection of the double closed channel, which is panicing.

@sbackend123
Copy link
Copy Markdown
Contributor Author

Good hardening of the close functions. I did not like to write so many similar code for service shutdown or closing and wanted to have even the blocking mechanism, to wait until all the goroutines are closed before the Close() function returns (for consistency and reliability), and I have wrote something like this https://pkg.go.dev/resenje.org/x/shutdown#Graceful. If you think that something could be stolen from there, or to make something better and unified for bee, feel free to investigate, or use.

Good hardening of the close functions. I did not like to write so many similar code for service shutdown or closing and wanted to have even the blocking mechanism, to wait until all the goroutines are closed before the Close() function returns (for consistency and reliability), and I have wrote something like this https://pkg.go.dev/resenje.org/x/shutdown#Graceful. If you think that something could be stolen from there, or to make something better and unified for bee, feel free to investigate, or use.

After discussion with @acud decided to remove sync.Once()

@sbackend123 sbackend123 requested review from acud and janos March 31, 2026 13:23
@janos
Copy link
Copy Markdown
Member

janos commented Mar 31, 2026

After discussion with @acud decided to remove sync.Once()

Cool, it is good that there has been a discussion.

}

b.logger.Debug("starting shutdown", "component", component)
defer b.logger.Debug("finished shutdown", "component", component)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is worth to add to it the time-span it took to shut down the component

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants