Improve SupportBundleService performance, stability, and test coverage#25555
Draft
Improve SupportBundleService performance, stability, and test coverage#25555
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves the Support Bundle feature’s scalability by parallelizing node/cluster data collection, reducing memory pressure during log streaming, and adding full-backend integration tests to cover the end-to-end bundle lifecycle and permissions.
Changes:
- Parallelized per-node and cluster-wide support bundle data collection via
CompletableFutureto reduce latency at scale. - Reduced heap usage and improved streaming correctness when proxying datanode logs.
- Added full-backend integration tests covering manifest/logfile endpoints and bundle build/list/download/delete flows (including security/path traversal checks).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
graylog2-server/src/main/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleService.java |
Refactors bundle collection to run more work concurrently; improves streaming and bundle listing implementation details. |
full-backend-tests/src/test/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleResourceIT.java |
Adds end-to-end backend tests for support bundle APIs, permissions, and downloaded zip contents. |
Comments suppressed due to low confidence (1)
graylog2-server/src/main/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleService.java:561
- listBundles(): Files.list(bundleDir) may include directories; the previous implementation explicitly filtered them out. Without filtering, Files.size(f) can throw (and break listing) if any directory under bundleDir happens to start with the bundle prefix. Consider restoring a regular-file filter (e.g., !Files.isDirectory(p) / Files.isRegularFile(p)) before computing size.
public List<BundleFile> listBundles() {
try (var files = Files.list(bundleDir)) {
return files
.filter(p -> p.getFileName().toString().startsWith(BUNDLE_NAME_PREFIX))
.map(f -> {
try {
return new BundleFile(f.getFileName().toString(), Files.size(f));
} catch (IOException e) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...rver/src/main/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleService.java
Outdated
Show resolved
Hide resolved
...rver/src/main/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleService.java
Outdated
Show resolved
Hide resolved
...s/src/test/java/org/graylog2/rest/resources/system/debug/bundle/SupportBundleResourceIT.java
Show resolved
Hide resolved
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The support bundle service had several performance and correctness issues that became significant at scale: sequential API fan-outs, thread pool starvation risk, memory pressure from buffering large log files, and no test coverage beyond a few unit cases.
/nocl yet
Description
Parallelism
Per-node data collection (thread dump, metrics, system stats, certificates) now runs as independent concurrent tasks instead of sequentially per node. Cluster-wide fan-out calls (requestOnAllNodes) are moved to a short-lived orchestration executor to avoid thread-pool starvation — previously wrapping them in supplyAsync on the shared executor could deadlock when the pool was exhausted by blocking fan-out wrappers waiting on sub-tasks submitted to the same pool.
Efficiency & correctness
Datanode log transfer switches from ResponseBody.bytes() (full heap buffer) to byteStream().transferTo() to avoid large allocations. Files.walk() in listBundles() is replaced with Files.list() since the directory is flat. closeEntry() in writeZipFile() is now guarded so it only runs if putNextEntry() succeeded, preventing a misleading second exception when zip entry creation fails. SimpleDateFormat (not thread-safe, per-call) is replaced with a static DateTimeFormatter. Several internal helpers are simplified or inlined, and public constants that have no external users are made private. Call timeout increased from 10 s to 30 s.
Motivation and Context
Fixes or at least improves https://github.com/Graylog2/graylog-plugin-enterprise/issues/13625
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: