feat(source sender): make chunk size configurable #25637
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba2d828047
  pub(crate) static SOURCE_SENDER_BUFFER_SIZE: LazyLock<usize> =
-     LazyLock::new(|| *TRANSFORM_CONCURRENCY_LIMIT * CHUNK_SIZE);
+     LazyLock::new(|| *TRANSFORM_CONCURRENCY_LIMIT * chunk_size());
Reject chunk sizes that overflow buffer capacity
When a user supplies a very large but parseable --chunk-size/VECTOR_CHUNK_SIZE, this unchecked multiplication can wrap in release builds. For example, on a 64-bit build --threads 2 --chunk-size 9223372036854775808 makes the source sender buffer size wrap to 0, and the first source build then panics in Output::new_with_buffer when it unwraps NonZeroUsize::new(n); other overflowing values silently produce a much smaller buffer than requested. Please validate the configured chunk size or use checked arithmetic and return a config error instead of allowing startup panics/wraparound.
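One way to address this, as a minimal sketch rather than the actual patch: compute the buffer size with checked arithmetic and surface an error instead of wrapping. The function name `source_sender_buffer_size` and the `String` error type are hypothetical, chosen only for illustration; the real code would presumably return Vector's own config error type.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not part of this PR): compute the source sender
/// buffer size without silently wrapping on overflow.
fn source_sender_buffer_size(
    concurrency_limit: usize,
    chunk_size: usize,
) -> Result<NonZeroUsize, String> {
    // `checked_mul` returns `None` on overflow in both debug and release builds.
    let total = concurrency_limit.checked_mul(chunk_size).ok_or_else(|| {
        format!(
            "chunk size {chunk_size} overflows the buffer capacity with {concurrency_limit} threads"
        )
    })?;

    // A zero-sized buffer is rejected here instead of panicking later on unwrap.
    NonZeroUsize::new(total)
        .ok_or_else(|| "source sender buffer size must be non-zero".to_string())
}
```

With this shape, `--threads 2 --chunk-size 9223372036854775808` on a 64-bit build would yield an `Err` at startup rather than a buffer size of 0.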
Summary
Hi! First, thank you for building and maintaining such a great project.
I would like to propose a feature that helps us reduce memory usage for Vector DaemonSet pods in production.
We use vector-operator and try to configure each pipeline through its own VectorPipeline custom resource as a fully independent source -> transforms... -> sink chain. Users configure these chains independently in their own Kubernetes namespaces, so a configuration issue in one user's pipeline does not affect other users because the pipelines are isolated.
The tradeoff is that the logging agent running on Kubernetes nodes can become quite memory-heavy when many independent pipelines are configured. I tried tuning VECTOR_THREADS, which helped somewhat, but not enough. After looking through the codebase, I found that CHUNK_SIZE is a useful control point for reducing memory usage because it affects source sender batching and source output buffer sizing.
I added this functionality in a fork and tested it in our testing environment. For example, with VECTOR_CHUNK_SIZE=200, 12 independent pipelines fit into ~2 GB of memory with when_full: block configured at every stage, even when several pipelines were blocked because their sinks could not send logs while the downstream collector was unavailable. Vector also handled log write spikes without issues. Interestingly, Vector's own operational metrics in our Grafana dashboards also appeared to become more stable and predictable.
This PR adds --chunk-size/VECTOR_CHUNK_SIZE so operators can tune this value without rebuilding Vector.
If this is not the right direction, I would be happy to hear your suggestions.
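To illustrate the idea of a tunable chunk size, here is a rough sketch of how a raw setting could be resolved into a validated value. It is not the code in this PR: `parse_chunk_size` is a hypothetical helper, and `DEFAULT_CHUNK_SIZE = 1000` is an assumption about the built-in default.

```rust
use std::num::NonZeroUsize;

// Assumed default for illustration; check the actual constant in the codebase.
const DEFAULT_CHUNK_SIZE: usize = 1000;

/// Hypothetical helper: turn an optional raw setting (for example the value of
/// an environment variable) into a validated, non-zero chunk size.
fn parse_chunk_size(raw: Option<&str>) -> Result<NonZeroUsize, String> {
    match raw {
        // Nothing configured: fall back to the assumed default.
        None => Ok(NonZeroUsize::new(DEFAULT_CHUNK_SIZE).expect("default is non-zero")),
        // `NonZeroUsize` implements `FromStr`, so "0" and non-numeric input both fail here.
        Some(text) => text
            .trim()
            .parse::<NonZeroUsize>()
            .map_err(|err| format!("invalid chunk size {text:?}: {err}")),
    }
}
```

A caller could feed it the environment value with `parse_chunk_size(std::env::var("VECTOR_CHUNK_SIZE").ok().as_deref())`. Returning `NonZeroUsize` keeps a zero chunk size from ever reaching the buffer sizing code.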
Vector configuration
Example CLI usage:
The same value can be configured with the environment variable:
Minimal example configuration:
How did you test this PR?
Ran:
cargo check -p vector-core --no-default-features
cargo check -p vector-core --no-default-features --features test --tests
cargo clippy -p vector --lib
cargo vdev run ./test-cfg.yml -- --chunk-size 20 -vvv
VECTOR_CHUNK_SIZE=200 cargo vdev run ./test-cfg.yml -- -vvv
Also tested this change in our fork in a Kubernetes environment with multiple independent VectorPipeline pipelines and VECTOR_CHUNK_SIZE=200.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.
References
None.
Notes
--chunk-size and VECTOR_CHUNK_SIZE.