[chore] Allow rebatching in idbatcher #45287
Conversation
Since this is just a refactor I do not believe a changelog entry is necessary; happy to write one if it is desired, however.
Have the test run at a faster cadence so it exercises more than one batch, and error if duplicate traces are found or if any traces are missing.
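For illustration, a minimal sketch of the duplicate-and-missing check such a test can perform. This is not the PR's actual test code; the helper that produces `want` and `got` is assumed to enqueue ids and tick the batcher at a fast cadence:

```go
package idbatcher_test

import (
	"testing"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// checkNoDuplicateOrMissingIDs verifies every expected trace id came out of
// the batcher exactly once. pcommon.TraceID is a comparable [16]byte, so it
// can be used directly as a map key.
func checkNoDuplicateOrMissingIDs(t *testing.T, want, got []pcommon.TraceID) {
	seen := make(map[pcommon.TraceID]struct{}, len(want))
	for _, id := range got {
		if _, dup := seen[id]; dup {
			t.Errorf("duplicate trace id %v", id)
		}
		seen[id] = struct{}{}
	}
	for _, id := range want {
		if _, ok := seen[id]; !ok {
			t.Errorf("missing trace id %v", id)
		}
	}
}
```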
The channel logic of the id batcher is not necessary and makes it challenging to move traces to earlier batches when a root span is received. Using a ring buffer allows us to reference a single batch id and move a trace earlier if desired.
Update the id batcher to allow moving a trace to an earlier batch, which then allows us to use the same batcher for all trace ids. This reduces some complexity in the processor and improves efficiency a small amount, as we will not re-check already-decided traces. It also allows us to remove the workaround for traces not part of the trace map, as it will no longer be possible to go through that code twice (once for each batcher).
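As a rough illustration of the ring-buffer idea, here is a hedged sketch with assumed names (`BatchID`, `ringBatcher`, `MoveEarlier`), not the actual idbatcher implementation:

```go
package idbatcher

import (
	"sync"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// BatchID identifies a batch for later move/remove operations (sketch only).
type BatchID uint64

type ringBatcher struct {
	mu      sync.Mutex
	batches [][]pcommon.TraceID // ring of pending batches, oldest first
	head    BatchID             // id of the batch that will be processed next
}

// AddToCurrentBatch appends id to the newest batch and returns that batch's
// id so the caller can later move the trace earlier or remove it.
func (b *ringBatcher) AddToCurrentBatch(id pcommon.TraceID) BatchID {
	b.mu.Lock()
	defer b.mu.Unlock()
	last := len(b.batches) - 1
	b.batches[last] = append(b.batches[last], id)
	return b.head + BatchID(last)
}

// MoveEarlier moves a trace from its current batch into the batch that will
// be processed next, e.g. once a root span is received.
func (b *ringBatcher) MoveEarlier(id pcommon.TraceID, from BatchID) {
	b.mu.Lock()
	defer b.mu.Unlock()
	idx := int(from - b.head)
	if idx <= 0 || idx >= len(b.batches) {
		return // already in the first batch, or already flushed
	}
	b.batches[idx] = removeID(b.batches[idx], id)
	b.batches[0] = append(b.batches[0], id)
}

func removeID(batch []pcommon.TraceID, id pcommon.TraceID) []pcommon.TraceID {
	for i, v := range batch {
		if v == id {
			return append(batch[:i], batch[i+1:]...)
		}
	}
	return batch
}
```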
Also make sure that the "dropped too early" metric is not incremented.
```go
// Only increment the not found metric if the trace is not in the
// cache. If it is in the cache that means a decision was already
// made and the trace properly released. If using block on overflow
// we can avoid checking the cache as it is not possible to release
// a trace that is still in the batcher with that flow.
if !tsp.blockOnOverflow && !tsp.inCache(id) {
	metrics.idNotFoundOnMapCount++
}
```
Since we are back to one batcher we no longer have to worry about the in-cache logic, as each trace should have a decision exactly once. I also added a case in the drop traces test to check that this metric is not incremented inappropriately.
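A minimal sketch of the simplification this enables, assuming `idToTrace` is the processor's trace map (names are assumptions, not the PR's actual diff):

```go
// Sketch only: with a single batcher each trace is decided exactly once, so a
// miss in the trace map can be counted directly, without consulting the
// decision cache first.
if _, ok := tsp.idToTrace.Load(id); !ok {
	metrics.idNotFoundOnMapCount++
}
```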
Always make sure to allocate a small number of elements; otherwise initial allocations can end up being empty before the batcher learns the appropriate size.
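For example, a minimal sketch of seeding each batch with a small floor capacity, continuing the ring-buffer illustration above (the constant and helper are assumptions, not the PR's code):

```go
// Sketch only: minBatchCapacity is an assumed floor, not a value from the PR.
const minBatchCapacity = 64

// newBatch allocates a batch with at least the floor capacity so the first
// batches are not empty before the batcher learns realistic sizes.
// (max here is the Go 1.21+ builtin.)
func newBatch(expectedLen int) []pcommon.TraceID {
	return make([]pcommon.TraceID, 0, max(expectedLen, minBatchCapacity))
}
```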
atoulme left a comment:
Approved by codeowner; refactor not impacting surface APIs.
Thank you for your contribution @csmarchbanks! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.
Description

This change allows moving traces between batches as part of the id batcher. It does this by having the processor store the id of the batch each trace is currently scheduled in, which is then passed to the move method. Currently it is only possible to rebatch into batches that will be processed sooner than scheduled, as that is all the processor needs today. It is also now possible to remove a trace from a batch, which is necessary when we immediately drop a large trace rather than wait for a decision. (A sketch of this flow appears after the benchmarks below.)

Link to tracking issue

Fixes #45054

Testing

I improved the tests that were already there, but otherwise this is simply a refactor.

Relevant benchmarks show ~5% improved overall throughput:

```
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: Apple M2
                      │   old.txt   │              new.txt               │
                      │   sec/op    │   sec/op     vs base               │
ProcessorThroughput-8   1.131m ± 1%   1.071m ± 1%  -5.35% (p=0.000 n=8)

                      │   old.txt    │               new.txt               │
                      │     B/s      │     B/s       vs base               │
ProcessorThroughput-8   236.6Mi ± 2%   250.0Mi ± 4%  +5.65% (p=0.000 n=8)

                      │   old.txt    │             new.txt              │
                      │     B/op     │     B/op      vs base            │
ProcessorThroughput-8   4.471Mi ± 0%   4.471Mi ± 0%  ~ (p=0.195 n=8)

                      │   old.txt   │            new.txt             │
                      │  allocs/op  │  allocs/op   vs base           │
ProcessorThroughput-8   36.62k ± 0%   36.62k ± 0%   ~ (p=0.742 n=8)

pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/idbatcher
                    │   old.txt   │              new.txt                │
                    │   sec/op    │   sec/op     vs base                │
ConcurrentEnqueue-8   164.0n ± 0%   108.4n ± 1%  -33.92% (p=0.000 n=8)

                    │  old.txt   │            new.txt             │
                    │    B/op    │    B/op     vs base            │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal

                    │  old.txt   │             new.txt              │
                    │ allocs/op  │  allocs/op   vs base             │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal
```
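Building on the ring-buffer sketch earlier, here is a hedged illustration of how the processor could use the move and remove operations described above. `MoveEarlier`, `Remove`, and the stored `batchID` are assumed names, not the PR's exact API:

```go
// Sketch only, reusing BatchID and pcommon from the ring-buffer illustration.
type rebatcher interface {
	MoveEarlier(id pcommon.TraceID, from BatchID)
	Remove(id pcommon.TraceID, from BatchID)
}

type tailSamplingSpanProcessor struct {
	batcher rebatcher
}

type traceData struct {
	batchID BatchID // id of the batch this trace is currently scheduled in
}

// onRootSpan moves the trace to an earlier batch so a decision happens sooner.
func (tsp *tailSamplingSpanProcessor) onRootSpan(id pcommon.TraceID, td *traceData) {
	tsp.batcher.MoveEarlier(id, td.batchID)
}

// onTraceTooLarge drops the trace immediately and removes it from its batch
// so no decision is ever scheduled for it.
func (tsp *tailSamplingSpanProcessor) onTraceTooLarge(id pcommon.TraceID, td *traceData) {
	tsp.batcher.Remove(id, td.batchID)
}
```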