
[chore] Allow rebatching in idbatcher #45287

Merged
atoulme merged 10 commits into open-telemetry:main from csmarchbanks:idbatcher-update-batch
Jan 30, 2026

Conversation

Contributor

@csmarchbanks csmarchbanks commented Jan 8, 2026

Description

This change allows moving traces between batches in the id batcher. It does this by having the processor store the id of the batch a trace is currently scheduled in, which is then passed to the move method. Currently it is only possible to rebatch into batches that will be processed sooner than originally scheduled, as that is all the processor needs today. It is also now possible to remove a trace from a batch, which is necessary when we immediately drop a large trace rather than waiting for a decision.
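The flow described above could be sketched roughly as follows. All names here (`Batcher`, `AddToCurrentBatch`, `Move`, `Remove`, `CloseCurrentAndTakeFirstBatch`) are illustrative, not the actual idbatcher API; this is a minimal ring-buffer model of the idea, not the real implementation:

```go
package main

import "fmt"

type traceID string

// Batcher keeps a ring of future batches. firstBatch is the id of the
// batch that will be processed next; batch ids grow monotonically.
type Batcher struct {
	batches    [][]traceID
	firstBatch uint64
}

func NewBatcher(numBatches int) *Batcher {
	return &Batcher{batches: make([][]traceID, numBatches)}
}

// AddToCurrentBatch schedules id in the latest batch and returns that
// batch's id, which the caller stores alongside the trace.
func (b *Batcher) AddToCurrentBatch(id traceID) uint64 {
	last := len(b.batches) - 1
	b.batches[last] = append(b.batches[last], id)
	return b.firstBatch + uint64(last)
}

// Move reschedules id into an earlier batch (e.g. once the root span
// arrives). Only moving earlier than the current batch is supported.
func (b *Batcher) Move(id traceID, from, to uint64) bool {
	if to >= from || to < b.firstBatch {
		return false
	}
	b.Remove(id, from)
	b.batches[to-b.firstBatch] = append(b.batches[to-b.firstBatch], id)
	return true
}

// Remove drops id from its batch entirely, e.g. when a large trace is
// dropped immediately instead of waiting for a decision.
func (b *Batcher) Remove(id traceID, batch uint64) {
	if batch < b.firstBatch {
		return // batch already processed
	}
	s := b.batches[batch-b.firstBatch]
	for i, v := range s {
		if v == id {
			s[i] = s[len(s)-1]
			b.batches[batch-b.firstBatch] = s[:len(s)-1]
			return
		}
	}
}

// CloseCurrentAndTakeFirstBatch rotates the ring: the first batch is
// returned for processing and an empty batch is appended at the back.
func (b *Batcher) CloseCurrentAndTakeFirstBatch() []traceID {
	out := b.batches[0]
	b.batches = append(b.batches[1:], nil)
	b.firstBatch++
	return out
}

func main() {
	b := NewBatcher(3)
	batchID := b.AddToCurrentBatch("trace-1") // scheduled three ticks out
	b.Move("trace-1", batchID, b.firstBatch)  // root span seen: decide next tick
	fmt.Println(b.CloseCurrentAndTakeFirstBatch()) // [trace-1]
}
```

Referencing a single stored batch id is what makes the earlier channel-based design unnecessary: the caller can hand that id back to `Move` or `Remove` without the batcher searching every pending batch.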

Link to tracking issue

Fixes #45054

Testing

I improved the tests that were already there, but otherwise this is simply a refactor.

Relevant benchmarks show ~5% improved overall throughput:

```
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: Apple M2
                      │   old.txt   │              new.txt              │
                      │   sec/op    │   sec/op     vs base              │
ProcessorThroughput-8   1.131m ± 1%   1.071m ± 1%  -5.35% (p=0.000 n=8)

                      │   old.txt    │              new.txt               │
                      │     B/s      │     B/s       vs base              │
ProcessorThroughput-8   236.6Mi ± 2%   250.0Mi ± 4%  +5.65% (p=0.000 n=8)

                      │   old.txt    │            new.txt            │
                      │     B/op     │     B/op      vs base         │
ProcessorThroughput-8   4.471Mi ± 0%   4.471Mi ± 0%  ~ (p=0.195 n=8)

                      │   old.txt   │           new.txt            │
                      │  allocs/op  │  allocs/op   vs base         │
ProcessorThroughput-8   36.62k ± 0%   36.62k ± 0%  ~ (p=0.742 n=8)

pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/idbatcher
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
ConcurrentEnqueue-8   164.0n ± 0%   108.4n ± 1%  -33.92% (p=0.000 n=8)

                    │  old.txt   │            new.txt            │
                    │    B/op    │    B/op     vs base           │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal

                    │  old.txt   │            new.txt            │
                    │ allocs/op  │ allocs/op   vs base           │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal
```

@csmarchbanks csmarchbanks force-pushed the idbatcher-update-batch branch from ef2e23b to 549bc90 on January 8, 2026 17:43
@csmarchbanks csmarchbanks marked this pull request as ready for review January 12, 2026 19:46
@csmarchbanks csmarchbanks requested a review from a team as a code owner January 12, 2026 19:46
@csmarchbanks csmarchbanks requested a review from mx-psi January 12, 2026 19:46
@github-actions github-actions Bot added the processor/tailsampling Tail sampling processor label Jan 12, 2026
@csmarchbanks
Contributor Author

Since this is just a refactor, I do not believe a changelog entry is necessary; happy to write one if desired, however.

Comment thread processor/tailsamplingprocessor/internal/idbatcher/id_batcher.go Outdated
Comment thread processor/tailsamplingprocessor/internal/idbatcher/id_batcher.go
Have the test run at a faster cadence to test more than one batch and
error if duplicate traces are found as well as if any traces are
missing.
The channel logic of the id batcher is not necessary and makes it
challenging to move traces to early batches when a root span is
received. Using a ring buffer allows us to reference a single batch id
and move a trace earlier if desired.
Update the id batcher to allow moving a trace to an earlier batch, which
then allows us to use the same batcher for all trace ids. This reduces
some complexity in the processor and improves efficiency a small amount
as we will not re-check already decided traces.

This also allows us to remove the workaround for traces not part of the
trace map as it will no longer be possible to go through that code twice
for each batcher.
@csmarchbanks csmarchbanks force-pushed the idbatcher-update-batch branch from 5252078 to e977354 on January 22, 2026 17:30
@csmarchbanks csmarchbanks requested a review from jmacd as a code owner January 22, 2026 17:30
@csmarchbanks csmarchbanks force-pushed the idbatcher-update-batch branch from e977354 to c78cf1f on January 22, 2026 17:42
And make sure that the dropped too early metric is not incremented.
Comment on lines -596 to -603
```go
// Only increment the not found metric if the trace is not in the
// cache. If it is in the cache that means a decision was already
// made and the trace properly released. If using block on overflow
// we can avoid checking the cache as it is not possible to release
// a trace that is still in the batcher with that flow.
if !tsp.blockOnOverflow && !tsp.inCache(id) {
	metrics.idNotFoundOnMapCount++
}
```
Contributor Author


Since we are back to one batcher we no longer have to worry about the in cache logic as each trace should only have a decision exactly once. I also added a case in the drop traces test to check that this metric is not incremented inappropriately.

Always make sure to allocate a small number of elements; otherwise initial
allocations can end up empty before the batcher learns the appropriate
size.
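The sizing concern in that commit message could be sketched as below; `minBatchCapacity` and `nextBatchCapacity` are hypothetical names for illustration, not values or functions from this PR:

```go
package main

import "fmt"

// minBatchCapacity is a hypothetical floor, not a value taken from the PR.
const minBatchCapacity = 16

// nextBatchCapacity sizes a fresh batch from the previous batch's length,
// never dropping below a small floor so the first batches are not
// zero-capacity before the batcher has observed a typical size.
func nextBatchCapacity(lastLen int) int {
	if lastLen < minBatchCapacity {
		return minBatchCapacity
	}
	return lastLen
}

func main() {
	fmt.Println(nextBatchCapacity(0))    // floor applies before any traffic: 16
	fmt.Println(nextBatchCapacity(4096)) // steady state follows observed size: 4096
}
```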
@csmarchbanks csmarchbanks changed the title Allow rebatching in idbatcher [chore] Allow rebatching in idbatcher Jan 29, 2026
Contributor

@atoulme atoulme left a comment


Approved by codeowner; refactor not impacting surface APIs.

@atoulme atoulme merged commit 7728f5f into open-telemetry:main Jan 30, 2026
191 checks passed
@otelbot
Contributor

otelbot Bot commented Jan 30, 2026

Thank you for your contribution @csmarchbanks! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.

hardik-choksi pushed a commit to hardik-choksi/opentelemetry-collector-contrib that referenced this pull request Feb 2, 2026

Labels

processor/tailsampling Tail sampling processor

Development

Successfully merging this pull request may close these issues.

Refactor idbatcher to support rebatching

4 participants