[chore] Allow rebatching in idbatcher #45287
Conversation
Since this is just a refactor I do not believe a changelog entry is necessary; happy to write one if it is desired, however.
Have the test run at a faster cadence so it exercises more than one batch, and error if duplicate traces are found or if any traces are missing.
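For illustration, a minimal sketch of the duplicate-and-missing check such a test can perform. This is not the PR's actual test code; the helper that produces `want` and `got` is assumed to enqueue ids and tick the batcher at a fast cadence:

```go
package idbatcher_test

import (
	"testing"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// checkNoDuplicateOrMissingIDs verifies every expected trace id came out of
// the batcher exactly once. pcommon.TraceID is a comparable [16]byte, so it
// can be used directly as a map key.
func checkNoDuplicateOrMissingIDs(t *testing.T, want, got []pcommon.TraceID) {
	seen := make(map[pcommon.TraceID]struct{}, len(want))
	for _, id := range got {
		if _, dup := seen[id]; dup {
			t.Errorf("duplicate trace id %v", id)
		}
		seen[id] = struct{}{}
	}
	for _, id := range want {
		if _, ok := seen[id]; !ok {
			t.Errorf("missing trace id %v", id)
		}
	}
}
```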
The channel logic of the id batcher is not necessary and makes it challenging to move traces to earlier batches when a root span is received. Using a ring buffer allows us to reference a single batch id and move a trace earlier if desired.
Update the id batcher to allow moving a trace to an earlier batch, which then allows us to use the same batcher for all trace ids. This reduces some complexity in the processor and improves efficiency a small amount, as we will not re-check already-decided traces. It also allows us to remove the workaround for traces not part of the trace map, as it will no longer be possible to go through that code twice (once for each batcher).
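As a rough illustration of the ring-buffer idea, here is a hedged sketch with assumed names (`BatchID`, `ringBatcher`, `MoveEarlier`), not the actual idbatcher implementation:

```go
package idbatcher

import (
	"sync"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// BatchID identifies a batch for later move/remove operations (sketch only).
type BatchID uint64

type ringBatcher struct {
	mu      sync.Mutex
	batches [][]pcommon.TraceID // ring of pending batches, oldest first
	head    BatchID             // id of the batch that will be processed next
}

// AddToCurrentBatch appends id to the newest batch and returns that batch's
// id so the caller can later move the trace earlier or remove it.
func (b *ringBatcher) AddToCurrentBatch(id pcommon.TraceID) BatchID {
	b.mu.Lock()
	defer b.mu.Unlock()
	last := len(b.batches) - 1
	b.batches[last] = append(b.batches[last], id)
	return b.head + BatchID(last)
}

// MoveEarlier moves a trace from its current batch into the batch that will
// be processed next, e.g. once a root span is received.
func (b *ringBatcher) MoveEarlier(id pcommon.TraceID, from BatchID) {
	b.mu.Lock()
	defer b.mu.Unlock()
	idx := int(from - b.head)
	if idx <= 0 || idx >= len(b.batches) {
		return // already in the first batch, or already flushed
	}
	b.batches[idx] = removeID(b.batches[idx], id)
	b.batches[0] = append(b.batches[0], id)
}

func removeID(batch []pcommon.TraceID, id pcommon.TraceID) []pcommon.TraceID {
	for i, v := range batch {
		if v == id {
			return append(batch[:i], batch[i+1:]...)
		}
	}
	return batch
}
```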
Also make sure that the "dropped too early" metric is not incremented.
```go
// Only increment the not found metric if the trace is not in the
// cache. If it is in the cache that means a decision was already
// made and the trace properly released. If using block on overflow
// we can avoid checking the cache as it is not possible to release
// a trace that is still in the batcher with that flow.
if !tsp.blockOnOverflow && !tsp.inCache(id) {
	metrics.idNotFoundOnMapCount++
}
```
Since we are back to one batcher we no longer have to worry about the in-cache logic, as each trace should have a decision exactly once. I also added a case in the drop traces test to check that this metric is not incremented inappropriately.
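A minimal sketch of the simplification this enables, assuming `idToTrace` is the processor's trace map (names are assumptions, not the PR's actual diff):

```go
// Sketch only: with a single batcher each trace is decided exactly once, so a
// miss in the trace map can be counted directly, without consulting the
// decision cache first.
if _, ok := tsp.idToTrace.Load(id); !ok {
	metrics.idNotFoundOnMapCount++
}
```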
Always make sure to allocate a small number of elements; otherwise initial allocations can end up being empty before the batcher learns the appropriate size.
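For example, a minimal sketch of seeding each batch with a small floor capacity, continuing the ring-buffer illustration above (the constant and helper are assumptions, not the PR's code):

```go
// Sketch only: minBatchCapacity is an assumed floor, not a value from the PR.
const minBatchCapacity = 64

// newBatch allocates a batch with at least the floor capacity so the first
// batches are not empty before the batcher learns realistic sizes.
// (max here is the Go 1.21+ builtin.)
func newBatch(expectedLen int) []pcommon.TraceID {
	return make([]pcommon.TraceID, 0, max(expectedLen, minBatchCapacity))
}
```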
atoulme left a comment:
Approved by codeowner; refactor not impacting surface APIs.
Thank you for your contribution @csmarchbanks! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.
Description

This change allows moving traces between batches as part of the id batcher. It does this by having the processor store the id of the batch each trace is currently scheduled in, which is then passed to the move method. Currently it is only possible to rebatch into batches that will be processed sooner than scheduled, as that is all the processor needs today. It is also now possible to remove a trace from a batch, which is necessary when we immediately drop a large trace rather than wait for a decision. (A sketch of this flow appears after the benchmarks below.)

Link to tracking issue

Fixes #45054

Testing

I improved the tests that were already there, but otherwise this is simply a refactor.

Relevant benchmarks show ~5% improved overall throughput:

```
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: Apple M2
                      │   old.txt   │              new.txt               │
                      │   sec/op    │   sec/op     vs base               │
ProcessorThroughput-8   1.131m ± 1%   1.071m ± 1%  -5.35% (p=0.000 n=8)

                      │   old.txt    │               new.txt               │
                      │     B/s      │     B/s       vs base               │
ProcessorThroughput-8   236.6Mi ± 2%   250.0Mi ± 4%  +5.65% (p=0.000 n=8)

                      │   old.txt    │             new.txt              │
                      │     B/op     │     B/op      vs base            │
ProcessorThroughput-8   4.471Mi ± 0%   4.471Mi ± 0%  ~ (p=0.195 n=8)

                      │   old.txt   │            new.txt             │
                      │  allocs/op  │  allocs/op   vs base           │
ProcessorThroughput-8   36.62k ± 0%   36.62k ± 0%   ~ (p=0.742 n=8)

pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/idbatcher
                    │   old.txt   │              new.txt                │
                    │   sec/op    │   sec/op     vs base                │
ConcurrentEnqueue-8   164.0n ± 0%   108.4n ± 1%  -33.92% (p=0.000 n=8)

                    │  old.txt   │            new.txt             │
                    │    B/op    │    B/op     vs base            │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal

                    │  old.txt   │             new.txt              │
                    │ allocs/op  │  allocs/op   vs base             │
ConcurrentEnqueue-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=8) ¹
¹ all samples are equal
```
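Building on the ring-buffer sketch earlier, here is a hedged illustration of how the processor could use the move and remove operations described above. `MoveEarlier`, `Remove`, and the stored `batchID` are assumed names, not the PR's exact API:

```go
// Sketch only, reusing BatchID and pcommon from the ring-buffer illustration.
type rebatcher interface {
	MoveEarlier(id pcommon.TraceID, from BatchID)
	Remove(id pcommon.TraceID, from BatchID)
}

type tailSamplingSpanProcessor struct {
	batcher rebatcher
}

type traceData struct {
	batchID BatchID // id of the batch this trace is currently scheduled in
}

// onRootSpan moves the trace to an earlier batch so a decision happens sooner.
func (tsp *tailSamplingSpanProcessor) onRootSpan(id pcommon.TraceID, td *traceData) {
	tsp.batcher.MoveEarlier(id, td.batchID)
}

// onTraceTooLarge drops the trace immediately and removes it from its batch
// so no decision is ever scheduled for it.
func (tsp *tailSamplingSpanProcessor) onTraceTooLarge(id pcommon.TraceID, td *traceData) {
	tsp.batcher.Remove(id, td.batchID)
}
```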