Skip to content

Conversation

@SuperPaintman
Copy link

@SuperPaintman SuperPaintman commented Dec 12, 2025

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

See #8597 for more details.

Cache RemoteEndpoints.Engines() results, so repeated calls return the same slice of engine.

This improves two things:

  • It gives optimizers a consistent view of Engines() (used in PassthroughOptimizer and DistributedExecutionOptimizer).
  • It avoids recomputing lazy fields (MinT / MaxT / LabelSets) multiple times in optimizers. In the current implementation, .Engines() recreates all remoteEngine's, and effectively resets their cached state.

Verification

  • Added a unit test.
  • Verified the change in an internal fork (has been running for about a week).

Issue: #8597

CC: @MichaHoffmann

@SuperPaintman SuperPaintman force-pushed the 8597-query-distributed-mode-perf-improvements-1 branch 3 times, most recently from 1bfe2cf to a530364 Compare December 12, 2025 07:29
…axT / LabelSets between .Engines() calls

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
@SuperPaintman SuperPaintman force-pushed the 8597-query-distributed-mode-perf-improvements-1 branch from a530364 to dcc729e Compare December 12, 2025 07:31
@SuperPaintman SuperPaintman changed the title query: cache engines in remoteEndpoints to reuse computed MinT / MaxT / LabelSets values across Engines() calls WIP: query: cache engines in remoteEndpoints to reuse computed MinT / MaxT / LabelSets values across Engines() calls Dec 12, 2025
@MichaHoffmann
Copy link
Contributor

I think this looks reasonable to me; optimizers will have a consistent view of the Engines() then. Currently its used by Passhtrough and Distributed Optimizers.

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
@pull-request-size pull-request-size bot added size/M and removed size/S labels Dec 15, 2025

// endpoints.Engines() should return exactly the same slice.
testutil.Equals(t, len(engines1), len(engines2))
testutil.Equals(t, cap(engines1), cap(engines2))
Copy link
Author

@SuperPaintman SuperPaintman Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can verify that the underlying slice data is exactly the same:

testutil.Equals(t, unsafe.Pointer(unsafe.SliceData(engines1)), unsafe.Pointer(unsafe.SliceData(engines2)))

But I think, that's an overkill.

This would also be affected if the internal representation changes, e.g.:

-return r.engines
+return slices.Clone(r.engines)

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
@SuperPaintman SuperPaintman changed the title WIP: query: cache engines in remoteEndpoints to reuse computed MinT / MaxT / LabelSets values across Engines() calls query: cache engines in remoteEndpoints to reuse computed MinT / MaxT / LabelSets values across Engines() calls Dec 15, 2025
@SuperPaintman SuperPaintman marked this pull request as ready for review December 15, 2025 03:13
clients := r.getClients()
engines := make([]api.RemoteEngine, len(clients))
for i := range clients {
engines[i] = NewRemoteEngine(r.logger, clients[i], r.opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this even simpler and do this once in the constructor instead of wrapping everything into sync.Once? In Thanos case, it just creates a slice with a read lock so it should be very fast, no?

Copy link
Author

@SuperPaintman SuperPaintman Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change can look odd outside of the broader context and other changes.

In Thanos case, it just creates a slice with a read lock so it should be very fast, no?

I'm changing that in another PR. Engines() will do more work than just creating a new slice.

And it also needs more info than high-level API endpoints have (e.g. here).

So now I think it makes more sense to have this cache in the promql-engine: thanos-io/promql-engine#680

return engines
func (r *remoteEndpoints) Engines() []api.RemoteEngine {
r.enginesOnce.Do(func() {
clients := r.getClients()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this introduce a bug with dns endpoints? How would new endpoints ever get added to the slice?

Copy link
Author

@SuperPaintman SuperPaintman Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create new remoteEndpoints for each query, and then Engines() is almost immediately called in query optimizers (e.g. here).

So this cache lives only for and during one query, and it doesn't affect adding / updating endpoints.

@SuperPaintman
Copy link
Author

Moved it into promql-engine: thanos-io/promql-engine#680

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants