Skip to content

feat: post-pass shard-group reconciliation for asymmetric partial-load matchers#19557

Closed
clintropolis wants to merge 2 commits into
apache:masterfrom
clintropolis:partial-load-ensure-core-partitions-loaded
Closed

feat: post-pass shard-group reconciliation for asymmetric partial-load matchers#19557
clintropolis wants to merge 2 commits into
apache:masterfrom
clintropolis:partial-load-ensure-core-partitions-loaded

Conversation

@clintropolis

Copy link
Copy Markdown
Member

Description

This PR adds a post-processing step to RunRules so that partial-load matchers that resolve differently across segments of a shard group (e.g. ClusterGroupPartialLoadMatcher over range-partitioned segments) don't leave the broker with an incomplete PartitionHolder. Rule.run now returns a new RuleRunResult object that RunRules can use to 'do stuff', though most implementations today do RuleRunResult.OK. For these asymmetric partial matchers, there is a ShardGroupFollowup implementation of RuleRunResult that RunRules can use to check the siblings of a shard group to perform an 'empty' partial load to ensure that the group is fully available. I am unsure if there is enough of a pattern to make RunRules more generically handle RuleRunResult, i was planning to wait and see if any other use cases pop up before trying to make this handling more generic.

Doing an empty load like this seemed cheaper than trying to figure out how to allow the timeline to sometimes allow incomplete groups to appear as complete, since the empty partial load should be cheap for the target historicals.

changes:

  • adds PartialLoadMatcher.emptyMatch (default null) as an opt-in way to do a zero-content "load" for matchers that can resolve asymmetrically.
  • Rule.run returns a RuleRunResult (OK constant for most rules). PartialLoadRule returns a ShardGroupFollowup on a positive match, but only when numCorePartitions > 0.
  • RunRules streams followups in a buffer keyed by (dataSource, interval, version), flushing at iteration boundaries (NEWEST_SEGMENT_FIRST groups segments contiguously by that triple). Flush dispatches emptyMatch loads to unmatched core siblings; siblings not part of the core partition group are skipped.
  • adds TimelineLookup.findChunks which returns a defensive copy via PartitionHolder.copyWithOnlyVisibleChunks so iteration is safe outside the timeline lock.
  • PartialClusterGroupLoadSpec allows an empty clusterGroupIndices list as the wire form of an empty match.

…d matchers

changes:
* adds `PartialLoadMatcher.emptyMatch` (default null) as an opt-in way to do a zero-content "load" for matchers that can resolve asymmetrically.
* `Rule.run` returns a `RuleRunResult` (`OK` constant for most rules). `PartialLoadRule` returns a `ShardGroupFollowup` on a positive match, but only when numCorePartitions > 0.
* `RunRules` streams followups in a buffer keyed by (dataSource, interval, version), flushing at iteration boundaries (NEWEST_SEGMENT_FIRST groups segments contiguously by that triple). Flush dispatches emptyMatch loads to unmatched core siblings; siblings not part of the core partition group are skipped.
* adds `TimelineLookup.findChunks` which returns a defensive copy via `PartitionHolder.copyWithOnlyVisibleChunks` so iteration is safe outside the timeline lock.
* `PartialClusterGroupLoadSpec` allows an empty `clusterGroupIndices` list as the wire form of an empty match.
* </ul>
*/
@Nullable
default MatchResult emptyMatch(DataSegment segment, Map<String, Object> baseLoadSpec)
*/
public interface RuleRunResult
{
RuleRunResult OK = new RuleRunResult()
* where different partitions of a shard group resolve to different load specs and the broker would otherwise drop
* the group as incomplete via {@code PartitionHolder.isComplete()}.
*/
public record ShardGroupFollowup(

@capistrant capistrant left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty cool idea! Biggest concern is behavior when onCannotMatch = FULL_LOAD won't we (potentially) create follow ups for segments that actually loaded?

// Only the core partition group has an atomic-replace completeness requirement on the broker, segments with
// no core partitions (e.g. append-only streaming) don't need sibling reconciliation.
if (segment.getShardSpec().getNumCorePartitions() > 0) {
return new ShardGroupFollowup(segment, matcher, getTieredReplicants());

@capistrant capistrant Jun 5, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about how this behaves when usingonCannotMatch == FULL_LOAD like won't this cause empty loads of other segments that fell through to full load?

@capistrant capistrant Jun 5, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this create core partition set follow ups even when segments who are outside of the core are the only ones who matched? do we want that?

@clintropolis

Copy link
Copy Markdown
Member Author

^ closed in favor of #19565

@clintropolis clintropolis deleted the partial-load-ensure-core-partitions-loaded branch June 9, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants