Skip to content

[opt](group-commit) Skip createLocation in group commit stream load sink#63561

Open
liaoxin01 wants to merge 1 commit into
apache:masterfrom
liaoxin01:opt-groupcommit-skip-sink-location-master
Open

[opt](group-commit) Skip createLocation in group commit stream load sink#63561
liaoxin01 wants to merge 1 commit into
apache:masterfrom
liaoxin01:opt-groupcommit-skip-sink-location-master

Conversation

@liaoxin01
Copy link
Copy Markdown
Contributor

Summary

The BE-side GroupCommitBlockSinkOperatorX::init does not consume TOlapTableSink.location or slave_location (it only reads tuple_id / schema / db_id / table_id / partition / group_commit_mode / load_id / max_filter_ratio). However, FE still runs createLocation, which iterates O(partitions * indexes * tablets * replicas) and, for every replica, takes the CloudSystemInfoService RW read lock via CloudReplica.getCurrentClusterId.

Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in a real production incident), CAS contention on the RW lock's state cache line saturated all FE CPUs, and the cluster could not recover even after scaling out (more cores = more CAS contenders = worse contention).

Change

  • Introduce a protected initLocationParams(TOlapTableSink) hook on OlapTableSink. Default behavior delegates to createLocation, so non-group-commit sinks are unaffected.
  • Route both init(...) overloads in OlapTableSink through the hook.
  • GroupCommitBlockSink overrides the hook to return empty placeholder TOlapTableLocationParam objects. TOlapTableSink.location is a required thrift field, so we still set non-null placeholders, but no tablet/replica enumeration happens.

Effect on the group-commit path:

  • Per-request FE CPU: O(partitions * indexes * tablets * replicas)O(1)
  • CloudSystemInfoService RW lock acquisitions: hundreds of concurrent CAS spinners → 0

Test plan

  • Added GroupCommitBlockSinkTest covering:
    • initLocationParams returns 2 placeholders with empty tablet lists (verifies the override is what runs, not createLocation).
    • parseGroupCommit parses async_mode / sync_mode / off_mode (case-insensitive) and returns null for unknown values.
  • Existing regression tests for stream load with group_commit=true still pass.
  • Manual high-concurrency stream load run on a wide-partition table to confirm FE CPU is no longer dominated by CloudSystemInfoService lock contention.

Copilot AI review requested due to automatic review settings May 23, 2026 14:43
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Copy link
Copy Markdown
Contributor Author

/review

@liaoxin01
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Skips the expensive createLocation tablet/replica enumeration for the group-commit stream load sink, which under cloud mode contends on CloudSystemInfoService's RW lock and dominates FE CPU on wide-partition tables. BE's GroupCommitBlockSinkOperatorX::init does not consume location/slave_location, so empty placeholders satisfy the required thrift field at O(1) cost.

Changes:

  • Add a protected initLocationParams(TOlapTableSink) hook on OlapTableSink (default delegates to createLocation) and route both init overloads through it.
  • Override initLocationParams in GroupCommitBlockSink to return two empty placeholder TOlapTableLocationParam objects.
  • Add GroupCommitBlockSinkTest covering the override and parseGroupCommit parsing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java Introduces initLocationParams hook and routes both init overloads through it.
fe/fe-core/src/main/java/org/apache/doris/planner/GroupCommitBlockSink.java Overrides the hook to return empty placeholder location params, skipping replica enumeration.
fe/fe-core/src/test/java/org/apache/doris/planner/GroupCommitBlockSinkTest.java New unit tests for the override behavior and parseGroupCommit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test coverage: The change avoids FE tablet replica location enumeration for GroupCommitBlockSink while preserving default OlapTableSink behavior. The added unit test covers the override contract and group_commit parsing. I could not run it locally because thirdparty/installed/bin/protoc is absent in this runner, and FE build instructions require stopping before Maven when protoc is missing.
  • Scope/focus: The implementation is small and focused: a protected hook plus a GroupCommitBlockSink override.
  • Concurrency/locking: The changed FE code does not introduce new shared mutable state or locks. It removes the high-contention createLocation path from the per-request group commit block sink initialization.
  • Lifecycle/static initialization: No new lifecycle-sensitive objects or static initialization dependencies were added.
  • Configuration/compatibility: No config items or storage/protocol-incompatible fields were added. Required thrift location fields are still populated with non-null placeholders.
  • Parallel code paths: Both relevant OlapTableSink init overloads now route through the hook; non-group-commit and RemoteOlapTableSink paths retain createLocation via the default implementation.
  • Error handling/data correctness: BE GroupCommitBlockSinkOperatorX::init only consumes schema, partition, ids, group_commit_mode, load_id, and max_filter_ratio; the real internal group-commit insert plan still obtains the normal OLAP sink planning path. I did not find a data visibility or transaction correctness regression from the skipped placeholder locations.
  • Observability/performance: The change addresses the intended hot path without adding noisy logs. Existing group commit BE logging remains available.

User focus: No additional user-provided review focus was supplied.

The BE-side GroupCommitBlockSinkOperatorX::init does not consume
TOlapTableSink.location or slave_location (it only reads tuple_id,
schema, db_id, table_id, partition, group_commit_mode, load_id and
max_filter_ratio). However, FE still ran createLocation, which iterates
O(partitions * indexes * tablets * replicas) and, for every replica,
takes the CloudSystemInfoService RW read lock via
CloudReplica.getCurrentClusterId. Under high-concurrency group commit
stream load on wide-partition tables (3000+ partitions in one
production incident), CAS contention on the RW lock's state cache line
saturated all FE CPUs and the cluster could not recover even after
scaling.

Introduce an initLocationParams hook on OlapTableSink so subclasses
can override how location params are populated. Both init(...)
overloads now route through this hook. GroupCommitBlockSink overrides
it to return empty placeholder params (location is a required thrift
field, but its contents are unused on BE for the group commit path).

Add GroupCommitBlockSinkTest to lock in the contract.
@liaoxin01 liaoxin01 force-pushed the opt-groupcommit-skip-sink-location-master branch from 4372498 to cd29627 Compare May 23, 2026 14:59
@liaoxin01
Copy link
Copy Markdown
Contributor Author

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants