[opt](group-commit) Skip createLocation in group commit stream load sink#63561
[opt](group-commit) Skip createLocation in group commit stream load sink#63561liaoxin01 wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Skips the expensive createLocation tablet/replica enumeration for the group-commit stream load sink, which under cloud mode contends on CloudSystemInfoService's RW lock and dominates FE CPU on wide-partition tables. BE's GroupCommitBlockSinkOperatorX::init does not consume location/slave_location, so empty placeholders satisfy the required thrift field at O(1) cost.
Changes:
- Add a
protected initLocationParams(TOlapTableSink)hook onOlapTableSink(default delegates tocreateLocation) and route bothinitoverloads through it. - Override
initLocationParamsinGroupCommitBlockSinkto return two empty placeholderTOlapTableLocationParamobjects. - Add
GroupCommitBlockSinkTestcovering the override andparseGroupCommitparsing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java | Introduces initLocationParams hook and routes both init overloads through it. |
| fe/fe-core/src/main/java/org/apache/doris/planner/GroupCommitBlockSink.java | Overrides the hook to return empty placeholder location params, skipping replica enumeration. |
| fe/fe-core/src/test/java/org/apache/doris/planner/GroupCommitBlockSinkTest.java | New unit tests for the override behavior and parseGroupCommit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal/test coverage: The change avoids FE tablet replica location enumeration for GroupCommitBlockSink while preserving default OlapTableSink behavior. The added unit test covers the override contract and group_commit parsing. I could not run it locally because thirdparty/installed/bin/protoc is absent in this runner, and FE build instructions require stopping before Maven when protoc is missing.
- Scope/focus: The implementation is small and focused: a protected hook plus a GroupCommitBlockSink override.
- Concurrency/locking: The changed FE code does not introduce new shared mutable state or locks. It removes the high-contention createLocation path from the per-request group commit block sink initialization.
- Lifecycle/static initialization: No new lifecycle-sensitive objects or static initialization dependencies were added.
- Configuration/compatibility: No config items or storage/protocol-incompatible fields were added. Required thrift location fields are still populated with non-null placeholders.
- Parallel code paths: Both relevant OlapTableSink init overloads now route through the hook; non-group-commit and RemoteOlapTableSink paths retain createLocation via the default implementation.
- Error handling/data correctness: BE GroupCommitBlockSinkOperatorX::init only consumes schema, partition, ids, group_commit_mode, load_id, and max_filter_ratio; the real internal group-commit insert plan still obtains the normal OLAP sink planning path. I did not find a data visibility or transaction correctness regression from the skipped placeholder locations.
- Observability/performance: The change addresses the intended hot path without adding noisy logs. Existing group commit BE logging remains available.
User focus: No additional user-provided review focus was supplied.
The BE-side GroupCommitBlockSinkOperatorX::init does not consume TOlapTableSink.location or slave_location (it only reads tuple_id, schema, db_id, table_id, partition, group_commit_mode, load_id and max_filter_ratio). However, FE still ran createLocation, which iterates O(partitions * indexes * tablets * replicas) and, for every replica, takes the CloudSystemInfoService RW read lock via CloudReplica.getCurrentClusterId. Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in one production incident), CAS contention on the RW lock's state cache line saturated all FE CPUs and the cluster could not recover even after scaling. Introduce an initLocationParams hook on OlapTableSink so subclasses can override how location params are populated. Both init(...) overloads now route through this hook. GroupCommitBlockSink overrides it to return empty placeholder params (location is a required thrift field, but its contents are unused on BE for the group commit path). Add GroupCommitBlockSinkTest to lock in the contract.
4372498 to
cd29627
Compare
|
run buildall |
Summary
The BE-side
GroupCommitBlockSinkOperatorX::initdoes not consumeTOlapTableSink.locationorslave_location(it only readstuple_id/schema/db_id/table_id/partition/group_commit_mode/load_id/max_filter_ratio). However, FE still runscreateLocation, which iteratesO(partitions * indexes * tablets * replicas)and, for every replica, takes theCloudSystemInfoServiceRW read lock viaCloudReplica.getCurrentClusterId.Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in a real production incident), CAS contention on the RW lock's
statecache line saturated all FE CPUs, and the cluster could not recover even after scaling out (more cores = more CAS contenders = worse contention).Change
protected initLocationParams(TOlapTableSink)hook onOlapTableSink. Default behavior delegates tocreateLocation, so non-group-commit sinks are unaffected.init(...)overloads inOlapTableSinkthrough the hook.GroupCommitBlockSinkoverrides the hook to return empty placeholderTOlapTableLocationParamobjects.TOlapTableSink.locationis a required thrift field, so we still set non-null placeholders, but no tablet/replica enumeration happens.Effect on the group-commit path:
O(partitions * indexes * tablets * replicas)→O(1)CloudSystemInfoServiceRW lock acquisitions: hundreds of concurrent CAS spinners → 0Test plan
GroupCommitBlockSinkTestcovering:initLocationParamsreturns 2 placeholders with empty tablet lists (verifies the override is what runs, notcreateLocation).parseGroupCommitparsesasync_mode/sync_mode/off_mode(case-insensitive) and returns null for unknown values.group_commit=truestill pass.CloudSystemInfoServicelock contention.