HBASE-30238 HBase bulkload replication causes duplicate HFile loading and compaction storm when source RPC times out by mini666 · Pull Request #8380 · apache/hbase

mini666 · 2026-06-19T08:05:51Z

JIRA: https://issues.apache.org/jira/browse/HBASE-30238

This PR adds:

Configurable bandwidth throttling for bulkloaded HFile copy during replication.
In-progress bulkload deduplication for concurrent RPC retries of the same bulkload event.

Testing:

JAVA_HOME=/Users/juke.mini666/.asdf/installs/java/temurin-17.0.6+10 mvn test -pl hbase-server -am -Dtest=TestHFileReplicatorBandwidth,TestReplicationSinkBulkLoadDedup -DfailIfNoTests=false -Dhadoop-three.version=3.4.3 -Dwarbucks.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true -Drat.skip=true

…eplication
Introduce hbase.replication.bulkload.copy.bandwidth.mb to rate-limit HFile copies from the source HDFS in HFileReplicator.Copier. The limit is enforced through a RateLimiter shared across copy threads, and ReplicationSink updates the rate on configuration reload without requiring a RegionServer restart.
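A minimal sketch of a bandwidth limiter shared by all copy threads, in the spirit of Guava's `RateLimiter` reservation model: each caller books the next free slot on a shared timeline and sleeps until its slot starts. The class `SharedBandwidthLimiter` and its methods are hypothetical, not the PR's code. It assumes the configured value is interpreted as MiB per second and that 0 or less disables throttling.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical limiter shared by every copy thread. Permits are bytes.
 * setBandwidthMb() could be called from a configuration-reload hook so a new
 * hbase.replication.bulkload.copy.bandwidth.mb value applies without a restart.
 */
final class SharedBandwidthLimiter {
  private double bytesPerSecond;               // <= 0 means unlimited (assumption)
  private long nextFreeNanos = Long.MIN_VALUE; // earliest start time for the next caller

  SharedBandwidthLimiter(long bandwidthMb) {
    setBandwidthMb(bandwidthMb);
  }

  /** Changes the rate; takes effect for the next reservation. */
  synchronized void setBandwidthMb(long bandwidthMb) {
    bytesPerSecond = bandwidthMb <= 0 ? 0 : bandwidthMb * 1024.0 * 1024.0;
  }

  /** Reserves time for {@code bytes} and returns how long the caller must wait first. */
  synchronized long reserveNanos(long bytes, long nowNanos) {
    if (bytesPerSecond <= 0) {
      return 0; // throttling disabled
    }
    long start = Math.max(nowNanos, nextFreeNanos);
    long costNanos = (long) (bytes / bytesPerSecond * TimeUnit.SECONDS.toNanos(1));
    nextFreeNanos = start + costNanos; // the next caller queues behind this transfer
    return start - nowNanos;
  }

  /** Blocks the calling copy thread until it may send {@code bytes}. */
  void acquire(long bytes) throws InterruptedException {
    long waitNanos = reserveNanos(bytes, System.nanoTime());
    if (waitNanos > 0) {
      TimeUnit.NANOSECONDS.sleep(waitNanos);
    }
  }
}
```

A copy loop would call `limiter.acquire(n)` before writing each buffer of `n` bytes. Because one instance is shared, the limit caps the aggregate rate of all copy threads rather than each thread individually.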

Track in-progress bulkload events by replication cluster ID, encoded region name, and bulkload sequence number so that concurrent RPC retries skip duplicate execution. The key is removed after the current attempt completes or fails, which preserves at-least-once retry semantics while avoiding concurrent duplicate loads.
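A minimal sketch of the per-RegionServer in-progress deduplication described above. The class `InProgressBulkLoads` and its method signatures are hypothetical; the PR's `buildBulkLoadKey` takes a `BulkLoadDescriptor`, whereas this sketch takes plain strings and a sequence number so it stays self-contained. Following the review nit, the key builder is an instance method.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker that skips a bulkload event while an identical one is running. */
final class InProgressBulkLoads {
  private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

  /** Key = replication cluster ID + encoded region name + bulkload sequence number. */
  String buildBulkLoadKey(String replicationClusterId, String encodedRegionName,
      long bulkloadSeqNum) {
    return replicationClusterId + "/" + encodedRegionName + "/" + bulkloadSeqNum;
  }

  /** Runs {@code load} unless the same event is already in flight; returns true if it ran. */
  boolean runOnce(String key, Runnable load) {
    if (!inProgress.add(key)) {
      return false; // a concurrent retry is already loading these HFiles
    }
    try {
      load.run();
      return true;
    } finally {
      // Removed on success or failure, so a later retry can still run (at-least-once).
      inProgress.remove(key);
    }
  }

  boolean isInProgress(String key) {
    return inProgress.contains(key);
  }
}
```

The atomic `Set.add` is what makes the check-and-claim race-free: only one of two concurrent retries can insert the key.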

wchevreuil · 2026-06-19T14:48:13Z

    bulkLoadHFileMap.put(tableName, newFamilyHFilePathsList);
  }

+  private static String buildBulkLoadKey(String replicationClusterId, BulkLoadDescriptor bld) {


nit: no need to be static.

wchevreuil

Some questions:

Do we know if the source always retries the same sink in the event of an RPC timeout? Otherwise, we may still have the same bulkload entry submitted more than once.
Isn't the throttling going to increase the likelihood of such RPC timeouts? If so, can we mention that in code comments?
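A back-of-the-envelope check of the throttling concern: with a bandwidth cap, copy time alone is roughly total HFile size divided by the configured rate, and if that approaches the source's RPC timeout, the source retries. The class name and all numbers below (batch size, rates, a 60 s timeout) are hypothetical illustrations.

```java
/** Hypothetical estimate: can a throttled HFile copy finish within one RPC timeout? */
final class ThrottleTimeoutEstimate {
  /** Seconds needed to copy {@code totalMb} at {@code bandwidthMbPerSec}. */
  static double copySeconds(double totalMb, double bandwidthMbPerSec) {
    return totalMb / bandwidthMbPerSec;
  }

  /** True if the copy alone would take at least {@code rpcTimeoutMs}. */
  static boolean likelyTimesOut(double totalMb, double bandwidthMbPerSec, long rpcTimeoutMs) {
    return copySeconds(totalMb, bandwidthMbPerSec) * 1000.0 >= rpcTimeoutMs;
  }
}
```

For example, 2048 MB at 50 MB/s needs about 41 s, under a 60 s timeout, while the same batch at 20 MB/s needs about 102 s and would exceed it. The estimate ignores the time spent actually loading the HFiles, so it understates the risk.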


mini666 · 2026-06-21T07:48:22Z

Thanks for the review.

I added a follow-up commit to handle the cross-target-RS retry case. The sink now coordinates replicated bulkload WAL events through ZooKeeper using in-progress and completed markers, so a retry sent to a different target region server can observe that the same bulkload event has already completed and skip reloading the HFiles.
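A minimal sketch of cross-RegionServer coordination with in-progress and completed markers. An in-memory map stands in for ZooKeeper, with `putIfAbsent` playing the role of a znode `create()` that fails when the node already exists. `FakeZk`, `BulkLoadEventTracker`, the marker paths, and the `Outcome` values are all hypothetical and not the PR's classes; what the sink does on `IN_PROGRESS_ELSEWHERE` (for example, failing the batch so the source retries later) is also an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for ZooKeeper: create() fails if the path already exists. */
final class FakeZk {
  final Map<String, Long> nodes = new ConcurrentHashMap<>();

  boolean create(String path, long ctimeMs) { return nodes.putIfAbsent(path, ctimeMs) == null; }
  boolean exists(String path) { return nodes.containsKey(path); }
  void delete(String path) { nodes.remove(path); }
}

/** Hypothetical tracker; every target RS shares the same (fake) ZooKeeper. */
final class BulkLoadEventTracker {
  static final String IN_PROGRESS = "/bulkload/in-progress/"; // hypothetical paths
  static final String COMPLETED = "/bulkload/completed/";

  enum Outcome { LOADED, ALREADY_COMPLETED, IN_PROGRESS_ELSEWHERE }

  private final FakeZk zk;

  BulkLoadEventTracker(FakeZk zk) { this.zk = zk; }

  Outcome apply(String eventKey, long nowMs, Runnable loadHFiles) {
    if (zk.exists(COMPLETED + eventKey)) {
      return Outcome.ALREADY_COMPLETED; // a retry on another RS finds the finished event
    }
    if (!zk.create(IN_PROGRESS + eventKey, nowMs)) {
      return Outcome.IN_PROGRESS_ELSEWHERE; // another RS holds the in-progress marker
    }
    try {
      // Re-check: another RS may have completed between the first check and create().
      if (zk.exists(COMPLETED + eventKey)) {
        return Outcome.ALREADY_COMPLETED;
      }
      loadHFiles.run();
      // Written before the in-progress marker is dropped. A crash between the load and
      // this line means a retry reloads the files: at-least-once, not exactly-once.
      zk.create(COMPLETED + eventKey, nowMs);
      return Outcome.LOADED;
    } finally {
      zk.delete(IN_PROGRESS + eventKey); // released on success or failure
    }
  }
}
```

With real ZooKeeper, making the in-progress marker an ephemeral znode would make it disappear if the holding RegionServer's session ends, so a crashed sink would not block retries forever; whether the PR does this is not stated here.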

I also added a master-side chore that cleans up expired completed markers but keeps any marker whose matching in-progress marker still exists.
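A minimal sketch of one pass of such a cleanup chore, using plain collections in place of ZooKeeper listings. `CompletedMarkerCleaner`, the TTL parameter, and the map and set shapes are hypothetical. The retention window is assumed to need to exceed the longest time a source might keep retrying the same event; otherwise a late retry could reload the HFiles.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical single pass of a master-side cleanup chore over completed markers. */
final class CompletedMarkerCleaner {
  /**
   * @param completed  event key -> creation time (ms) of its completed marker; modified in place
   * @param inProgress event keys that currently have an in-progress marker
   * @return number of completed markers deleted
   */
  static int cleanOnce(Map<String, Long> completed, Set<String> inProgress,
      long ttlMs, long nowMs) {
    int deleted = 0;
    Iterator<Map.Entry<String, Long>> it = completed.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      boolean expired = nowMs - e.getValue() >= ttlMs;
      // Keep the marker while a matching in-progress marker exists: a retry is still running.
      if (expired && !inProgress.contains(e.getKey())) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }
}
```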

For coverage, I added unit tests for the ZooKeeper event tracker and cleanup chore, a sink-level completed-marker skip test, and a MiniCluster test that replays the exact same bulkload WAL batch first through one target RS sink and then through another target RS sink.

I verified the change with the focused bulkload replication tests, including the new cross-target-RS retry MiniCluster test. The run passed with 13 tests, 0 failures, 0 errors, and 0 skipped.

juke-mini666 added 2 commits June 19, 2026 17:00

wchevreuil reviewed Jun 19, 2026


juke-mini666 added 2 commits June 21, 2026 09:23

HBASE-30238 Address bulkload replication review comments

9e7665d

HBASE-30238 Add distributed bulkload replication event tracking

1b72829

Coordinate replicated bulkload WAL events across target region servers using ZooKeeper in-progress and completed markers. Add master-side cleanup for completed markers and cover cross-RS retry replay with MiniCluster.

mini666 wants to merge 4 commits into apache:master from mini666:HBASE-30238

3 participants