HBASE-30238 HBase bulkload replication causes duplicate HFile loading and compaction storm when source RPC times out #8380
mini666 wants to merge 4 commits into
…eplication Introduce hbase.replication.bulkload.copy.bandwidth.mb to rate-limit HFile copy from source HDFS in HFileReplicator.Copier. The limit is enforced through a shared RateLimiter across copy threads, and ReplicationSink updates the rate through configuration reload without requiring RegionServer restart.
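A minimal sketch of how one throttle can be shared across copy threads and retuned at runtime, using only the JDK. `SharedCopyThrottle` and its methods are hypothetical names for illustration, not the PR's `RateLimiter` or the internals of `HFileReplicator.Copier`; treating a value of 0 or less as "unlimited" is also an assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical shared throttle for illustration; not the class used in the PR. */
final class SharedCopyThrottle {
  private long bytesPerSecond;   // assumption: 0 or less means unlimited
  private long nextFreeNanos;    // time at which the next reservation may start
  private boolean started;       // false until the first reservation is made

  SharedCopyThrottle(long bandwidthMb) {
    setBandwidthMb(bandwidthMb);
  }

  /** Called on configuration reload; applies to reservations made afterwards. */
  synchronized void setBandwidthMb(long bandwidthMb) {
    this.bytesPerSecond = bandwidthMb * 1024L * 1024L;
  }

  /**
   * Reserves the given number of bytes against the shared budget and returns
   * how long the caller should wait before copying them, in nanoseconds.
   */
  synchronized long reserve(long bytes, long nowNanos) {
    if (bytesPerSecond <= 0) {
      return 0L;
    }
    // Queue behind earlier reservations from any thread, or start immediately.
    long start = (started && nextFreeNanos - nowNanos > 0) ? nextFreeNanos : nowNanos;
    started = true;
    long costNanos = (long) (bytes * 1e9 / bytesPerSecond);
    nextFreeNanos = start + costNanos;
    return start - nowNanos;
  }

  /** Blocking variant that a copy thread would call before writing each chunk. */
  void acquire(long bytes) throws InterruptedException {
    long waitNanos = reserve(bytes, System.nanoTime());
    if (waitNanos > 0) {
      TimeUnit.NANOSECONDS.sleep(waitNanos);
    }
  }
}
```

Because every copy thread reserves against the same instance, the configured limit bounds the aggregate copy rate rather than the rate of each thread. A reload handler would only need to call `setBandwidthMb` with the new value of `hbase.replication.bulkload.copy.bandwidth.mb`.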
Track in-progress bulkload events by replication cluster, encoded region, and bulkload sequence number so concurrent RPC retries skip duplicate execution. The key is removed after the current attempt completes or fails, preserving at-least-once retry semantics while avoiding concurrent duplicate loads.
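The in-progress tracking described in this commit can be sketched roughly as follows. `InProgressBulkLoads` is a hypothetical class, and the key format is an assumption; the PR's `buildBulkLoadKey` derives its key from a `BulkLoadDescriptor`, which is replaced here by plain values to keep the example self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker for illustration; not the code from the PR. */
final class InProgressBulkLoads {
  private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

  /** Assumed key layout: replication cluster, encoded region, bulkload sequence number. */
  static String key(String replicationClusterId, String encodedRegion, long bulkloadSeqNum) {
    return replicationClusterId + "/" + encodedRegion + "/" + bulkloadSeqNum;
  }

  /**
   * Runs the load unless the same key is already being loaded.
   * Returns false when the call was skipped as a concurrent duplicate.
   */
  boolean runIfNotInProgress(String key, Runnable load) {
    if (!inProgress.add(key)) {
      return false; // another attempt for this event is still running
    }
    try {
      load.run();
      return true;
    } finally {
      // Removed on success and on failure, so a later retry can run again.
      inProgress.remove(key);
    }
  }

  boolean isInProgress(String key) {
    return inProgress.contains(key);
  }
}
```

Removing the key in `finally` is what keeps the at-least-once behaviour: only overlapping attempts are suppressed, while a retry that arrives after a failed attempt is executed normally.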
    bulkLoadHFileMap.put(tableName, newFamilyHFilePathsList);
  }

  private static String buildBulkLoadKey(String replicationClusterId, BulkLoadDescriptor bld) {
nit: no need to be static.
wchevreuil left a comment
Some questions:
- Do we know whether the source always retries the same sink after an RPC timeout? If not, the same bulkload entry may still be submitted more than once.
- Isn't the throttling going to increase the likelihood of such RPC timeouts? If so, can we mention that in the code comments?
Coordinate replicated bulkload WAL events across target region servers using ZooKeeper in-progress and completed markers. Add master-side cleanup for completed markers and cover cross-RS retry replay with MiniCluster.
Thanks for the review. I added a follow-up commit to handle the cross-target-RS retry case. The sink now coordinates replicated bulkload WAL events through ZooKeeper using in-progress and completed markers, so a retry sent to a different target region server can observe that the same bulkload event has already completed and skip reloading the HFiles. I also added a master-side chore to clean expired completed markers, while keeping them if a matching in-progress marker still exists.

For coverage, I added unit tests for the ZooKeeper event tracker and cleanup chore, a sink-level completed-marker skip test, and a MiniCluster test that replays the exact same bulkload WAL batch first through one target RS sink and then through another target RS sink.

I verified the change with the focused bulkload replication tests, including the new cross-target-RS retry MiniCluster test. The run passed with 13 tests, 0 failures, 0 errors, and 0 skipped.
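The marker protocol described above can be modelled in memory as a rough sketch. Here two `ConcurrentHashMap`s stand in for the ZooKeeper in-progress and completed markers; `BulkLoadMarkerStore`, its method names, the `Outcome` values, and the TTL parameter are all hypothetical. How the real sink reacts when another region server holds the in-progress marker is not stated in this discussion, so the sketch only reports that case to the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the ZooKeeper markers; hypothetical, for illustration only. */
final class BulkLoadMarkerStore {
  enum Outcome { LOADED, SKIPPED_COMPLETED, IN_PROGRESS_ELSEWHERE }

  private final Map<String, String> inProgress = new ConcurrentHashMap<>(); // key -> owning server
  private final Map<String, Long> completed = new ConcurrentHashMap<>();    // key -> completion time (ms)

  /** Replays one bulkload event on the given target server. */
  Outcome replay(String key, String serverName, long nowMs, Runnable load) {
    if (completed.containsKey(key)) {
      return Outcome.SKIPPED_COMPLETED; // already loaded, possibly by another server
    }
    if (inProgress.putIfAbsent(key, serverName) != null) {
      return Outcome.IN_PROGRESS_ELSEWHERE; // another attempt owns the marker
    }
    try {
      // Re-check after taking the marker in case the event completed in between.
      if (completed.containsKey(key)) {
        return Outcome.SKIPPED_COMPLETED;
      }
      load.run();
      completed.put(key, nowMs);
      return Outcome.LOADED;
    } finally {
      inProgress.remove(key); // released on success and on failure
    }
  }

  /** Cleanup pass: drops expired completed markers unless an in-progress marker matches. */
  int cleanExpired(long nowMs, long ttlMs) {
    int removed = 0;
    for (Map.Entry<String, Long> e : completed.entrySet()) {
      boolean expired = nowMs - e.getValue() >= ttlMs;
      if (expired && !inProgress.containsKey(e.getKey())
          && completed.remove(e.getKey(), e.getValue())) {
        removed++;
      }
    }
    return removed;
  }
}
```

In this model a retry routed to a second server returns `SKIPPED_COMPLETED` as long as the completed marker has not yet been cleaned up, which is the cross-target-RS case the MiniCluster test is described as covering.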
JIRA: https://issues.apache.org/jira/browse/HBASE-30238
This PR adds:
Testing:
JAVA_HOME=/Users/juke.mini666/.asdf/installs/java/temurin-17.0.6+10 mvn test -pl hbase-server -am -Dtest=TestHFileReplicatorBandwidth,TestReplicationSinkBulkLoadDedup -DfailIfNoTests=false -Dhadoop-three.version=3.4.3 -Dwarbucks.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true -Drat.skip=true