
HDDS-14794. Design doc for Incremental container replication#9913

Open
echonesis wants to merge 2 commits into apache:master from echonesis:HDDS-14794

Conversation

@echonesis
Contributor

What changes were proposed in this pull request?

Incremental container replication design doc

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14794

How was this patch tested?

NA

@adoroszlai changed the title from "HDDS-14794. Support incremental container replication" to "HDDS-14794. Design doc for Incremental container replication" on Mar 13, 2026
@sodonnel
Contributor

I didn't read this in detail, just quickly skimmed it, but I believe the Container Reconciler largely solves this problem, or if it doesn't solve it completely, it could easily be extended to. Have you seen the design and implementation of the reconciler?

@ivandika3
Contributor

ivandika3 commented Mar 14, 2026

@echonesis Thanks for raising this patch, and thanks @sodonnel for pointing to the container reconciler patch. This is just an idea to allow QUASI_CLOSED container replicas with a lower sequence ID to catch up with the replica that has a higher sequence ID.
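To make the catch-up idea concrete, here is a minimal sketch, assuming each block records the sequence ID at which it was committed. The names `Block` and `blocks_to_send` are hypothetical and are not Ozone APIs; the sketch only illustrates how a lagging replica's missing blocks could be selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical block record: local ID plus the sequence ID at commit time."""
    local_id: int
    commit_seq_id: int


def blocks_to_send(source_blocks, target_seq_id):
    """Return the source blocks committed after the target's sequence ID.

    Assumption: a replica at sequence ID N holds every block committed at or
    before N, so only blocks with a higher commit sequence ID are missing.
    """
    missing = [b for b in source_blocks if b.commit_seq_id > target_seq_id]
    # Ship in commit order so the target's sequence ID only moves forward.
    return sorted(missing, key=lambda b: b.commit_seq_id)


source = [Block(1, 10), Block(2, 20), Block(3, 30)]
delta = blocks_to_send(source, target_seq_id=10)
print([b.local_id for b in delta])
```

Under that assumption, a replica that is already at the highest sequence ID gets an empty delta, so the catch-up step becomes a no-op.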

I am raising this in the context of a multi-DC stretch cluster (as opposed to cross-region DC) setup, where a pipeline has a main group of 3 replicas in the main DC plus one replica in the target DC. That extra replica uses incremental block replication and keeps listening to the main pipeline regardless of whether the Ratis group has been closed.

@echonesis Let me think about this first.

@sodonnel
Contributor

The idea of the reconciler is to take unhealthy replicas (e.g. those with block corruptions) and fix the unhealthy state without replication from RM. In the initial version I believe it does this as part of the container scanner, but the idea was to extend it to be triggered by RM when RM detects something like a quasi_closed or an unhealthy replica, so it can try to repair them rather than do a full replication. This sounds a lot like what you want to do, so it would be really great if you could look into building on the work already started on the reconciler.
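The repair-before-replicate flow described above could be sketched roughly as follows. This is only an illustration: `ReplicaState`, `Action`, and `choose_action` are hypothetical names, and the fallback to full replication after a failed repair is an assumption, not something taken from the reconciler's actual implementation.

```python
from enum import Enum


class ReplicaState(Enum):
    CLOSED = "CLOSED"
    QUASI_CLOSED = "QUASI_CLOSED"
    UNHEALTHY = "UNHEALTHY"


class Action(Enum):
    NONE = "NONE"
    RECONCILE = "RECONCILE"
    FULL_REPLICATION = "FULL_REPLICATION"


def choose_action(state, reconcile_failed=False):
    """Pick what to do for one replica (hypothetical policy).

    Healthy CLOSED replicas need nothing. QUASI_CLOSED or UNHEALTHY replicas
    are repaired in place first; full replication is only an assumed fallback
    once an in-place repair has already failed.
    """
    if state is ReplicaState.CLOSED:
        return Action.NONE
    if reconcile_failed:
        return Action.FULL_REPLICATION
    return Action.RECONCILE


for s in ReplicaState:
    print(s.name, "->", choose_action(s).name)
```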


4 participants