[feat][pip] PIP-454: Metadata Store Migration Framework #25196
Conversation
> **Broker/Bookie actions (automatic, triggered by watching the flag):**
> 1. Detect migration flag via watch on `/pulsar/migration-coordinator/migration`
> 2. Defer non-critical metadata writes (e.g., ledger rollovers, bundle ownership changes)
Would this use the existing solution? A `SessionEvent.ConnectionLost`/`SessionEvent.SessionLost` event sets a flag `metadataServiceAvailable` that is used for this purpose in many locations.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java
Lines 148 to 152 in d630394
```java
/**
 * Keep a flag to indicate whether we're currently connected to the metadata service.
 */
@Getter
private boolean metadataServiceAvailable;
```
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java
Lines 288 to 291 in 1617bb2
```java
private synchronized void handleMetadataStoreNotification(SessionEvent e) {
    log.info("Received MetadataStore session event: {}", e);
    metadataServiceAvailable = e.isConnected();
}
```
It seems that currently ledger trimming, ledger rollover, and load balancer load shedding use the `metadataServiceAvailable` flag in `ManagedLedgerFactoryImpl`.
There's also a direct dependency on the event:
Lines 1137 to 1150 in 38807b1
```java
// Do not attempt to write if not connected
if (lastMetadataSessionEvent != null
        && lastMetadataSessionEvent.isConnected()
        && (needBrokerDataUpdate() || force)) {
    localData.setLastUpdate(System.currentTimeMillis());
    brokerDataLock.updateValue(localData).join();
    // Clear deltas.
    localData.cleanDeltas();
    // Update previous data.
    lastData.update(localData);
}
```
Would the coordinator send a `SessionEvent.ConnectionLost` event when migration starts, so that it remains compatible with the existing solution?
`AbstractMetadataStore` also has an `isConnected` flag which could be useful; it's not currently used within Pulsar except for metadata store caching decisions. I guess it would be necessary to skip cache refreshes while the migration is ongoing.
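For illustration, here is a minimal sketch of the idea raised above: funneling the migration flag into the same `SessionEvent` code paths. The `MigrationFlagWatcher` class is hypothetical; only the `/pulsar/migration-coordinator/migration` path comes from the PIP text, while the listener and `SessionEvent` types are the existing Pulsar metadata APIs.

```java
import java.util.function.Consumer;
import org.apache.pulsar.metadata.api.Notification;
import org.apache.pulsar.metadata.api.extended.MetadataStoreExtended;
import org.apache.pulsar.metadata.api.extended.SessionEvent;

// Hypothetical sketch: translate the coordinator's migration flag into the
// same handler that ManagedLedgerFactoryImpl already uses for session events.
class MigrationFlagWatcher {
    private static final String MIGRATION_FLAG = "/pulsar/migration-coordinator/migration";

    MigrationFlagWatcher(MetadataStoreExtended store, Consumer<SessionEvent> sessionEventHandler) {
        // MetadataStore listeners receive a Notification for each change.
        store.registerListener((Notification n) -> {
            if (!MIGRATION_FLAG.equals(n.getPath())) {
                return;
            }
            switch (n.getType()) {
                case Created:
                case Modified:
                    // Migration started: report the store as unavailable so that
                    // non-critical writes (rollover, shedding, ...) are deferred.
                    sessionEventHandler.accept(SessionEvent.ConnectionLost);
                    break;
                case Deleted:
                    // Migration finished: resume normal operation.
                    sessionEventHandler.accept(SessionEvent.Reconnected);
                    break;
                default:
                    break;
            }
        });
    }
}
```

Whether reusing `SessionEvent` or introducing a dedicated flag, routing through one existing code path would avoid duplicating the defer-writes logic across components.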
> Each broker and bookie registers itself as a migration participant by creating a sequential ephemeral node:
> - Path: `/pulsar/migration-coordinator/participants/id-NNNN` (sequential)
> - This allows the coordinator to know how many participants exist before migration starts
Which node is selected as the migration coordinator?
If it's a broker, what happens if the migration takes a lot more memory than the broker usually does and causes an OOME?
Would it be possible to deploy a dedicated coordinator, or run the coordinator in its own process, say in a pod with sufficient resources, within a Pulsar cluster?
Some additional questions:
Does the same single coordinator instance migrate both Pulsar and BookKeeper metadata? Why does the coordinator need to know how many participants exist?
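As a side note, the registration quoted above maps directly onto the existing `MetadataStoreExtended` API. A minimal sketch follows; the `MigrationParticipant` helper is hypothetical, and only the path prefix comes from the quoted design:

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.metadata.api.Stat;
import org.apache.pulsar.metadata.api.extended.CreateOption;
import org.apache.pulsar.metadata.api.extended.MetadataStoreExtended;

// Hypothetical sketch: register a broker/bookie as a migration participant.
class MigrationParticipant {
    private static final String PREFIX = "/pulsar/migration-coordinator/participants/id-";

    static CompletableFuture<Stat> register(MetadataStoreExtended store, String participantInfo) {
        // Sequential: the store appends a monotonically increasing suffix (id-NNNN).
        // Ephemeral: the node is removed automatically if the participant's session
        // is lost, so the coordinator's count of participants stays accurate.
        return store.put(PREFIX,
                participantInfo.getBytes(StandardCharsets.UTF_8),
                Optional.empty(),
                EnumSet.of(CreateOption.Ephemeral, CreateOption.Sequential));
    }
}
```

The returned `Stat` carries the actual path with the generated suffix; the coordinator can list the children of the parent node to count participants.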
lhotari left a comment:
LGTM, just a few comments / questions.
eolivelli left a comment:
A couple of questions:
- Is it possible to also migrate back from Oxia to ZooKeeper?
- This PIP is not about the metadata store used to coordinate geo-replication; should we state that explicitly?
Good questions.
While the mechanism can be quite generic, the tricky part in making it work seamlessly is carrying over the same version ids on each metadata record. That allows brokers' and bookies' conditional writes to not fail during the migration. In Oxia we were able to add an explicit override for this specific purpose. With ZooKeeper it's unfortunately not as easy.
Good point. I'll highlight this more.
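To make the version-id point concrete, here is a minimal sketch of the conditional-write pattern the reply refers to (the helper class is illustrative; `get` and `put` with an expected version are the existing `MetadataStore` API):

```java
import java.util.Optional;
import org.apache.pulsar.metadata.api.MetadataStore;
import org.apache.pulsar.metadata.api.Stat;

// Illustrative sketch of why migrated records must keep their version ids:
// brokers and bookies do compare-and-set writes against the version they read.
class ConditionalWriteExample {
    static void update(MetadataStore store, String path, byte[] newValue) {
        store.get(path).thenCompose(optResult -> {
            Stat stat = optResult.orElseThrow().getStat();
            // If the migrated store re-numbered versions, this expected-version
            // put would fail (BadVersionException) after cutover.
            return store.put(path, newValue, Optional.of(stat.getVersion()));
        }).join();
    }
}
```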