Skip to content

Persist partition map history to system keys for recovery#13283

Open
akankshamahajan15 wants to merge 2 commits into
apple:mainfrom
akankshamahajan15:bk_v3_history_ss
Open

Persist partition map history to system keys for recovery#13283
akankshamahajan15 wants to merge 2 commits into
apple:mainfrom
akankshamahajan15:bk_v3_history_ss

Conversation

@akankshamahajan15
Copy link
Copy Markdown
Contributor

@akankshamahajan15 akankshamahajan15 commented May 26, 2026

This PR adds the persistence and read path for partition map history on storage servers, so catch-up workers during recovery can route old-epoch mutations correctly.

  1. New system key: \xff\x02/backupPartitionmap/[8-byte epoch][8-byte version] := serialized PartitionMap
  2. Implemented persistPartitionMapToSS: idempotent write called by current-epoch workers when they receive a new partition map. PartitionMap is persisted to both S3 and SS.
  3. Implemented loadPartitionMapHistoryFromSS: used by old epoch backup workers; reads all entries for an epoch in version order after recovery to get partitionMap.

TODO: In next PR:

  • Logic of switching to multiple partitionmap for same epoch for old epoch backupworkers will be handled in next PR.
    • Example: backup starts at v=1 with PM_A; a re-partition fires at v=90 producing PM_B (still in epoch 1); recovery hits at v=100. A catch-up worker for epoch 1 draining v=80..100 must use PM_A for [80, 90) and PM_B for [90, 100). That
      requires reading multiple history entries and switching maps at the version boundary.
  • PartitionMap could be received anytime. Handle that in the code.

Dependencies

This PR depends on #13245.
This PR includes all changes from #13245 plus additions on top.

@akankshamahajan15 akankshamahajan15 added the Backup_v3 Range Partitioned Backup label May 26, 2026
@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 597c09f
  • Duration 0:22:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 597c09f
  • Duration 0:34:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 597c09f
  • Duration 0:44:23
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 597c09f
  • Duration 0:50:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 597c09f
  • Duration 0:55:07
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 597c09f
  • Duration 1:03:17
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 597c09f
  • Duration 1:04:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a persistence + recovery read path for backup range-partitioned partition maps by storing per-(epoch, version) PartitionMap entries in system keys, enabling old-epoch catch-up backup workers to route mutations correctly after recovery.

Changes:

  • Introduces \xff\x02/backupPartitionmap/[epoch][version] system-key encoding helpers and range helpers.
  • Implements a write path (persistPartitionMapToSS) for current-epoch workers and a read path (loadPartitionMapHistoryFromSS) for catch-up workers.
  • Refactors range-partitioned backup worker lifecycle to monitor log system updates in a dedicated actor and to support graceful shutdown on worker_removed.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
fdbserver/core/include/fdbserver/core/BackupPartitionMap.h Adds PartitionMapHistory type alias and modernizes typedefs.
fdbserver/backupworker/BackupWorkerRangePartitioned.cpp Persists partition map history to SS; reads it for catch-up workers; refactors readiness/shutdown/logsystem monitoring.
fdbserver/backupworker/BackupWorker.cpp Removes unused fields from VersionedMessage / BackupData.
fdbclient/SystemData.cpp Implements key/range encode/decode helpers for partition map history system keys.
fdbclient/include/fdbclient/SystemData.h Declares the new system-key range + helper APIs.
fdbclient/FileBackupAgent.cpp Avoids watching allWorkerStarted for range-partitioned log backups (not set by those workers).

Comment thread fdbserver/backupworker/BackupWorkerRangePartitioned.cpp Outdated
Comment thread fdbserver/backupworker/BackupWorkerRangePartitioned.cpp
Comment thread fdbserver/backupworker/BackupWorkerRangePartitioned.cpp Outdated
Comment thread fdbserver/backupworker/BackupWorkerRangePartitioned.cpp Outdated
@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: c3e43f9
  • Duration 0:16:05
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: c3e43f9
  • Duration 0:31:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: c3e43f9
  • Duration 0:46:46
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: c3e43f9
  • Duration 0:47:22
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: c3e43f9
  • Duration 0:58:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: c3e43f9
  • Duration 1:06:17
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: c3e43f9
  • Duration 1:18:28
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: fad5f35
  • Duration 0:05:01
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: fad5f35
  • Duration 0:05:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:21:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:42:58
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: fad5f35
  • Duration 0:44:51
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:45:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 1:03:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: fad5f35
  • Duration 0:05:29
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: fad5f35
  • Duration 0:05:43
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:20:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: fad5f35
  • Duration 0:46:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:52:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 0:53:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: fad5f35
  • Duration 1:03:06
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backup_v3 Range Partitioned Backup

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants