
feat: implement manual RPC backup via RocksDB checkpoint (Approach A)#1110

Closed
Zhangcy0x3 wants to merge 6 commits into nervosnetwork:develop from Zhangcy0x3:feat/backup-rpc-manual

Conversation

@Zhangcy0x3
Contributor

Summary
This PR implements the initial phase of Approach A (RocksDB Checkpoint) as proposed in #1086. It provides a new RPC method backup_now that allows users to trigger a consistent online backup of the node's storage and essential key files without requiring a shutdown.

Key Changes

  1. Storage Abstraction & Decoupling
    Introduced the KVStore trait to abstract database handle access. This design decouples the RPC layer from the specific storage implementation, facilitating future maintenance and potential backend swaps.

  2. Core Backup Logic (Approach A)
    Integrated RocksDB's native Checkpoint API to create consistent, atomic snapshots of the store. Implemented physical file backup for critical identity files: ckb_key and fiber_key.

  3. Architecture & Dependency Injection
Refactored InfoRpcServerImpl::new to correctly inject Store, CkbConfig, and FiberConfig. Adopted the log_and_error! macro for error handling, returning structured error objects.
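The storage abstraction and backup flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration: `MemStore` is a hypothetical in-memory stand-in for the RocksDB-backed store, and the real implementation would call RocksDB's native Checkpoint API instead of dumping state to a file.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Sketch of the PR's `KVStore` abstraction: the RPC layer depends only on
/// this trait, never on the concrete RocksDB handle.
pub trait KVStore {
    /// Create a consistent snapshot of the store under `backup_path`.
    /// The real implementation would use RocksDB's Checkpoint API.
    fn backup(&self, backup_path: &Path) -> Result<PathBuf, String>;
}

/// Illustrative in-memory stand-in for the RocksDB-backed store.
pub struct MemStore {
    pub data: BTreeMap<String, String>,
}

impl KVStore for MemStore {
    fn backup(&self, backup_path: &Path) -> Result<PathBuf, String> {
        // Mirror the PR's tested error case: refuse to overwrite an
        // existing backup directory.
        if backup_path.exists() {
            return Err(format!("backup target {:?} already exists", backup_path));
        }
        std::fs::create_dir_all(backup_path).map_err(|e| e.to_string())?;
        // A checkpoint is an atomic, point-in-time copy; here we simply
        // dump the in-memory state to a file.
        let dump: String = self
            .data
            .iter()
            .map(|(k, v)| format!("{}={}\n", k, v))
            .collect();
        let file = backup_path.join("store.dump");
        std::fs::write(&file, dump).map_err(|e| e.to_string())?;
        Ok(file)
    }
}
```

The two integration tests below (`test_rpc_backup_now_success` and `test_rpc_backup_now_already_exists`) correspond to the `Ok` and `Err` branches of this sketch.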

Test Plan
Passed the following integration tests:

test_rpc_backup_now_success: Verifies successful backup creation and file integrity.

test_rpc_backup_now_already_exists: Ensures proper error handling when the target backup directory already exists.

Task List (#1086 Phase 1)

  • Manual RPC trigger: backup_now

  • Approach A: Scheduled trigger (Planned)

  • Phase 2: Recovery capability (Planned)

Related Issues
Relates to #1086

@Zhangcy0x3 Zhangcy0x3 force-pushed the feat/backup-rpc-manual branch 3 times, most recently from a451109 to 63f99fc on February 7, 2026
@Zhangcy0x3
Contributor Author

PR Update Summary

I have performed a comprehensive refactoring to resolve the WASM compatibility issues while ensuring the node backup functionality (Approach A) works correctly on native platforms.

Key Problems & Solutions
Dependency Leakage (WASM CI Failure)

Problem: Native dependencies like rocksdb and mio do not support WASM targets, causing compilation errors in CI.

Solution: Used precise #[cfg(not(target_arch = "wasm32"))] guards to isolate all backup-related logic and native imports.

Generic Parameter Mismatch (S: RpcServerStore)

Problem: In WASM, the server module (and its RpcServerStore trait) was originally gated out, causing "unresolved import" and "unused type parameter" errors.

Solution: Introduced a Trait Alias (StoreInfo) to bridge platform-specific storage bounds.

Applied std::marker::PhantomData in the WASM version of InfoRpcServerImpl to handle the unused generic parameter S while maintaining a consistent struct signature.
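A minimal sketch of the trait-alias-plus-PhantomData pattern, assuming an empty bound as a stand-in (on native targets `StoreInfo` would instead bound on the real storage traits such as `RpcServerStore`):

```rust
use std::marker::PhantomData;

// Stable Rust has no first-class trait aliases, so the usual pattern is an
// empty trait with a blanket impl; the bound here is a placeholder.
pub trait StoreInfo {}
impl<T> StoreInfo for T {}

// WASM-side shape of the struct: `S` is unused at runtime, so PhantomData
// keeps the generic parameter without holding a store handle, avoiding the
// "unused type parameter" error.
pub struct InfoRpcServerImpl<S: StoreInfo> {
    _store: PhantomData<S>,
}

impl<S: StoreInfo> InfoRpcServerImpl<S> {
    pub fn new() -> Self {
        InfoRpcServerImpl { _store: PhantomData }
    }
}
```

PhantomData is a zero-sized type, so this carries the generic parameter through the type system at no runtime cost.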

API Consistency (Structural Integrity)

Problem: Differing struct definitions between platforms usually lead to messy #[cfg] blocks in the caller's code (e.g., mod.rs).

Solution: Unified the InfoRpcServerImpl::new constructor's signature across all platforms. In WASM, native-only parameters are accepted but safely ignored, allowing the RPC module to be initialized transparently by other modules.
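The unified-constructor idea can be sketched as two `#[cfg]`-gated `new` implementations sharing one signature; the type names mirror the PR description but the bodies are illustrative:

```rust
pub struct Store;
pub struct CkbConfig;
pub struct FiberConfig;

pub struct InfoRpcServerImpl {
    #[cfg(not(target_arch = "wasm32"))]
    store: Store,
}

impl InfoRpcServerImpl {
    // One signature for every platform: callers in mod.rs need no #[cfg].
    #[cfg(not(target_arch = "wasm32"))]
    pub fn new(store: Store, _ckb: CkbConfig, _fiber: FiberConfig) -> Self {
        InfoRpcServerImpl { store }
    }

    #[cfg(target_arch = "wasm32")]
    pub fn new(_store: Store, _ckb: CkbConfig, _fiber: FiberConfig) -> Self {
        // Native-only parameters are accepted but safely ignored on wasm32.
        InfoRpcServerImpl {}
    }

    /// Reports whether this build actually holds a store handle.
    pub fn has_store(&self) -> bool {
        cfg!(not(target_arch = "wasm32"))
    }
}
```

Callers construct the server identically on all targets; only the gated bodies differ.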

@Zhangcy0x3 Zhangcy0x3 force-pushed the feat/backup-rpc-manual branch 2 times, most recently from c060b67 to 2cc29aa on February 8, 2026
@Zhangcy0x3 Zhangcy0x3 force-pushed the feat/backup-rpc-manual branch from 2cc29aa to c824c06 on February 8, 2026
@Zhangcy0x3
Contributor Author

Zhangcy0x3 commented Feb 8, 2026

PR Update Summary: Completed Backup RPC implementation with full WASM and Feature-gate compatibility.

I have finalized the implementation of the backup_now RPC. Beyond the core backup logic, a significant portion of this PR focuses on ensuring the codebase remains compatible with WASM targets and various feature combinations.
Key Technical Improvements:

  1. Strict WASM Isolation:
  • Successfully decoupled native-only dependencies (RocksDB checkpointing and file I/O) from the WASM build path using #[cfg] guards.

  • Introduced a platform-agnostic StoreInfo trait alias in info.rs to maintain a unified API signature across all targets.

  2. Generic Parameter Management:
  • Utilized std::marker::PhantomData in the WASM version of InfoRpcServerImpl to handle the generic storage parameter S without leaking native traits.

  3. Storage/RPC Layer Decoupling:
  • Migrated the KVStore trait definition to the store_impl module. This not only clarifies the architectural boundary but also resolves the documentation generation conflicts (gen-rpc-doc) by keeping internal storage traits out of the public RPC interface.

@Officeyutong
Collaborator

Hi @Zhangcy0x3, the current PR looks good to me. Will you continue on doing the restoring part?

@Zhangcy0x3
Contributor Author

> Hi @Zhangcy0x3, the current PR looks good to me. Will you continue on doing the restoring part?

Thanks for the review! I'm glad to hear the current progress looks good. Yes, I'd love to continue and implement the restoring part.

@chenyukang
Collaborator

Emm, we added a new rpc-json-types; sorry, there are some conflicts that need to be resolved.

@chenyukang
Collaborator

#1169

@Zhangcy0x3
Contributor Author

Hi @chenyukang,

I noticed the CI failure on this PR. To ensure a better review experience and system consistency, I’ve decided to pause this specific PR for now.

I am currently finalizing the Restore implementation (Issue #1086). My current progress includes:

  1. Physical Restoration: Atomic file swapping with a rollback mechanism in restore.rs.

  2. Audit Mechanism: A RestoreAuditMap to track channel states post-recovery, ensuring we detect commitment number gaps during ChannelReestablish to prevent penalties.
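The atomic-swap-with-rollback idea from item 1 can be sketched with plain `std::fs` renames. The function name is hypothetical; the actual restore.rs implementation may differ:

```rust
use std::fs;
use std::path::Path;

/// Illustrative restore step: swap `target` with `backup`, keeping the old
/// data under a `.old` name so a failed swap can be rolled back.
pub fn restore_with_rollback(target: &Path, backup: &Path) -> std::io::Result<()> {
    let old = target.with_extension("old");
    // Step 1: move the live data aside (a cheap rename, not a copy).
    fs::rename(target, &old)?;
    // Step 2: move the backup into place; on failure, roll back step 1.
    if let Err(e) = fs::rename(backup, target) {
        let _ = fs::rename(&old, target); // best-effort rollback
        return Err(e);
    }
    // Step 3: only after a successful swap, discard the old data
    // (file or directory).
    fs::remove_file(&old).or_else(|_| fs::remove_dir_all(&old))?;
    Ok(())
}
```

Because each step is a rename on the same filesystem, the window in which neither copy is in place is as small as the OS allows, and the rollback restores the pre-swap state.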

I believe submitting a unified Disaster Recovery Suite (combining both backup and restore) will be more robust and easier to evaluate as a complete feature. I’ll close this one once the full PR is ready. Stay tuned!

@quake
Member

quake commented Apr 1, 2026

Thanks for the PR! Since the store layer has been refactored and now supports both RocksDB and SQLite ( #1191 ) it would be better to move this backup logic into the StorageBackend trait.

I suggest adding a backup function to the trait, and then providing separate implementations for RocksDB and SQLite. This way the backup mechanism stays consistent with the abstraction and works across different backends.
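The suggested shape might look like the sketch below. The trait and type names mirror this discussion but are illustrative, not the actual fiber codebase; the method bodies are stubs standing in for the real backend-specific mechanisms.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the suggestion: `backup` lives on the StorageBackend trait,
/// and each backend supplies its own mechanism.
pub trait StorageBackend {
    fn backup(&self, dest: &Path) -> Result<PathBuf, String>;
}

pub struct RocksDbBackend;
pub struct SqliteBackend;

impl StorageBackend for RocksDbBackend {
    fn backup(&self, dest: &Path) -> Result<PathBuf, String> {
        // Real impl: RocksDB's Checkpoint API targeting `dest`.
        Ok(dest.join("rocksdb-checkpoint"))
    }
}

impl StorageBackend for SqliteBackend {
    fn backup(&self, dest: &Path) -> Result<PathBuf, String> {
        // Real impl: SQLite's online backup API (`VACUUM INTO` or
        // sqlite3_backup), which also yields a consistent snapshot.
        Ok(dest.join("fiber.sqlite3.bak"))
    }
}

/// The RPC layer depends only on the trait, so backup_now works unchanged
/// regardless of which backend the node was configured with.
pub fn backup_now(store: &dyn StorageBackend, dest: &Path) -> Result<PathBuf, String> {
    store.backup(dest)
}
```

This keeps the `backup_now` RPC oblivious to the backend, consistent with the abstraction introduced in #1191.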

@Zhangcy0x3
Contributor Author

> Thanks for the PR! Since the store layer has been refactored and now supports both RocksDB and SQLite ( #1191 ) it would be better to move this backup logic into the StorageBackend trait.
>
> I suggest adding a backup function to the trait, and then providing separate implementations for RocksDB and SQLite. This way the backup mechanism stays consistent with the abstraction and works across different backends.

Thanks @quake! That's a great suggestion. Moving the backup logic into StorageBackend will definitely improve the abstraction.

Since I’ve already implemented both backup and restore in my latest PR #1197, I will apply this refactoring directly over there.

Let's continue the discussion in the new PR. Thanks again for the guidance!

@chenyukang
Collaborator

This PR will be closed in favor of #1197?

@Zhangcy0x3
Contributor Author

> This PR will be closed in favor of #1197?

Yes, I'm closing this one in favor of #1197. All relevant logic from this PR has been migrated and improved in #1197.

@Zhangcy0x3 Zhangcy0x3 closed this Apr 7, 2026