I started this project because I was sick of the "API Tax." I was working on a project involving Databricks and spent more time wrestling with authentication and network latency than actually processing data. I just wanted to use standard tools like grep on my own files, but I needed the durability guarantees of proper object storage.
BlockFrame solves this by bringing Reed-Solomon erasure coding down to the local filesystem level. You interact with it like a normal folder via FUSE (Linux) or WinFSP (Windows), but under the hood it splits your files into chunks and protects them against bit rot and drive failure - no network required.
It's built specifically for local, write-once archival. It's not trying to replace S3 for the cloud, and it's definitely not for high-frequency dynamic writes. But if you want enterprise-grade durability for your local datasets without the complexity of a distributed cluster, this works.
BlockFrame differs from tools like MinIO or S3 by prioritizing OS-level integration over API compatibility.
Standard RAID protects against disk failure. BlockFrame protects against data corruption (bit rot) at the file level.
- Reed-Solomon encoding via `reed-solomon-simd` (SIMD-accelerated).
- Small files (<10MB) use RS(1,3) for high redundancy.
- Large datasets use RS(30,3), splitting files into 32MB segments grouped into blocks for storage efficiency (see the sketch below).
- Mathematical reconstruction of corrupted sectors without needing a 4-node cluster or ZFS.
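To make the shard arithmetic concrete, here is a minimal RS(30,3) encoding sketch using `reed-solomon-simd`'s simple API. Shard size and contents are placeholders; BlockFrame's real pipeline feeds 32MB segments through this step.

```rust
// Minimal RS(30,3) encoding sketch with reed-solomon-simd's simple API.
// 30 data shards produce 3 recovery shards; any 30 of the 33 total shards
// can later reconstruct the data. Shard size/content are placeholders.
fn main() -> Result<(), reed_solomon_simd::Error> {
    const DATA: usize = 30;
    const PARITY: usize = 3;
    const SHARD_BYTES: usize = 1024; // equal-sized shards, even byte count

    let original: Vec<Vec<u8>> = (0..DATA)
        .map(|i| vec![i as u8; SHARD_BYTES])
        .collect();

    let recovery = reed_solomon_simd::encode(DATA, PARITY, &original)?;
    assert_eq!(recovery.len(), PARITY);
    Ok(())
}
```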
Instead of writing a client library, I implemented a virtual filesystem driver.
- The application intercepts syscalls (`open`, `read`, `seek`).
- When a user reads a file, BlockFrame performs a Merkle tree hash check. If the hash mismatches (corruption detected), it transparently pauses the read, reconstructs the data from parity shards in memory, and serves clean bytes to the caller (sketched below).
- Applications work with the data natively without knowing it's being repaired in real-time.
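A sketch of that verified-read flow; the trait and method names here are hypothetical stand-ins, not BlockFrame's internals:

```rust
// Hypothetical verified-read flow; trait and method names are illustrative.
trait SegmentStore {
    fn load_segment(&self, idx: usize) -> std::io::Result<Vec<u8>>;
    fn expected_hash(&self, idx: usize) -> blake3::Hash;
    fn reconstruct_from_parity(&self, idx: usize) -> std::io::Result<Vec<u8>>;
    fn write_back(&mut self, idx: usize, data: &[u8]) -> std::io::Result<()>;
}

fn read_segment<S: SegmentStore>(store: &mut S, idx: usize) -> std::io::Result<Vec<u8>> {
    let data = store.load_segment(idx)?;

    // Fast path: the segment hash matches the manifest, serve bytes as-is.
    if blake3::hash(&data) == store.expected_hash(idx) {
        return Ok(data);
    }

    // Corruption detected: rebuild the segment from parity in memory,
    // persist the repair, and hand clean bytes to the caller.
    let repaired = store.reconstruct_from_parity(idx)?;
    store.write_back(idx, &repaired)?;
    Ok(repaired)
}
```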
| Aspect | Object Storage (S3/Databricks) | BlockFrame |
|---|---|---|
| Interface | HTTP API (`GET /bucket/key`) | Syscall (`read()`, `seek()`) |
| Tooling | Requires SDKs (boto3) | Standard tools (Explorer, pandas, VLC) |
| Recovery | Replica-based (network) | Parity-based (CPU/SIMD) |
| Complexity | Distributed consensus (Paxos/Raft) | Local state & Merkle proofs |
The design assumes a central server running BlockFrame with the archive mounted as a local drive. That mount point can then be shared over the network (SMB on Windows, NFS on Linux). This means:
- Only the server needs BlockFrame installed
- Clients connect to a standard network share
- Access control uses existing OS-level permissions (Active Directory, Group Policy, etc.)
- Data never leaves your infrastructure
Remote mounting is also natively supported for direct connection to a BlockFrame server.
- Reed-Solomon erasure coding at multiple tiers (RS(1,3) for small files, RS(30,3) for large files)
- Automatic tier selection based on file size
- FUSE (Linux) and WinFSP (Windows) filesystem mounting
- On-the-fly segment recovery from parity when corruption is detected
- Hash verification on every read
- Automatic reconstruction and in-place repair of corrupted segments
- Single binary with config file (config.toml)
- No external database or services required
Windows:
- Rust toolchain (stable)
- WinFSP v2.0 or later (required for mounting)
Linux:
- Rust toolchain (stable)
- FUSE development libraries:

```
# Debian/Ubuntu
sudo apt install libfuse-dev pkg-config

# Fedora/RHEL
sudo dnf install fuse-devel

# Arch
sudo pacman -S fuse2
```
BlockFrame binaries are available on the releases page; download the latest version for your platform. Keep `winfsp-x64.dll` in the same directory as `blockframe.exe`.
Clone and build:
```
git clone https://github.com/crushr3sist/blockframe-rs.git
cd blockframe-rs
cargo build --release
```

The binary will be available at `target/release/blockframe`.
Before running any commands, create a `config.toml` file in the same directory as the `blockframe` executable. This file is required and provides default values for all commands.

Example `config.toml`:

```toml
[archive]
# Default directory for storing archived files
# Used by: commit, serve, health, and mount (when default_remote is empty)
directory = "archive_directory"

[mount]
# Default mountpoint for the virtual filesystem
default_mountpoint = "./mnt/blockframe" # Linux example
# default_mountpoint = "Z:"             # Windows drive letter

# IMPORTANT: Default remote server URL for mounting
# - Leave EMPTY ("") to use the local archive directory by default
# - When set, `blockframe mount` will connect to this remote server by default
# - You can still override with the --archive flag to mount a local archive
# Example: "http://192.168.1.100:8080"
default_remote = ""

[cache]
# Cache settings for filesystem mounting
# 1 segment = 32 MB
max_segments = 200

# Maximum cache size (supports KB, MB, GB)
max_size = "3GB"

[server]
# Default port for HTTP server
default_port = 8080

[logging]
# Logging level: "trace", "debug", "info", "warn", "error"
level = "info"
```

Configuration Behavior:
- All CLI flags are optional - they override config defaults when provided
- Mount source priority (first available is used, as sketched below):
  1. `--remote` flag (if provided)
  2. `--archive` flag (if provided)
  3. `config.mount.default_remote` (if not empty)
  4. `config.archive.directory` (fallback)
- Warning: If you set `default_remote`, the mount command will connect to the remote server by default
- To use the local archive when `default_remote` is set, run: `blockframe mount --archive archive_directory`
- This eliminates the need to specify `--archive`, `--port`, or `--mountpoint` repeatedly
- Adjust cache settings based on your system resources
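In code, the documented priority reduces to a simple cascade. The `Config` and `Source` types below are illustrative stand-ins, not BlockFrame's actual types:

```rust
// Sketch of the documented mount-source priority; Config and Source are
// illustrative stand-ins, not BlockFrame's actual types.
enum Source {
    Remote(String),
    Local(String),
}

struct Config {
    default_remote: String,    // [mount] default_remote
    archive_directory: String, // [archive] directory
}

fn resolve_source(remote: Option<String>, archive: Option<String>, cfg: &Config) -> Source {
    if let Some(url) = remote {
        return Source::Remote(url); // 1. --remote flag
    }
    if let Some(dir) = archive {
        return Source::Local(dir); // 2. --archive flag
    }
    if !cfg.default_remote.is_empty() {
        return Source::Remote(cfg.default_remote.clone()); // 3. config default_remote
    }
    Source::Local(cfg.archive_directory.clone()) // 4. config archive directory
}
```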
1. Commit a file to the archive:
```
blockframe commit --file /path/to/your/file.bin
```

Files are automatically stored in the `archive_directory` configured in `config.toml`.
2. Mount the archive as a filesystem:
```
# Simple: Uses defaults from config.toml
blockframe mount

# Or override specific settings:

# Linux
blockframe mount --mountpoint /mnt/custom --archive archive_directory

# Windows
blockframe mount --mountpoint Z: --archive archive_directory

# Remote mount (connect to another BlockFrame server)
blockframe mount --remote http://192.168.1.100:8080
```

3. Access your files:
Once mounted, access files through the mounted filesystem. Original files appear as regular files. Read operations trigger automatic hash verification and recovery if corruption is detected.
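Because the mount is a plain filesystem, any program can read archived data with ordinary file I/O. A minimal example (the path is illustrative):

```rust
use std::fs;

// Reads an archived file through the BlockFrame mount with plain std::fs.
// Verification and any parity repair happen inside BlockFrame; the caller
// just receives clean bytes. The path is illustrative.
fn main() -> std::io::Result<()> {
    let bytes = fs::read("/mnt/blockframe/large-video.mp4")?;
    println!("read {} bytes through the mount", bytes.len());
    Ok(())
}
```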
Archive a file with erasure coding.
```
blockframe commit --file <PATH>
```

Arguments:

- `--file, -f <PATH>`: Path to file to archive

Behaviour:

- Automatically selects tier based on file size
- Generates Reed-Solomon parity shards
- Builds Merkle tree for verification
- Writes manifest, segments, and parity to `archive_directory/{filename}_{hash}/`

Example:

```
blockframe commit --file /data/large-video.mp4
```

Mount archive as virtual filesystem.
```
blockframe mount [--mountpoint <PATH>] [--archive <PATH> | --remote <URL>]
```

Arguments (all optional):

- `--mountpoint, -m <PATH>`: Mount location (default: from `config.toml`)
  - Linux: directory path (e.g., `/mnt/blockframe`)
  - Windows: drive letter (e.g., `Z:`)
- `--archive, -a <PATH>`: Local archive directory (default: from `config.toml`, conflicts with `--remote`)
- `--remote, -r <URL>`: Remote BlockFrame server URL (default: from `config.toml`, conflicts with `--archive`)
Behaviour:
- If no flags are provided, uses all defaults from `config.toml`
- If `default_remote` is set in config and no flags are given, connects to the remote server
- Otherwise falls back to the local archive directory from config
- Reads manifests from archive or remote server
- Presents files as regular filesystem
- Performs hash verification on every read
- Automatically recovers corrupted segments from parity
- Read-only mount (writes not supported)
Examples:
```
# Use all defaults from config.toml
blockframe mount

# Override mountpoint only
blockframe mount -m /mnt/custom

# Linux local mount with explicit paths
blockframe mount -m /mnt/blockframe -a archive_directory

# Windows remote mount
blockframe mount -m Z: -r http://server.local:8080

# Remote mount using config defaults for mountpoint
blockframe mount -r http://192.168.1.50:8080
```

Note for Windows: Requires WinFSP installed. Unmount with Ctrl+C or standard Windows unmount.
Start HTTP API server for remote access.
```
blockframe serve [--archive <PATH>] [--port <PORT>]
```

Arguments (all optional):

- `--archive, -a <PATH>`: Archive directory to serve (default: from `config.toml`)
- `--port, -p <PORT>`: HTTP port (default: from `config.toml`)
Behaviour:
- Serves archive over HTTP with CORS enabled for cross-origin access
- Provides file listing, manifest, and segment download endpoints
- Enables remote mounting from other machines on your network
- OpenAPI documentation available at `http://<your-ip>:<port>/docs`
- Read-only access
Examples:
```
# Use defaults from config.toml
blockframe serve

# Override port only
blockframe serve --port 9000

# Serve custom archive directory
blockframe serve --archive /storage/archive --port 9000
```

Remote Access:

Once serving, access the API documentation at `http://<your-ip>:8080/docs` (or your configured port). Other machines can mount your archive using:

```
blockframe mount --remote http://<your-ip>:8080
```

Scan archive for corruption and attempt repairs.
```
blockframe health [--archive <PATH>]
```

Arguments (optional):

- `--archive, -a <PATH>`: Archive directory to check (default: from `config.toml`)
Behaviour:
- Scans all manifests in archive
- Verifies segment hashes against Merkle tree
- Reports corruption statistics
- Attempts reconstruction from parity where possible
- Writes recovered segments back to disk
Examples:
```
# Use default archive from config.toml
blockframe health

# Check specific archive directory
blockframe health --archive /backup/archive
```

Output Example:

```
Checking 15 files...
video.mp4: healthy (120 segments)
dataset.bin: 3 corrupt segments
Recovered from parity: segments 45, 67, 89
archive.tar: healthy (5 segments)
```
Module Structure:
- `chunker/` - File segmentation and Reed-Solomon encoding (commit_tiny, commit_segmented, commit_blocked)
- `filestore/` - Archive operations (get_all, find, repair, reconstruct)
- `merkle_tree/` - Hash tree construction and verification
- `mount/` - FUSE/WinFSP filesystem implementations (LocalSource, RemoteSource)
- `serve/` - HTTP API server (Poem)
- `config.rs` - Configuration management
- `utils.rs` - BLAKE3 hashing and utilities
Core Dependencies:
- Reed-Solomon encoder/decoder (reed-solomon-simd)
- Merkle tree for integrity verification
- Manifest parser and validator
- BLAKE3 hashing
I/O Layer:
- BufWriter for buffered disk writes
- memmap2 for zero-copy file reads
- Rayon for parallel processing
Service Layer:
- HTTP API (Poem) for remote access
- FUSE (Linux) / WinFSP (Windows) for filesystem mounting
- Health checking and repair CLI
BlockFrame automatically selects encoding tier based on file size, balancing redundancy against storage overhead.
| Tier | File Size | Encoding | Overhead | Recovery Capability |
|---|---|---|---|---|
| 1 | < 10 MB | RS(1,3) whole file | 300% | Lose any 3 of 4 shards, still recover |
| 2 | 10 MB – 1 GB | RS(1,3) per segment | 300% | Each segment recovers independently |
| 3 | 1 – 35 GB | RS(30,3) per block | 10% | Lose any 3 of 33 shards per block |
| 4 | > 35 GB | Hierarchical | ~12% | Planned |
Tier 1 (tiny files): Entire file encoded as single unit. Maximum redundancy for critical small files.
Tier 2 (medium files): Each 32MB segment gets independent parity. Corruption in one segment does not affect others.
Tier 3 (large files): Segments grouped into blocks of 30, with block-level parity. Storage efficient for large datasets.
Tier selection is automatic. No manual configuration required.
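In code, the selection amounts to thresholding on file size. The function below is an illustrative sketch derived from the table above; the name and exact boundary handling are assumptions, not BlockFrame's actual implementation:

```rust
// Illustrative tier selection derived from the table above; the function
// name and exact boundary handling are assumptions, not BlockFrame's code.
fn select_tier(file_size: u64) -> u8 {
    const MB: u64 = 1024 * 1024;
    const GB: u64 = 1024 * MB;
    match file_size {
        s if s < 10 * MB => 1, // RS(1,3) over the whole file
        s if s < GB => 2,      // RS(1,3) per 32MB segment
        s if s < 35 * GB => 3, // RS(30,3) per block of 30 segments
        _ => 3,                // Tier 4 planned; >35GB currently uses Tier 3
    }
}
```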
```
archive_directory/
└── {filename}_{hash}/
    ├── manifest.json      # Merkle root, hashes, metadata
    ├── segments/          # 32MB data segments
    │   └── segment_N.dat
    ├── parity/            # Reed-Solomon parity shards
    │   └── parity_N.dat
    └── blocks/            # Tier 3: block structure
        └── block_N/
            ├── segments/
            └── parity/
```
Manifests are JSON. Segments and parity are raw binary. Everything is inspectable with standard tools.
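The exact schema is defined in the code; as a purely illustrative model of the documented contents (Merkle root, hashes, metadata), a serde reader might look like this. The field names are assumptions, not BlockFrame's schema:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical manifest model inferred from the documented contents
// (Merkle root, hashes, metadata). Field names are NOT BlockFrame's schema;
// inspect a real manifest.json for the actual layout.
#[derive(Serialize, Deserialize, Debug)]
struct Manifest {
    file_name: String,
    file_size: u64,
    tier: u8,
    merkle_root: String,         // hex-encoded BLAKE3 root
    segment_hashes: Vec<String>, // hex-encoded BLAKE3 hash per segment
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let json = std::fs::read_to_string("manifest.json")?;
    let manifest: Manifest = serde_json::from_str(&json)?;
    println!("{} segments", manifest.segment_hashes.len());
    Ok(())
}
```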
Encoding:
- File is memory-mapped (zero-copy reads)
- Split into 32MB segments
- For Tier 3, segments grouped into blocks of 30
- Reed-Solomon encoding generates parity shards
- Merkle tree built from segment hashes (see the sketch after this list)
- Manifest, segments, and parity written to disk
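The Merkle step repeatedly hashes pairs of child nodes up to a single root. A minimal blake3 sketch follows; the odd-node handling (duplicating the last hash) is illustrative, see `merkle_tree/` for the real construction:

```rust
// Minimal Merkle-root sketch over segment hashes using blake3. Odd-node
// handling (duplicating the last hash) is illustrative; see merkle_tree/
// for the real construction.
fn merkle_root(mut level: Vec<blake3::Hash>) -> blake3::Hash {
    assert!(!level.is_empty(), "need at least one segment hash");
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut h = blake3::Hasher::new();
                h.update(pair[0].as_bytes());
                h.update(pair.get(1).unwrap_or(&pair[0]).as_bytes());
                h.finalize()
            })
            .collect();
    }
    level[0]
}

fn main() {
    let segment_hashes: Vec<blake3::Hash> = ["seg0", "seg1", "seg2"]
        .iter()
        .map(|s| blake3::hash(s.as_bytes()))
        .collect();
    println!("root = {}", merkle_root(segment_hashes));
}
```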
Recovery:
- Filesystem read triggers hash verification
- If hash mismatch detected, load parity shards
- Reed-Solomon decoder reconstructs original segment
- Verify reconstructed segment against manifest hash
- Write recovered segment back to disk
- Return data to caller
Reed-Solomon guarantees: RS(30,3) means any 30 of 33 shards can reconstruct original data. RS(1,3) means any 1 of 4 shards (1 data + 3 parity) recovers the file.
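To make the guarantee concrete, the sketch below drops one data shard from an RS(30,3) group and restores it from the survivors via `reed-solomon-simd`'s simple API (dummy data throughout):

```rust
// RS(30,3) recovery sketch: drop one original shard, then rebuild it from
// the 29 surviving originals plus one recovery shard. Dummy data throughout.
fn main() -> Result<(), reed_solomon_simd::Error> {
    const DATA: usize = 30;
    const PARITY: usize = 3;
    const SHARD_BYTES: usize = 1024;

    let original: Vec<Vec<u8>> = (0..DATA)
        .map(|i| vec![i as u8; SHARD_BYTES])
        .collect();
    let recovery = reed_solomon_simd::encode(DATA, PARITY, &original)?;

    // Pretend shard 7 failed its hash check: feed the decoder everything else.
    let surviving = original.iter().enumerate().filter(|(i, _)| *i != 7);
    let restored = reed_solomon_simd::decode(DATA, PARITY, surviving, [(0, &recovery[0])])?;

    // decode() returns only the shards that were missing.
    assert_eq!(restored[&7], original[7]);
    Ok(())
}
```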
- CPU: Intel Core i5-12600KF (6P + 4E cores)
- RAM: 32 GB
- Storage: HDD (~88 MB/s sequential write)
- OS: Windows 11 Pro
Benchmarks measured using `cargo run -- commit --file <file>` on HDD storage.
| File Size | Tier | Commit Time | Throughput |
|---|---|---|---|
| 171 KB | 1 | 0.5s | 0.4 MB/s |
| 1.6 GB | 3 | 36s | 45 MB/s |
| 26.6 GB | 3 | 27m 23s | 16.6 MB/s |
Tier 3 Performance: Both Tier 3 files use RS(30,3) encoding with 10% storage overhead. The 1.6 GB file achieves 45 MB/s of input throughput, writing 1.76 GB total (1.6 GB data + 160 MB parity) in 36 seconds - roughly 49 MB/s of actual disk writes. The remaining gap to the HDD's rated 88 MB/s sequential write speed goes to metadata writes and Merkle tree computation.
The larger 26.6 GB file maintains 16.6 MB/s sustained throughput across 830 blocks. The performance difference is due to file system overhead - smaller files benefit from better cache locality and fewer directory operations.
SIMD Acceleration: Reed-Solomon encoding completes in milliseconds per segment. The performance envelope is determined by storage write speeds, not computational throughput.
| Storage Type | Sequential Write | Expected Throughput | 10 GB Archive |
|---|---|---|---|
| 5400 RPM HDD | 80-100 MB/s | 15-25 MB/s | ~7 min |
| 7200 RPM HDD | 120-150 MB/s | 30-40 MB/s | ~4 min |
| SATA SSD | 400-500 MB/s | 100-150 MB/s | ~80 sec |
| NVMe SSD | 2000-3500 MB/s | 300-500 MB/s | ~25 sec |
Performance scales linearly with storage speed; at the NVMe throughput projected above (300-500 MB/s), the 26.6 GB file would encode in roughly one to two minutes. The SIMD-accelerated encoding pipeline ensures CPU is not the bottleneck on modern storage.
BlockFrame is organized into focused modules. Each contains its own README with implementation details, design rationale, and technical decisions.
chunker/ - File segmentation and Reed-Solomon encoding. Handles commit pipeline from raw file to archived segments. See chunker/README.md for tier selection logic and encoding parameters.
filestore/ - Archive operations. Manifest scanning, file location, repair and reconstruction workflows. See filestore/README.md for batch health checking and recovery strategies.
mount/ - Filesystem implementations (FUSE and WinFSP). Transparent access with on-the-fly recovery. See mount/README.md for cache architecture, concurrency patterns, and platform-specific considerations.
merkle_tree/ - Hash tree construction and verification. Provides cryptographic integrity proofs.
serve/ - HTTP API server for remote access.
config.rs - Configuration management.
utils.rs - BLAKE3 hashing and segment size calculations.
Browse module READMEs for deeper technical insight into specific subsystems.
Reed-Solomon: RS(n,k) codes provide mathematically guaranteed reconstruction from partial data loss. BlockFrame uses reed-solomon-simd for SIMD-accelerated encoding/decoding.
Memory-mapped I/O: Files are memory-mapped for zero-copy reads. RAM usage remains constant regardless of file size. Kernel handles paging; application iterates through segments.
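A minimal sketch of that mmap-and-chunk pattern with `memmap2`; the 32MB constant mirrors the documented segment size, and the path is illustrative:

```rust
use memmap2::Mmap;
use std::fs::File;

// Memory-map a file and walk it in 32MB segments; the kernel pages data in
// as each chunk is touched, so RAM stays flat regardless of file size.
const SEGMENT_SIZE: usize = 32 * 1024 * 1024;

fn main() -> std::io::Result<()> {
    let file = File::open("/data/large-video.mp4")?; // illustrative path
    let mmap = unsafe { Mmap::map(&file)? };

    for (idx, segment) in mmap.chunks(SEGMENT_SIZE).enumerate() {
        // Hash each segment in place; no per-segment copy is made.
        println!("segment {idx}: {} bytes, {}", segment.len(), blake3::hash(segment));
    }
    Ok(())
}
```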
BLAKE3: Used for all hashing (the sha256 function name is historical). Faster than SHA-256 with better parallelization. Cryptographically secure.
Cache: Mounted filesystems use moka's W-TinyLFU for segment caching. Frequency-based eviction prevents cache pollution from sequential scans. See mount/README.md for detailed cache analysis.
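A sketch of a byte-weighted segment cache with moka's sync API, mirroring the `max_size` budget from `config.toml`; the key and value shapes are illustrative, not BlockFrame's actual cache types:

```rust
use moka::sync::Cache;

// Illustrative byte-weighted segment cache. The weigher makes max_capacity
// a byte budget (mirroring config.toml's max_size); moka's W-TinyLFU
// admission keeps one-off sequential scans from evicting hot segments.
fn main() {
    let cache: Cache<(String, usize), Vec<u8>> = Cache::builder()
        .weigher(|_key, value: &Vec<u8>| value.len() as u32)
        .max_capacity(3 * 1024 * 1024 * 1024) // "3GB"
        .build();

    // Key: (file id, segment index). Value: the decoded segment bytes.
    cache.insert(("video.mp4".to_string(), 0), vec![0u8; 1024]);
    if let Some(segment) = cache.get(&("video.mp4".to_string(), 0)) {
        println!("cache hit: {} bytes", segment.len());
    }
}
```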
Concurrency: FUSE allows serialized access (&mut self). WinFSP requires shared access (&self) due to Windows I/O threading model. Both implementations are thread-safe through different mechanisms.
Write Operations: Mounting is read-only. Archived files cannot be modified in-place. To update a file, commit a new version.
Tier 4: Files over 35GB currently use Tier 3 encoding. Hierarchical Tier 4 is planned.
Compression: Not implemented. Recommend compressing files before archiving if needed.
Encryption: Not implemented. Use filesystem-level encryption (LUKS, BitLocker) or encrypt files before committing.
Distributed Storage: Single-machine only. Remote mounting is supported but does not provide replication.
- Tier 4 hierarchical encoding for files > 35GB
- Async I/O for improved throughput
- HTTP streaming server with byte-range requests
- Segment-level deduplication
- Optional compression and encryption layers
- Distributed replication protocol
- reed-solomon-simd - SIMD-accelerated erasure coding
- blake3 - Fast cryptographic hashing
- rayon - Data parallelism
- memmap2 - Memory-mapped file I/O
- serde - Serialization framework
- fuser - FUSE bindings (Linux)
- winfsp - Filesystem driver (Windows)
- moka - W-TinyLFU cache
- clap - CLI argument parsing
- tracing - Structured logging
Note on winfsp-rs: This project includes a patched version of winfsp-rs located in patches/winfsp-rs/. The patches address specific compatibility and functionality requirements for BlockFrame. The original winfsp-rs is licensed under GPLv3, and the patched version maintains the same license.
MIT
For detailed technical explanations, architectural decisions, and implementation rationale, see module-specific READMEs:
- Cache architecture and W-TinyLFU analysis
- Tier selection and encoding strategies
- Batch health checking and repair workflows
- Merkle tree verification
Each module README provides context for design choices, trade-offs considered, and implementation details.
