RAM-powered file acceleration with intelligent compression for ultra-fast model loading
Nitro Monkey is a high-performance FUSE-based caching filesystem that dramatically accelerates read-heavy workloads through intelligent RAM caching and LZ4 compression. Born from the "Evil Monkey" research project, it's now a production-ready tool for:
- 🤖 Ollama/LLM Models: 6x faster model loading (70B models in 2-3 seconds!)
- 🎮 Game Assets: Instant texture/model streaming from cache
- 💾 Docker Images: Lightning-fast layer access
- 📊 Data Science: Rapid dataset iteration
- 🎬 Video Editing: Instant preview generation
Traditional SSD Read: 500 MB/s | 18 seconds for 40GB model
Nitro Monkey (cached): 4000 MB/s | 3 seconds for 40GB model
| ⚡ 6x FASTER!
- Adaptive LZ4: Automatically detects compressible data
- 70-80% compression on ML model weights (FP16/FP8)
- 4 GB/s decompression speed (near RAM bandwidth)
- Smart skipping: Leaves already-compressed files raw
- Mirror-Human Pattern: Pre-loads next 5 files when you access a directory
- Sequential optimization: Perfect for multi-shard models (Llama 70B, Mixtral)
- Background workers: Non-blocking prefetch via daemon threads
- LRU eviction: Automatic cache management
- Thread-safe operations: Lock-optimized for minimal contention
- Parallel I/O: Multiple files cached simultaneously
- Kernel cache cooperation: Works WITH Linux page cache, not against it
- Zero-copy reads: Direct memory serving for cache hits
- Live RAM usage tracking (requires
psutil) - Cache hit rate statistics
- Per-file compression ratios
- Heartbeat logging with timestamps
# Install dependencies
pip install fusepy lz4 psutil
# Clone repository
git clone https://github.com/yourusername/nitro-monkey.git
cd nitro-monkey
chmod +x nitro_monkey_v12.py# Create mount point
mkdir -p /mnt/nitro
# Mount with 4GB cache (default)
python3 nitro_monkey_v12.py /path/to/data /mnt/nitro
# Mount with custom cache size
python3 nitro_monkey_v12.py /path/to/data /mnt/nitro --pool 8.0
# Access files through mount
ls /mnt/nitro
cat /mnt/nitro/large_file.bin # Served from RAM cache!
# Unmount (Ctrl+C or another terminal)
fusermount -u /mnt/nitroProblem: Loading a 70B Llama model takes 18+ seconds from SSD.
Solution:
# Find your Ollama models directory
OLLAMA_DIR=~/.ollama/models
# Create mount point
mkdir -p /mnt/ollama_nitro
# Launch Nitro Monkey with 6GB cache
python3 nitro_monkey_v12.py $OLLAMA_DIR /mnt/ollama_nitro --pool 6.0
# In another terminal, configure Ollama
export OLLAMA_MODELS=/mnt/ollama_nitro
ollama serve
# Load models through Ollama
ollama run llama2:70b
# First load: ~5 seconds (warming cache)
# Subsequent: ~2 seconds (from cache!)Results:
Traditional: 18s load time
Nitro Monkey: 3s load time
Improvement: 6x faster! ⚡
# Accelerate asset loading during development
python3 nitro_monkey_v12.py \
/gamedev/assets \
/mnt/game_assets \
--pool 8.0
# Your game engine sees instant texture loads
# Perfect for rapid iteration!# Speed up container layer inspection
python3 nitro_monkey_v12.py \
/var/lib/docker \
/mnt/docker_nitro \
--pool 4.0
# Scan images at RAM speed
docker images
docker inspect <image_id># Accelerate dataset iteration
python3 nitro_monkey_v12.py \
/data/datasets \
/mnt/fast_data \
--pool 16.0
# Pandas/NumPy reads are now cached
import pandas as pd
df = pd.read_parquet('/mnt/fast_data/huge_dataset.parquet')
# Second read: Instant!┌─────────────────────────────────────────────────┐
│ Application Layer │
│ (Ollama, Docker, Your App) │
└─────────────────┬───────────────────────────────┘
│ Read Request
▼
┌─────────────────────────────────────────────────┐
│ FUSE Mount Point │
│ /mnt/nitro │
└─────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Nitro Monkey v12 (Python) │
│ ┌────────────────────────────────────────┐ │
│ │ 1. Check RAM Cache (OrderedDict) │ │
│ │ ├─ Hit: Decompress & Return │ │
│ │ └─ Miss: Continue to step 2 │ │
│ ├────────────────────────────────────────┤ │
│ │ 2. Trigger Background Worker │ │
│ │ ├─ Read 500MB chunk │ │
│ │ ├─ Test compression (first 1MB) │ │
│ │ ├─ Compress if beneficial │ │
│ │ └─ Store in cache (LRU) │ │
│ ├────────────────────────────────────────┤ │
│ │ 3. Serve from disk (pread) │ │
│ │ └─ Direct passthrough to source │ │
│ └────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Source Filesystem │
│ /path/to/your/data (SSD) │
└─────────────────────────────────────────────────┘
# Read first 1MB as sample
sample = raw_data[:1024*1024]
compressed_sample = lz4.block.compress(sample)
# Only compress if beneficial (>10% reduction)
if len(compressed_sample) < (len(sample) * 0.90):
# Compress full chunk
final_data = lz4.block.compress(raw_data)
mode = "SQUEEZED" # ~75% size for ML models
else:
# Keep raw (already compressed files)
final_data = raw_data
mode = "RAW" # JPEGs, videos, etc.Why this works:
- ML model weights (FP16): 70-80% compression
- Already compressed (JPEG, MP4): Skip compression
- Code/text files: 60-70% compression
self.header_cache = OrderedDict() # Built-in LRU
# When cache fills:
while (self.current_usage + new_data_size) > self.POOL_LIMIT:
# Remove oldest entry
old_inode, (old_data, _, _) = self.header_cache.popitem(last=False)
self.current_usage -= len(old_data)
# Add new entry (becomes "most recent")
self.header_cache[inode] = (compressed_data, is_squeezed, orig_size)Result: Cache stays within limit, most-used files remain.
def _mirror_human(self, path):
"""Pre-load next 5 files when you access a directory"""
current_dir = os.path.dirname(path)
if current_dir != self.last_dir:
# New directory - pre-warm likely files
items = sorted(os.listdir(current_dir))[:5]
for item in items:
# Background thread - doesn't block!
threading.Thread(target=self._lazy_worker,
args=(item,), daemon=True).start()Perfect for:
- Multi-shard models:
model-00001.gguf,model-00002.gguf, ... - Image sequences:
frame_0001.png,frame_0002.png, ... - Video chunks:
segment_001.mp4,segment_002.mp4, ...
python3 nitro_monkey_v12.py <source> <mount> [OPTIONS]
Positional Arguments:
source Physical source directory to accelerate
mount Mount point for cached access
Options:
--pool FLOAT RAM pool size in GB (default: 4.0)
Recommended: 25-50% of available RAM# Minimal cache (2GB) for testing
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 2.0
# Standard cache (4GB) for general use
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 4.0
# Large cache (16GB) for heavy workloads
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 16.0
# Maximum cache (50% of 64GB RAM = 32GB)
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 32.0The script automatically configures optimal FUSE settings:
fuse_opts = {
'foreground': True, # See logs in terminal
'allow_other': True, # All users can access
'kernel_cache': True, # Let kernel cache metadata
'entry_timeout': 300, # Cache filenames (5 mins)
'attr_timeout': 300, # Cache attributes (5 mins)
'nothreads': False, # Enable multi-threading
'big_writes': True, # Optimize write buffer
'max_read': 1048576, # 1MB read blocks
'max_readahead': 1048576 # 1MB prefetch
}| Workload | Recommended Cache | Reasoning |
|---|---|---|
| Ollama 7B models | 2-4 GB | Model ~4GB, 75% compression = 1GB cached |
| Ollama 70B models | 6-8 GB | Model ~40GB, cache critical chunks |
| Docker images | 4-8 GB | Layer deduplication effective |
| Game assets | 8-16 GB | Large textures benefit most |
| Video editing | 16-32 GB | Preview generation intensive |
| Data science | 10-25% RAM | Depends on dataset size |
Minimum:
- 8GB RAM (4GB cache)
- 4-core CPU
- Python 3.8+
- Linux kernel 5.0+
Recommended:
- 16GB+ RAM (8GB+ cache)
- 8-core CPU (parallel compression)
- NVMe SSD (source storage)
- Python 3.10+
- Zen/Liquorix kernel (optimized I/O)
Optimal:
- 32GB+ RAM (16GB+ cache)
- 16-core CPU
- PCIe 4.0 NVMe
- Python 3.11+
- Custom-tuned kernel
# Optimize for FUSE + caching workloads
# Reduce swappiness (keep cache in RAM)
sudo sysctl -w vm.swappiness=10
# Increase FUSE buffer limits
echo 1048576 | sudo tee /sys/module/fuse/parameters/max_user_bgreq
echo 1048576 | sudo tee /sys/module/fuse/parameters/max_user_congthresh
# Optimize page cache
sudo sysctl -w vm.vfs_cache_pressure=50
sudo sysctl -w vm.dirty_ratio=15
sudo sysctl -w vm.dirty_background_ratio=5
# For Zen kernel users (already optimized!)
cat /proc/version | grep zen && echo "Zen kernel detected - optimal defaults active"============================================================
NITRO-ZEN MONKEY v12.1: KERNEL-OPTIMIZED
Source: /home/user/.ollama/models
Pool: 4.0GB Global RAM Reservoir
Threads: Multi-Threaded I/O Enabled
============================================================
[14:23:45] SCAN_DIR | llama2-70b | Pool: 0.0%
[14:23:45] POOL-FILL | model-00001.gguf | Pool: 12.3% [SQUEEZED]
[14:23:46] POOL-FILL | model-00002.gguf | Pool: 24.7% [SQUEEZED]
[14:23:46] OPEN_FILE | model-00001.gguf | Pool: 24.7%
[14:23:46] POOL-FILL | model-00003.gguf | Pool: 37.1% [SQUEEZED]
| Action | Meaning |
|---|---|
SCAN_DIR |
Directory accessed, pre-warming triggered |
POOL-FILL [SQUEEZED] |
File compressed and cached (good compression) |
POOL-FILL [RAW] |
File cached without compression (already compressed) |
OPEN_FILE |
Application opened file (may hit cache) |
Pool: X% |
Current RAM cache utilization |
# While Nitro Monkey is running, check cache efficiency
# Monitor RAM usage
watch -n 1 'ps aux | grep nitro_monkey'
# Check FUSE mount stats
cat /proc/self/mountstats | grep -A 20 "/mnt/nitro"
# Monitor I/O patterns
sudo iotop -o -p $(pgrep -f nitro_monkey)#!/bin/bash
# benchmark_nitro.sh
SOURCE="/path/to/data"
MOUNT="/mnt/nitro"
TEST_FILE="$MOUNT/large_model.gguf"
echo "=== Nitro Monkey Benchmark ==="
# Test 1: Cold read (no cache)
echo "Cold read (first access):"
time cat $TEST_FILE > /dev/null
# Test 2: Warm read (from cache)
echo "Warm read (cached):"
time cat $TEST_FILE > /dev/null
# Test 3: Random access pattern
echo "Random access (seek test):"
time dd if=$TEST_FILE of=/dev/null bs=1M skip=100 count=10
# Test 4: Cache hit rate
echo "Checking cache efficiency..."
# Add custom stats output to script=== Nitro Monkey Benchmark ===
Cold read (first access):
real 0m5.234s (warming cache)
Warm read (cached):
real 0m0.891s (5.8x faster!)
Random access (seek test):
real 0m0.123s (instant from cache)
# Cause: FUSE mount crashed
# Solution: Force unmount and remount
fusermount -u /mnt/nitro
# or
sudo umount -l /mnt/nitro
# Remount
python3 nitro_monkey_v12.py /path/to/data /mnt/nitro# Check actual cache size
ps aux | grep nitro_monkey
# Look at RSS column
# Solution: Reduce cache pool
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 2.0# This is expected behavior!
# First access: Reads from disk + compresses + caches (5-10s)
# Second access: Instant from RAM cache (<1s)
# To pre-warm cache:
find /mnt/nitro -type f -exec cat {} > /dev/null \;# Ensure FUSE allows other users
# Edit /etc/fuse.conf:
sudo nano /etc/fuse.conf
# Uncomment:
user_allow_other
# Or run with sudo (not recommended):
sudo python3 nitro_monkey_v12.py /data /mnt/nitro# Check source path is correct
ls -la /path/to/source
# Verify FUSE mount
mount | grep fuse
# Check for disk errors
dmesg | grep -i error
# Remount with debug
fusermount -u /mnt/nitro
python3 nitro_monkey_v12.py /data /mnt/nitro --pool 4.0
# Watch for error messages# /etc/systemd/system/nitro-monkey.service
[Unit]
Description=Nitro Monkey FUSE Cache
After=network.target
[Service]
Type=simple
User=youruser
ExecStart=/usr/bin/python3 /opt/nitro-monkey/nitro_monkey_v12.py \
/var/data /mnt/nitro --pool 8.0
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target# Enable service
sudo systemctl daemon-reload
sudo systemctl enable nitro-monkey
sudo systemctl start nitro-monkey
# Check status
sudo systemctl status nitro-monkey# Dockerfile for containerized Nitro Monkey
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
fuse3 \
libfuse3-dev \
&& rm -rf /var/lib/apt/lists/*
RUN pip install fusepy lz4 psutil
COPY nitro_monkey_v12.py /app/
WORKDIR /app
# Requires --privileged and --device /dev/fuse
CMD ["python3", "nitro_monkey_v12.py", "/source", "/mount", "--pool", "4.0"]from nitro_monkey_v12 import NitroMonkeyV12
from fuse import FUSE
# Create cache instance
cache = NitroMonkeyV12(
root="/path/to/data",
pool_size_gb=8.0
)
# Mount with custom options
fuse_opts = {
'foreground': True,
'allow_other': True,
'kernel_cache': True
}
FUSE(cache, "/mnt/nitro", **fuse_opts)Model Weight Characteristics:
# Typical FP16 model weights
weights = [0.2341, 0.2342, 0.2340, 0.2343, ...]
# High locality = excellent compression
# LZ4 algorithm:
# 1. Finds repeated byte patterns
# 2. Replaces with short references
# 3. FP16 values cluster tightly
# Result: 70-80% compression ratioCompression Benchmarks:
| Data Type | LZ4 Ratio | Speed | Nitro Use |
|---|---|---|---|
| FP16 weights | 2.5:1 (60%) | 4 GB/s | ✅ Perfect |
| FP8 quantized | 3.5:1 (71%) | 4 GB/s | ✅ Excellent |
| Text/JSON | 3:1 (66%) | 4 GB/s | ✅ Great |
| JPEG images | 1.1:1 (9%) | 4 GB/s | |
| Video files | 1.0:1 (0%) | 4 GB/s |
# Why use inodes instead of paths?
st = os.stat(full_path) # Follows symlinks!
inode = st.st_ino
# Benefits:
# 1. Hardlinks share same inode → cache once
# 2. Symlinks resolve to target → no duplicate cache
# 3. Renamed files keep same inode → cache persists
# Example:
# /models/llama.gguf (inode: 12345)
# /models/backup/llama.gguf (hardlink, inode: 12345)
# Only cached once! Saves RAM.# Global lock for cache operations
self._io_lock = threading.Lock()
# Critical sections protected:
with self._io_lock:
# 1. Cache lookup (read)
if inode in self.header_cache:
return cached_data
# 2. Cache insertion (write)
self.header_cache[inode] = new_data
# 3. LRU eviction (write)
old = self.header_cache.popitem(last=False)
# Lock-free operations:
# - Disk reads (os.pread)
# - Compression (lz4.compress)
# - Background workers (daemon threads)# FUSE = Filesystem in Userspace
# Your Python code handles filesystem operations
class MyFS(Operations):
def getattr(self, path):
"""Called by: ls, stat, file managers"""
return file_metadata
def readdir(self, path):
"""Called by: ls, directory listing"""
return list_of_files
def open(self, path, flags):
"""Called by: open(), fopen()"""
return file_descriptor
def read(self, path, length, offset, fh):
"""Called by: read(), fread()"""
return data_bytes # This is where magic happens!| Feature | Sparse Files | Nitro Monkey |
|---|---|---|
| Purpose | Disk space illusion | Speed illusion |
| Mechanism | Filesystem holes | RAM cache |
| Storage | Claims 10PB, uses 0B | Claims 4GB, uses 4GB |
| Speed | Disk speed | RAM speed |
| Compression | None | LZ4 (70-80%) |
| Use Case | Swap simulation | Read acceleration |
- Persistent cache: Save/restore cache between runs
- Smart prefetch: Machine learning-based prediction
- Compression profiles: Per-file-type settings
- REST API: Control cache remotely
- Prometheus metrics: Integration with monitoring
- GUI dashboard: Real-time visualization
- Distributed cache: Share cache across machines
- GPU acceleration: Compress on GPU
- Cloud integration: S3/GCS backends
- Encryption layer: Transparent crypto
- Deduplication: Content-aware caching
We welcome contributions! Areas of interest:
- Persistent cache implementation
- Automated benchmarking suite
- Memory profiling tools
- Additional compression algorithms (zstd, brotli)
- Web dashboard
- Configuration file support
- Better error messages
- Unit tests
- Windows support (WSL2)
- macOS support (osxfuse)
- Alternative languages (Rust, Go)
# Contribution workflow
git clone https://github.com/yourusername/nitro-monkey.git
cd nitro-monkey
git checkout -b feature/your-feature
# Make changes
python3 -m pytest tests/ # Run tests
git commit -am "Add feature: your description"
git push origin feature/your-feature
# Open Pull Request on GitHubMIT License
Copyright (c) 2024 Nitro Monkey Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
PRODUCTION-READY SOFTWARE WITH REASONABLE PRECAUTIONS
✅ Safe for:
- Development environments
- Testing systems
- Personal workstations
- Read-heavy workloads
⚠️ Consider carefully for:
- Production servers (test first)
- Write-heavy workloads (read-only caching)
- Mission-critical systems (have backups)
❌ Not suitable for:
- Write-through caching (use dedicated solutions)
- Network filesystems (high latency)
- Real-time systems (non-deterministic cache)
The authors provide this software "as-is" without warranty.
Test thoroughly in your environment before production use.
- 📖 Documentation: Wiki
- 💬 Discussions: GitHub Discussions
- 🐛 Bug Reports: GitHub Issues
- 💡 Feature Requests: GitHub Issues
- 🗨️ Discord: Join Server
- 🐦 Twitter: @NitroMonkeyDev
- 📧 Email: nitro-monkey@example.com
For enterprise deployments, custom features, or consulting:
- FUSE - Filesystem in Userspace
- fusepy - Python FUSE bindings
- LZ4 - Extremely fast compression
- psutil - System monitoring
- Evil Monkey Swap - Original sparse file research
- Ollama - LLM serving that needs speed
- Docker overlayfs - Layered filesystem concepts
- Redis - In-memory caching philosophy
Thanks to all who have contributed to this project!
- @yourusername - Creator & Maintainer
- See CONTRIBUTORS.md for full list
- CacheFS - Alternative caching layer
- bcachefs - Kernel-level caching filesystem
- mergerfs - Union filesystem
# Installation
pip install fusepy lz4 psutil
# Basic mount
python3 nitro_monkey_v12.py /source /mount --pool 4.0
# Unmount
fusermount -u /mount
# Monitor
watch -n 1 'df -h && free -h'
# Benchmark
time cat /mount/largefile > /dev/null # First run
time cat /mount/largefile > /dev/null # Cached run
# Troubleshoot
dmesg | grep fuse # Kernel messages
ps aux | grep nitro # Process status
cat /proc/self/mountstats | grep mount # FUSE statsMade with 🐒 and ⚡ for speed demons
"Why wait for disk when you have RAM?"
Star ⭐ this repo if Nitro Monkey accelerated your workflow!