A key-value storage engine built from scratch in Go, following the Bitcask architecture (similar to LevelDB/RocksDB).
- Go 1.25.0 or later
- just task runner (optional, for convenience commands)
Stage 1.3: Write Batches - In Progress
Memoria follows the Bitcask architecture (a log-structured hash table):
On-disk record format:
[4B record length][4B CRC32][8B timestamp][1B type][4B key len][4B val len][key][value]
Locking model:
mainlock (sync.RWMutex) -- protects KeyDir and segment listswallock (sync.Mutex) -- serializes writes to the active segmentcachelock (sync.Mutex) -- protects the LRU file cache- Lock ordering: main -> cache (to prevent deadlocks during merge)
- Concurrent-safe key-value store with thread-safe access
- Write-Ahead Log (WAL) with CRC32 and fsync
- Log recovery on startup with corruption resilience (skips bad records)
- In-memory hash index (KeyDir) for O(1) disk seeks
- KeyDir tracks (file ID, offset, value size, record size)
- Single disk seek for reads via ReadAt
- Automatic segment rotation when MaxSize is exceeded (default 16MB)
- Segment statistics tracking (LiveBytes, DeadBytes, LiveKeys, DeadKeys)
- Background merge process for space reclamation (streaming, memory-efficient)
- Fragmentation threshold and dead bytes threshold triggers
- Hint file support (.hint) with magic number, versioning, and per-entry CRC32
- Active segment truncation on startup after crash
- RecordSize tracking for accurate merge heuristics
- Log truncation (old segments deleted after merge)
- LRU file cache for segment data (configurable max size)
- Orphan file cleanup (.hint and .merge files without corresponding .data)
- TOCTOU race protection during merge (re-checks KeyDir under lock)
- Sequential log file appending
- Binary log record encoding with CRC32
- Log recovery on startup (Replay mechanism)
- fsync for durability guarantees
- Row-level locking
Memoria is configured via MemoriaConfig. All fields have sensible defaults:
cfg := &MemoriaConfig{
DataDir: "data",
FileCache: &FileCacheConfig{
MaxSize: 10,
},
Segment: &SegmentConfig{
MaxSize: 16 * 1024 * 1024, // 16MB per segment
FragmentationThreshold: 0.6, // 60% dead keys triggers merge
DeadBytesThreshold: 512 * 1024 * 1024, // 512MB dead bytes triggers merge
MergeCheckInterval: time.Minute, // How often to check for merge
},
}
db, err := NewMemoria(cfg)package main
import (
"fmt"
"github.com/KiranRajeev-KV/memoria"
)
func main() {
db, err := memoria.NewMemoria(&memoria.MemoriaConfig{
DataDir: "my-db",
})
if err != nil {
panic(err)
}
defer db.Close()
// Write
db.set("name", "Alice")
db.set("age", "30")
// Read
value, _ := db.get("name")
fmt.Println(value) // "Alice"
// Delete
db.delete("age")
// View all keys
db.view()
// Get count
db.getCount()
}memoria/
main.go # Entry point, CLI demo
memdb.go # Core database: memoria struct, CRUD, recovery, initialization
segment.go # Segment management: discovery, rotation, hint file I/O
record.go # Log record encoding/decoding (binary protocol)
cache.go # LRU file cache for open segment files
options.go # MemoriaConfig, FileCacheConfig, SegmentConfig
merge.go # Background merge: streaming merge, fragmentation detection
memdb_test.go # 53+ comprehensive tests
justfile # Task runner commands (test, build, lint, coverage)
docs/
kv-storage-engine-roadmap.md # Full 8-phase implementation roadmap
Building a Go Key-Value Storage Engine.md # Architectural reference guide
| Extension | Purpose |
|---|---|
.data |
Segment files (append-only log of records) |
.hint |
Hint files (KeyDir snapshot for fast recovery) |
.merge |
Temporary merge output (cleaned up on startup) |
Segment filenames are nanosecond-precision Unix timestamps (e.g., 1712000000000000000.data).
# Run all tests
go test -race -v ./...
# Run with coverage
go test -cover ./...
# Using just task runner
just test # Run tests
just test-race # Run with race detector
just test-cover # Run with coverage
just coverage # Show detailed coverage reportThe test suite includes 53+ tests covering:
- Basic CRUD operations (set, get, delete)
- Input validation (empty keys, empty values)
- Recovery and persistence (restart, rotation, tombstones)
- Corruption handling (partial writes, corrupt hint files)
- Concurrency (100+ goroutines, race-detector verified)
- Segment rotation and statistics
- Background merge (triggers, deleted keys, updated keys, post-restart)
- Hint file creation, recovery, and corruption fallback
- LRU cache eviction (multi-segment, size=1 edge case)
- Orphan file cleanup (.hint and .merge)
- Edge cases (large 1MB values, special characters, emoji, zero-length segments)
See CHANGELOG.md for detailed feature history per stage.
See docs/kv-storage-engine-roadmap.md for the full implementation plan.