Skip to content

KiranRajeev-KV/memoria

Repository files navigation

Memoria

A key-value storage engine built from scratch in Go, following the Bitcask architecture (similar to LevelDB/RocksDB).

Prerequisites

  • Go 1.25.0 or later
  • just task runner (optional, for convenience commands)

Current Stage

Stage 1.3: Write Batches - In Progress

Architecture

Memoria follows the Bitcask architecture (a log-structured hash table):

image

On-disk record format:

[4B record length][4B CRC32][8B timestamp][1B type][4B key len][4B val len][key][value]

Locking model:

  • main lock (sync.RWMutex) -- protects KeyDir and segment lists
  • wal lock (sync.Mutex) -- serializes writes to the active segment
  • cache lock (sync.Mutex) -- protects the LRU file cache
  • Lock ordering: main -> cache (to prevent deadlocks during merge)

Implemented Features

Stage 1.2 - Bitcask Architecture (Complete)

  • Concurrent-safe key-value store with thread-safe access
  • Write-Ahead Log (WAL) with CRC32 and fsync
  • Log recovery on startup with corruption resilience (skips bad records)
  • In-memory hash index (KeyDir) for O(1) disk seeks
  • KeyDir tracks (file ID, offset, value size, record size)
  • Single disk seek for reads via ReadAt
  • Automatic segment rotation when MaxSize is exceeded (default 16MB)
  • Segment statistics tracking (LiveBytes, DeadBytes, LiveKeys, DeadKeys)
  • Background merge process for space reclamation (streaming, memory-efficient)
  • Fragmentation threshold and dead bytes threshold triggers
  • Hint file support (.hint) with magic number, versioning, and per-entry CRC32
  • Active segment truncation on startup after crash
  • RecordSize tracking for accurate merge heuristics
  • Log truncation (old segments deleted after merge)
  • LRU file cache for segment data (configurable max size)
  • Orphan file cleanup (.hint and .merge files without corresponding .data)
  • TOCTOU race protection during merge (re-checks KeyDir under lock)

Stage 1.1 - WAL (Complete)

  • Sequential log file appending
  • Binary log record encoding with CRC32
  • Log recovery on startup (Replay mechanism)
  • fsync for durability guarantees
  • Row-level locking

Configuration

Memoria is configured via MemoriaConfig. All fields have sensible defaults:

cfg := &MemoriaConfig{
    DataDir: "data",
    FileCache: &FileCacheConfig{
        MaxSize: 10,
    },
    Segment: &SegmentConfig{
        MaxSize:                16 * 1024 * 1024,  // 16MB per segment
        FragmentationThreshold: 0.6,               // 60% dead keys triggers merge
        DeadBytesThreshold:     512 * 1024 * 1024, // 512MB dead bytes triggers merge
        MergeCheckInterval:     time.Minute,        // How often to check for merge
    },
}
db, err := NewMemoria(cfg)

Quick Start

package main

import (
    "fmt"
    "github.com/KiranRajeev-KV/memoria"
)

func main() {
    db, err := memoria.NewMemoria(&memoria.MemoriaConfig{
        DataDir: "my-db",
    })
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // Write
    db.set("name", "Alice")
    db.set("age", "30")

    // Read
    value, _ := db.get("name")
    fmt.Println(value) // "Alice"

    // Delete
    db.delete("age")

    // View all keys
    db.view()

    // Get count
    db.getCount()
}

Project Structure

memoria/
  main.go           # Entry point, CLI demo
  memdb.go          # Core database: memoria struct, CRUD, recovery, initialization
  segment.go        # Segment management: discovery, rotation, hint file I/O
  record.go         # Log record encoding/decoding (binary protocol)
  cache.go          # LRU file cache for open segment files
  options.go        # MemoriaConfig, FileCacheConfig, SegmentConfig
  merge.go          # Background merge: streaming merge, fragmentation detection
  memdb_test.go     # 53+ comprehensive tests
  justfile          # Task runner commands (test, build, lint, coverage)
  docs/
    kv-storage-engine-roadmap.md   # Full 8-phase implementation roadmap
    Building a Go Key-Value Storage Engine.md  # Architectural reference guide

On-Disk File Formats

Extension Purpose
.data Segment files (append-only log of records)
.hint Hint files (KeyDir snapshot for fast recovery)
.merge Temporary merge output (cleaned up on startup)

Segment filenames are nanosecond-precision Unix timestamps (e.g., 1712000000000000000.data).

Running Tests

# Run all tests
go test -race -v ./...

# Run with coverage
go test -cover ./...

# Using just task runner
just test        # Run tests
just test-race   # Run with race detector
just test-cover  # Run with coverage
just coverage    # Show detailed coverage report

The test suite includes 53+ tests covering:

  • Basic CRUD operations (set, get, delete)
  • Input validation (empty keys, empty values)
  • Recovery and persistence (restart, rotation, tombstones)
  • Corruption handling (partial writes, corrupt hint files)
  • Concurrency (100+ goroutines, race-detector verified)
  • Segment rotation and statistics
  • Background merge (triggers, deleted keys, updated keys, post-restart)
  • Hint file creation, recovery, and corruption fallback
  • LRU cache eviction (multi-segment, size=1 edge case)
  • Orphan file cleanup (.hint and .merge)
  • Edge cases (large 1MB values, special characters, emoji, zero-length segments)

Changelog

See CHANGELOG.md for detailed feature history per stage.

Roadmap

See docs/kv-storage-engine-roadmap.md for the full implementation plan.

About

A learning project implementing a Bitcask-style KV engine in Go with WAL, merge, and hint files

Topics

Resources

Stars

Watchers

Forks

Contributors