kk-code-lab/seglake

Seglake

A minimal, S3‑compatible object store focused on correctness and strict durability. Implementation: append‑only segments, per-object manifests, and metadata in SQLite (WAL, synchronous=FULL).


Why Seglake

Seglake is a minimal, correctness‑first S3‑compatible object store for single‑node deployments and local testing.

Why it exists:

  • Predictable durability: data is appended to segments, fsynced, then manifests + metadata are committed via SQLite WAL before ACK.
  • Auditable on‑disk format: append‑only segments + standalone manifests make storage easy to inspect, validate, and recover.
  • Small operational surface: no cluster quorum, no distributed consensus — fewer moving parts.
  • S3‑compatible core: PUT/GET/HEAD, list v1/v2, range GET (incl. multi‑range), multipart upload, versioning.

Trade‑offs:

  • Not a distributed system: no built‑in HA or geo‑replication.
  • Designed for correctness and simplicity, not maximum throughput or elastic scale.

If you need a transparent, robust S3 backend for local/edge/on‑prem setups, Seglake is built for that niche.


Key features

  • S3 API: PUT/GET/HEAD, ListObjects V1/V2, ListBuckets, Range GET (single and multi‑range)
  • SigV4 auth (Authorization header and presigned URLs); SigV2 is not supported
  • SigV4 streaming uploads (Content-Encoding: aws-chunked) with chunk/trailer validation
  • Fuzz tests for the aws-chunked (SigV4 streaming) parser
  • Multipart upload (init/upload/list/complete/abort)
  • Durability contract: fsync segments + WAL commit before ACK
  • Append‑only segments, 4 MiB chunking, BLAKE3 per chunk
  • Ops tooling: fsck, scrub, rebuild-index, snapshot, gc-plan/run, gc-rewrite, mpu-gc, repl-validate
  • Access policies (MVP): per-key + bucket policies + conditions
  • Public buckets (unsigned access) via -public-buckets + bucket policy
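Range GET covers both single and multi-range requests. As a rough illustration of what that involves, here is a Go sketch of parsing an HTTP `Range: bytes=...` header against an object of known size, following the general HTTP byte-range forms (`a-b`, open-ended `a-`, suffix `-n`). `ParseRanges` and `ByteRange` are hypothetical names, not Seglake's parser, and Seglake's handling of edge cases such as overlapping ranges may differ.

```go
// Sketch only: hypothetical Range header parser, not Seglake's implementation.
package main

import (
	"errors"
	"strconv"
	"strings"
)

// ByteRange is an inclusive [Start, End] byte range within an object.
type ByteRange struct {
	Start, End int64
}

// Length returns the number of bytes covered by the range.
func (r ByteRange) Length() int64 { return r.End - r.Start + 1 }

// ErrUnsatisfiable corresponds to HTTP 416 Range Not Satisfiable.
var ErrUnsatisfiable = errors.New("range not satisfiable")

// ParseRanges parses a value such as "bytes=0-99,200-,-50" for an object of the
// given size. End offsets past the object are clamped; ranges starting at or past
// the end are dropped, and if none remain ErrUnsatisfiable is returned.
func ParseRanges(header string, size int64) ([]ByteRange, error) {
	spec, ok := strings.CutPrefix(header, "bytes=")
	if !ok {
		return nil, errors.New("unsupported range unit")
	}
	var out []ByteRange
	for _, part := range strings.Split(spec, ",") {
		first, last, ok := strings.Cut(strings.TrimSpace(part), "-")
		if !ok {
			return nil, errors.New("malformed range: " + part)
		}
		if first == "" {
			// Suffix range "-n": the last n bytes of the object.
			n, err := strconv.ParseInt(last, 10, 64)
			if err != nil || n <= 0 {
				return nil, errors.New("malformed suffix range: " + part)
			}
			if n > size {
				n = size
			}
			if n > 0 {
				out = append(out, ByteRange{Start: size - n, End: size - 1})
			}
			continue
		}
		start, err := strconv.ParseInt(first, 10, 64)
		if err != nil || start < 0 {
			return nil, errors.New("malformed range start: " + part)
		}
		end := size - 1 // open-ended "a-" runs to the last byte
		if last != "" {
			end, err = strconv.ParseInt(last, 10, 64)
			if err != nil || end < start {
				return nil, errors.New("malformed range end: " + part)
			}
			if end > size-1 {
				end = size - 1
			}
		}
		if start >= size {
			continue
		}
		out = append(out, ByteRange{Start: start, End: end})
	}
	if len(out) == 0 {
		return nil, ErrUnsatisfiable
	}
	return out, nil
}
```

A single resulting range would typically be served as a 206 with `Content-Range`; several ranges as a `multipart/byteranges` body.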

Quick start

Build

make build

Run (dev)

./build/seglake -data-dir ./data -access-key test -secret-key testsecret

Default address: :9000.


Examples (awscli)

List buckets:

AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=testsecret AWS_DEFAULT_REGION=us-east-1 \
  aws s3 ls --endpoint-url http://localhost:9000

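PUT/GET (the demo bucket must already exist; create it with aws s3 mb s3://demo --endpoint-url http://localhost:9000):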

AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=testsecret AWS_DEFAULT_REGION=us-east-1 \
  aws s3 cp ./file.bin s3://demo/file.bin --endpoint-url http://localhost:9000

AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=testsecret AWS_DEFAULT_REGION=us-east-1 \
  aws s3 cp s3://demo/file.bin ./file.bin --endpoint-url http://localhost:9000

More: docs/ops.md.


Maintenance quickstart

To run unsafe operations without stopping the server, first switch it into maintenance (quiesced) mode:

./build/seglake -mode maintenance -maintenance-action enable
./build/seglake -mode gc-run -gc-force
./build/seglake -mode maintenance -maintenance-action disable

Notes:

  • gc-run (and other destructive ops) require -gc-force.
  • When the server is running, ops use the local admin socket + token in the data dir (e.g. ./data/.seglake-admin.sock and ./data/.seglake-admin.token).
  • See docs/ops.md for full details and troubleshooting.

S3 compatibility (selected)

Feature                    Status  Notes
ListBuckets                Yes     GET /
ListObjects V1             Yes     GET /<bucket>?prefix=...
ListObjects V2             Yes     GET /<bucket>?list-type=2
GetBucketLocation          Yes     GET /<bucket>?location
CreateBucket               Yes     PUT /<bucket>
DeleteBucket               Yes     Only if empty
PutObject                  Yes     PUT /<bucket>/<key>
GetObject                  Yes     GET /<bucket>/<key>
HeadObject                 Yes     HEAD /<bucket>/<key>
DeleteObject               Yes     Idempotent
Versioned GET/HEAD/DELETE  Yes     ?versionId=...
Range GET                  Yes     Single + multi‑range
CopyObject                 Yes     x-amz-copy-source
Multipart upload           Yes     init/upload/list/complete/abort/list uploads
SigV4 auth                 Yes     Header + presigned
SigV4 streaming            Yes     aws-chunked + trailer checksum validation
SigV2 auth                 No      Not supported
Presigned URLs             Yes     TTL 1..7 days
ETag behavior              Yes     Single PUT = md5(body); multipart = md5(concatenated part MD5 digests) + "-N" (N = part count)
CORS / OPTIONS             Yes     Preflight supported

Full scope: docs/spec.md.
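The ETag row follows the usual S3 convention: a single-PUT object's ETag is the hex MD5 of its body, while a multipart object's ETag is the MD5 of the raw (binary, not hex) part MD5 digests concatenated in part order, followed by `-N`. A short Go sketch of both rules; `SingleETag` and `MultipartETag` are illustrative names, and on the wire S3 ETags are additionally wrapped in double quotes.

```go
// Sketch only: S3-style ETag computation; function names are illustrative.
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// SingleETag returns the ETag of an object uploaded with a single PUT.
func SingleETag(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}

// MultipartETag returns md5(md5(part1) || md5(part2) || ...) + "-N",
// where the inner digests are concatenated as raw 16-byte values.
func MultipartETag(parts [][]byte) string {
	h := md5.New()
	for _, p := range parts {
		sum := md5.Sum(p)
		h.Write(sum[:])
	}
	return fmt.Sprintf("%s-%d", hex.EncodeToString(h.Sum(nil)), len(parts))
}
```

One consequence: a multipart ETag is not the MD5 of the assembled object, so clients should not use it as a whole-object checksum.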


Architecture (short)

  • Chunking: 4 MiB, BLAKE3 per chunk
  • Segments: append‑only, ~1 GiB or 10 min idle
  • Manifests: binary files describing object layout
  • Metadata: SQLite (WAL, synchronous=FULL)
  • Durability barrier: fsync segments → write manifest + meta tx → WAL flush → ACK

Full details: docs/spec.md.
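The chunking step can be pictured as cutting each object body into fixed 4 MiB chunks and recording one digest per chunk. Go's standard library has no BLAKE3, so this sketch uses SHA-256 from `crypto/sha256` purely as a stand-in; `ChunkRef`, `ChunkObject`, and `ChunkBytes` are hypothetical names, and the real manifest layout is described in docs/spec.md.

```go
// Sketch only: fixed-size chunking with per-chunk digests (SHA-256 stands in for BLAKE3).
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"io"
)

// ChunkSize matches the 4 MiB chunking described above.
const ChunkSize = 4 << 20

// ChunkRef is a hypothetical manifest entry for one chunk.
type ChunkRef struct {
	Offset int64    // offset of the chunk within the object
	Length int      // only the final chunk may be shorter than ChunkSize
	Digest [32]byte // SHA-256 here; Seglake uses BLAKE3
}

// ChunkObject reads r to EOF and returns one ChunkRef per chunk. If emit is not
// nil it receives each chunk's bytes (e.g. to append them to a segment); the
// slice is reused between calls, so emit must copy it if it keeps it.
func ChunkObject(r io.Reader, emit func(ChunkRef, []byte) error) ([]ChunkRef, error) {
	buf := make([]byte, ChunkSize)
	var refs []ChunkRef
	var offset int64
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			ref := ChunkRef{Offset: offset, Length: n, Digest: sha256.Sum256(buf[:n])}
			if emit != nil {
				if eerr := emit(ref, buf[:n]); eerr != nil {
					return nil, eerr
				}
			}
			refs = append(refs, ref)
			offset += int64(n)
		}
		// ReadFull reports EOF (nothing read) or ErrUnexpectedEOF (short final chunk).
		if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
			return refs, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// ChunkBytes is a convenience wrapper for in-memory bodies.
func ChunkBytes(b []byte) ([]ChunkRef, error) {
	return ChunkObject(bytes.NewReader(b), nil)
}
```

Per-chunk digests let fsck and scrub verify or re-read individual chunks without rehashing a whole object.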


Operations and maintenance

Available modes: status, fsck, scrub, rebuild-index, snapshot, support-bundle, gc-plan/gc-run, gc-rewrite, mpu-gc-plan/mpu-gc-run, repl-validate, buckets.

Checklists and examples: docs/ops.md
Smoke scripts: scripts/
Helper CLI: scripts/segctl

Quick examples:

  • scripts/segctl bucket list
  • scripts/segctl bucket create demo --versioning unversioned
  • scripts/segctl key list
  • scripts/segctl bucket-policy get

Notes:

  • segctl wraps the admin operations for keys, buckets, and bucket policies.
  • It reads the data directory from SEGLAKE_DATA_DIR (default ./data).

Replication (multi‑site)

Model: last-writer-wins (LWW) with tombstones for deletes, ordered by a hybrid logical clock (HLC). Modes: repl-pull, repl-push, repl-bootstrap.

Examples and notes: docs/ops.md.
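Under last-writer-wins, each replica keeps, per key, the version with the greatest timestamp, and a delete is simply a version flagged as a tombstone so it can beat older writes. An HLC supplies those timestamps: wall-clock time plus a logical counter that keeps ordering consistent when node clocks are skewed. A minimal Go sketch of the general technique, assuming millisecond physical time and a node-ID tiebreak; `HLC`, `Timestamp`, `Version`, and `Resolve` are hypothetical and say nothing about Seglake's replication format.

```go
// Sketch only: generic HLC + LWW resolution; all names are hypothetical.
package main

import "sync"

// Timestamp is an HLC value: physical time in ms, a logical counter, and the
// writer's node ID as a final tiebreak so every replica picks the same winner.
type Timestamp struct {
	WallMS  int64
	Logical uint32
	NodeID  string
}

// Less orders timestamps by wall time, then logical counter, then node ID.
func (a Timestamp) Less(b Timestamp) bool {
	if a.WallMS != b.WallMS {
		return a.WallMS < b.WallMS
	}
	if a.Logical != b.Logical {
		return a.Logical < b.Logical
	}
	return a.NodeID < b.NodeID
}

// HLC is a hybrid logical clock owned by one node.
type HLC struct {
	mu     sync.Mutex
	now    func() int64 // physical clock in ms; injectable for tests
	nodeID string
	last   Timestamp
}

func NewHLC(nodeID string, now func() int64) *HLC {
	return &HLC{now: now, nodeID: nodeID, last: Timestamp{NodeID: nodeID}}
}

// Now returns a timestamp for a local write; it never goes backwards.
func (c *HLC) Now() Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pt := c.now(); pt > c.last.WallMS {
		c.last = Timestamp{WallMS: pt, NodeID: c.nodeID}
	} else {
		c.last.Logical++
	}
	return c.last
}

// Observe merges a timestamp received from another replica, so later local
// writes are ordered after it even if this node's clock lags.
func (c *HLC) Observe(remote Timestamp) {
	c.mu.Lock()
	defer c.mu.Unlock()
	pt := c.now()
	switch {
	case pt > c.last.WallMS && pt > remote.WallMS:
		c.last = Timestamp{WallMS: pt, NodeID: c.nodeID}
	case remote.WallMS > c.last.WallMS:
		c.last = Timestamp{WallMS: remote.WallMS, Logical: remote.Logical + 1, NodeID: c.nodeID}
	case remote.WallMS == c.last.WallMS:
		if remote.Logical > c.last.Logical {
			c.last.Logical = remote.Logical
		}
		c.last.Logical++
	default: // local HLC is already ahead of the remote timestamp
		c.last.Logical++
	}
}

// Version is one replicated write of a key; Deleted marks a tombstone.
type Version struct {
	TS      Timestamp
	Deleted bool
	ETag    string
}

// Resolve applies LWW: the later timestamp survives, write or tombstone alike.
func Resolve(a, b Version) Version {
	if a.TS.Less(b.TS) {
		return b
	}
	return a
}
```

Because `Resolve` depends only on the timestamps, replicas that receive the same versions in different orders converge on the same winner.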


Limits and API behavior (selected)

  • Max object size: -max-object-size (default 5 GiB, 0 = unlimited)
  • Multipart min part size: 5 MiB (except the last part)
  • Presigned TTL: 1..7 days
  • Virtual-hosted-style addressing (bucket name in the Host header) enabled by default

Full list: docs/spec.md.
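The size limits above can be checked when a multipart upload is completed, roughly as follows. This is a hypothetical sketch (`ValidateMultipart` is not a Seglake function): it applies the 5 MiB minimum to every part except the last and treats a max object size of 0 as unlimited, mirroring the `-max-object-size` semantics listed above. `EntityTooSmall` and `EntityTooLarge` are the standard S3 error codes; the message text is illustrative.

```go
// Sketch only: hypothetical multipart size validation.
package main

import "fmt"

const (
	MiB                  = 1 << 20
	MinPartSize          = 5 * MiB // applies to every part except the last
	DefaultMaxObjectSize = 5 << 30 // 5 GiB, the documented -max-object-size default
)

// ValidateMultipart checks part sizes (in part-number order) against the minimum
// part size and the total object limit; maxObjectSize == 0 means unlimited.
func ValidateMultipart(partSizes []int64, maxObjectSize int64) error {
	if len(partSizes) == 0 {
		return fmt.Errorf("multipart upload has no parts")
	}
	var total int64
	for i, size := range partSizes {
		isLast := i == len(partSizes)-1
		if !isLast && size < MinPartSize {
			return fmt.Errorf("EntityTooSmall: part %d is %d bytes, minimum is %d", i+1, size, MinPartSize)
		}
		total += size
	}
	if maxObjectSize > 0 && total > maxObjectSize {
		return fmt.Errorf("EntityTooLarge: object is %d bytes, limit is %d", total, maxObjectSize)
	}
	return nil
}
```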


Development

Key Makefile targets:

make build
make run
make test
make test-race
make test-e2e
make test-all
make fmt
make lint
make check

Documentation

  • docs/spec.md — spec/behavior and API scope
  • docs/ops.md — deployment, TLS, policies, GC, repl
  • docs/optimization.md — performance notes
  • docs/roadmap.md — roadmap / planned work
  • examples/ — systemd, Caddy, public bucket policy

License

LICENSE

About

S3‑compatible object storage engine in Go with SigV4, replication, and GC.
