|
1 | | -# ⏳︎ tskv — Time-Series Key-Value Store |
| 1 | +# ⏳︎ tskv — Time-Window **T**ime-**S**eries **K**ey-**V**alue Cache |
2 | 2 |
|
3 | | -**TL;DR:** Single-node, crash-safe time-series KV store with a non-blocking TCP server (Linux **epoll**) and LSM-style storage: **Write-Ahead Log (WAL) → memtable → immutable SSTables**, plus a background compaction worker. Built with **C++23 modules**; no third-party libraries. |
| 3 | +**TL;DR:** `tskv` is a single-node, in-memory **time-window cache** for time-series data, with: |
| 4 | + |
| 5 | +- A bounded sliding retention window (e.g., "last N minutes") |
| 6 | +- A simple binary protocol over a non-blocking TCP server (Linux **epoll**) |
| 7 | +- Time-partitioned in-memory segments for efficient expiration |
| 8 | +- Modern **C++23** with modules; no third-party libraries |
| 9 | + |
| 10 | +Stage-0 (`v1.0`) focuses on a small, understandable core: a hot time window with clear semantics and basic observability, leaving WAL, multi-threading, and advanced features for future versions. |
4 | 11 |
|
5 | 12 | --- |
6 | 13 |
|
7 | 14 | ## ◎ Goals |
8 | | -- Demonstrate disciplined systems design in modern C++. |
9 | | -- Show clear **durability** boundaries (WAL append / optional sync) and **read-after-write** visibility. |
10 | | -- Keep **backpressure** and buffers **bounded** for predictable latency. |
11 | | -- Favor **correctness + measurable performance** over feature breadth. |
12 | | - |
13 | | -## ⛶ Architecture |
14 | | -- **Write path:** append to **WAL** → (optional `fdatasync`) → apply to **memtable** → periodic **flush** to **SSTable** (immutable, sorted). |
15 | | -- **Read path:** **memtable** first → then newest-to-oldest **SSTables**; per-file **Bloom filter** to skip negatives; index to jump to the right block. |
16 | | -- **Compaction:** merge overlapping SSTables, keep newest versions, drop obsolete ones; install via **manifest** with durable rename. |
17 | | - |
18 | | -## ⚑ Roadmap (high-level) |
19 | | -- [x] v0.1 — Bootstrap: README, CLI, PR template |
20 | | -- [x] v0.2 — Non-blocking TCP + epoll echo; clean shutdown |
21 | | -- [ ] v0.3 — Framing: header + length; PING/PONG |
22 | | -- [ ] v0.4 — Connection state: RX/TX rings; backpressure cap |
23 | | -- [ ] v0.5 — Engine queues: SPSC/MPSC; dispatcher |
24 | | -- [ ] v0.6 — WAL v1: append+CRC; sync policy flag |
25 | | -- [ ] v0.7 — Recovery: replay WAL; torn-tail safe |
26 | | -- [ ] v0.8 — Memtable v0: std::map; PUT/GET end-to-end |
27 | | -- [ ] v0.9 — SSTable v1: writer/reader; mmap; footer |
28 | | -- [ ] v0.10 — Manifest: live tables; durable rename |
29 | | -- [ ] v0.11 — Wire-through: GET/PUT via SST path |
30 | | -- [ ] v0.12 — Bloom filters: per-SST; bits/key tuning |
31 | | -- [ ] v0.13 — Memtable v1: skiplist + iterator |
32 | | -- [ ] v0.14 — SCAN RPC: streaming RESP; writev batches |
33 | | -- [ ] v0.15 — Concurrency: N I/O, M engine; fairness |
34 | | -- [ ] v0.16 — Metrics: counters + p50/p95/p99 endpoint |
35 | | -- [ ] v0.17 — Compaction v1: merge + manifest install |
36 | | -- [ ] v0.18 — Chaos tests: disk-full; kill-9 loops |
37 | | -- [ ] v0.19 — Perf pass: micro/macro benches; notes |
38 | | -- [ ] v1.0 — Polish: docs, demo.sh, ASan/UBSan; release |
39 | | - |
40 | | -## ∷ C++ Module Layout |
41 | | -- `tskv.common.*` — logging, metrics, ring buffers, fs helpers |
42 | | -- `tskv.net.*` — socket (non-blocking), reactor (**epoll**, edge-triggered), connection, rpc |
43 | | -- `tskv.kv.*` — engine, wal, memtable, sstable, manifest, compaction, filters |
44 | | - |
45 | | -## ∑ Metrics (planned) |
46 | | -- **net:** connections_open, rx_bytes_total, tx_bytes_total, backpressure_events_total |
47 | | -- **rpc:** put_total, get_total, scan_total, errors_total |
48 | | -- **wal/sstable:** appends_total, fsync_total, files_total, bloom_negative_total |
49 | | -- **latency:** p50/p95/p99 for GET & PUT |
| 15 | + |
| 16 | +- Provide a compact, readable example of a **time-window time-series cache**: |
| 17 | + - Bounded memory via a fixed retention window |
| 18 | + - Explicit "visible data" contract: recent data only |
| 19 | +- Demonstrate disciplined systems design in **modern C++**: |
| 20 | + - C++23 modules |
| 21 | + - Non-blocking I/O with **epoll** |
| 22 | +- Favor **correctness + clear invariants** over feature breadth: |
| 23 | + - Simple, explicit write and read paths |
| 24 | + - Straightforward retention / expiration logic |
| 25 | +- Keep latency and resource usage **predictable**: |
| 26 | + - No unbounded growth from infinite history |
| 27 | + - Easy-to-reason-about hot path |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## ⛶ Architecture (Stage-0) |
| 32 | + |
| 33 | +### Data model |
| 34 | + |
| 35 | +- Keys are time-series identifiers (e.g. `cpu.user`, `service=api,host=foo`). |
| 36 | +- Each write is a tuple `(series_id, timestamp, value)`. |
| 37 | +- The store maintains only data within a **sliding time window**: |
| 38 | + - `timestamp >= now - WINDOW` |
| 39 | + - Older data is considered expired: reads never return it, and a later retention pass frees it.
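
The window contract above can be sketched as a pair of small helpers. This is a minimal illustration, not the project's code: the names `Timestamp`, `is_visible`, and `clip_to_window` are hypothetical, and timestamps are assumed to be integers in a single unit (for example, milliseconds). The upper bound of a range is left untouched because the README does not say how future timestamps are treated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical alias: integer timestamps in one fixed unit (e.g. milliseconds).
using Timestamp = std::int64_t;

// A point is visible iff timestamp >= now - window.
constexpr bool is_visible(Timestamp ts, Timestamp now, Timestamp window) {
    return ts >= now - window;
}

// Clip a requested inclusive range [from_ts, to_ts] to the visible window.
// Returns std::nullopt when no timestamp in the range can be visible.
inline std::optional<std::pair<Timestamp, Timestamp>>
clip_to_window(Timestamp from_ts, Timestamp to_ts, Timestamp now, Timestamp window) {
    const Timestamp lo = std::max(from_ts, now - window);
    if (lo > to_ts) return std::nullopt;
    return std::pair{lo, to_ts};
}
```

With `now = 1000` and `window = 60`, a request for `[900, 990]` would be clipped to `[940, 990]`, and a request for `[100, 200]` would come back empty.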
| 40 | + |
| 41 | +### In-memory layout |
| 42 | + |
| 43 | +- Data is stored in **time-partitioned segments**: |
| 44 | + - Each segment covers a fixed time slice (e.g. 1–10 seconds). |
| 45 | + - Segments are organized in time order (oldest → newest). |
| 46 | + - New writes go to the current "tail" segment. |
| 47 | +- A periodic retention pass drops the oldest segments whose time range is fully outside the configured window. |
| 48 | + |
| 49 | +This keeps memory proportional to the data written within `WINDOW` (plus any expired segments the next retention pass has not yet dropped), not to total insert volume.
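
One way to picture the segment layout and retention pass described above is the sketch below. It is an illustration under stated assumptions, not the planned implementation: `Point`, `Segment`, and `WindowStore` are hypothetical names, timestamps are assumed to be non-negative integers, and a late point whose time slice has no segment is rejected rather than stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Timestamp = std::int64_t;  // assumed: non-negative integer time units

struct Point {
    std::string series_id;
    Timestamp   timestamp;
    double      value;
};

// One segment owns the half-open time slice [start, start + span).
struct Segment {
    Timestamp          start;
    std::vector<Point> points;
};

class WindowStore {
public:
    WindowStore(Timestamp window, Timestamp span) : window_(window), span_(span) {}

    // Append a point to the segment covering its time slice.
    // Returns false if the slice is older than the tail and has no segment.
    bool put(Point p) {
        const Timestamp start = p.timestamp - (p.timestamp % span_);
        if (segments_.empty() || segments_.back().start < start) {
            segments_.push_back(Segment{start, {}});  // open a new tail segment
        }
        for (auto it = segments_.rbegin(); it != segments_.rend(); ++it) {
            if (it->start == start) {
                it->points.push_back(std::move(p));
                return true;
            }
            if (it->start < start) break;  // passed the slice: no such segment
        }
        return false;
    }

    // Retention pass: drop oldest segments that lie fully before now - window.
    std::size_t expire(Timestamp now) {
        const Timestamp cutoff = now - window_;
        std::size_t dropped = 0;
        while (!segments_.empty() && segments_.front().start + span_ <= cutoff) {
            segments_.pop_front();
            ++dropped;
        }
        return dropped;
    }

    std::size_t segment_count() const { return segments_.size(); }

private:
    Timestamp           window_;
    Timestamp           span_;
    std::deque<Segment> segments_;  // ordered oldest -> newest
};
```

Dropping whole segments keeps the retention pass proportional to the number of expired segments rather than the number of expired points. The trade-off is that a segment straddling the cutoff stays in memory until its entire slice has aged out.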
| 50 | + |
| 51 | +### Network path |
| 52 | + |
| 53 | +- Single-node server using **non-blocking TCP** and **epoll**. |
| 54 | +- Simple length-prefixed framing for requests and responses. |
| 55 | +- Initial RPCs: |
| 56 | + - `PING / PONG` for connectivity checks. |
| 57 | + - `PUT_TS(series_id, timestamp, value)` to append a point. |
| 58 | + - `GET_TS_LATEST(series_id)` to fetch the latest point in-window. |
| 59 | + - `RANGE(series_id, from_ts, to_ts)` to read points for a series over a time range (clipped to the window). |
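
The README does not specify the frame layout, so the following sketch assumes one: a 4-byte big-endian payload length followed by the payload bytes, with a hypothetical 1 MiB cap. The names `encode_frame`, `try_decode_frame`, `FrameStatus`, and `kMaxPayload` are illustrative only. The decoder works on a receive buffer that may hold a partial frame, which is the normal case with non-blocking reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// Assumed layout: 4-byte big-endian payload length, then the payload.
constexpr std::size_t   kHeaderSize = 4;
constexpr std::uint32_t kMaxPayload = 1u << 20;  // hypothetical 1 MiB cap

// Callers are expected to keep payloads within kMaxPayload.
inline std::string encode_frame(std::string_view payload) {
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.reserve(kHeaderSize + payload.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    }
    out.append(payload);
    return out;
}

enum class FrameStatus { NeedMore, Ready, TooLarge };

struct FrameResult {
    FrameStatus status;
    std::string payload;
};

// Tries to pop one complete frame from the front of the receive buffer.
// On Ready, the consumed bytes are removed from rx.
inline FrameResult try_decode_frame(std::string& rx) {
    if (rx.size() < kHeaderSize) return {FrameStatus::NeedMore, {}};
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        n = (n << 8) | static_cast<unsigned char>(rx[i]);
    }
    if (n > kMaxPayload) return {FrameStatus::TooLarge, {}};
    if (rx.size() < kHeaderSize + n) return {FrameStatus::NeedMore, {}};
    std::string payload = rx.substr(kHeaderSize, n);
    rx.erase(0, kHeaderSize + n);
    return {FrameStatus::Ready, std::move(payload)};
}
```

A connection handler would append each `read` result to `rx` and call `try_decode_frame` in a loop until it returns `NeedMore`. The size cap lets the server reject an oversized length before buffering the payload.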
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## ⚑ Roadmap |
| 64 | + |
| 65 | +### Implemented |
| 66 | + |
| 67 | +- [x] **v0.1 — Bootstrap** |
| 68 | + - README, minimal CLI stub, basic project layout |
| 69 | + - PR template and basic coding conventions |
| 70 | + |
| 71 | +- [x] **v0.2 — Non-blocking TCP** |
| 72 | + - Non-blocking server with **epoll** |
| 73 | + - Basic echo handler for manual testing |
| 74 | + - Clean shutdown path |
| 75 | + |
| 76 | +### Planned to v1.0 (Stage-0) |
| 77 | + |
| 78 | +- [ ] **v0.3 — Framing + basic RPCs** |
| 79 | + - Length-prefixed request/response framing |
| 80 | + - `PING` / `PONG` and error responses |
| 81 | + - Skeleton handlers for time-series commands |
| 82 | + |
| 83 | +- [ ] **v0.4 — In-memory window store v0** |
| 84 | + - Global `WINDOW` config (e.g. last N minutes) |
| 85 | + - Single in-memory container for `(series_id, timestamp, value)` |
| 86 | + - Simple, periodic cleanup of expired entries |
| 87 | + - `PUT_TS` + `GET_TS_LATEST` end-to-end |
| 88 | + |
| 89 | +- [ ] **v0.5 — Time-partitioned segments** |
| 90 | + - Replace the single container with fixed-duration segments |
| 91 | + - Append writes to the current segment |
| 92 | + - Drop whole segments when they fall completely out of window |
| 93 | + - Basic `RANGE(series_id, from_ts, to_ts)` over segments |
| 94 | + |
| 95 | +- [ ] **v0.6 — Window-aware introspection** |
| 96 | + - `WINDOW_INFO` RPC: |
| 97 | + - Window size, number of segments |
| 98 | + - Approximate memory usage |
| 99 | + - Debug dump of segments and series counts |
| 100 | + |
| 101 | +- [ ] **v0.7 — Metrics** |
| 102 | + - Simple counters and gauges: |
| 103 | + - `ts_put_total`, `ts_get_latest_total`, `ts_range_total`, `ts_errors_total` |
| 104 | + - `window_segments`, `window_series_approx`, `window_points_approx` |
| 105 | + - Text or simple binary metrics endpoint/command |
| 106 | + |
| 107 | +- [ ] **v0.8 — Indexing pass (per-segment)** |
| 108 | + - Optional per-segment index: |
| 109 | + - `series_id -> offsets` |
| 110 | + - Speed up `GET_TS_LATEST` and `RANGE` without scanning all entries |
| 111 | + - Microbenchmarks for lookup vs. scan |
| 112 | + |
| 113 | +- [ ] **v0.9 — Reliability + perf pass** |
| 114 | + - Basic property tests for window semantics: |
| 115 | + - Writes with timestamps < `now - WINDOW` are never visible |
| 116 | + - Writes with timestamps in the window remain visible |
| 117 | + - Simple load generator for PUT/GET/RANGE |
| 118 | + - First round of notes on throughput/latency |
| 119 | + |
| 120 | +- [ ] **v1.0 — Stage-0 release** |
| 121 | + - Minimal but complete time-window cache: |
| 122 | + - Protocol, window store, segments, indexing, basic metrics |
| 123 | + - Documentation: |
| 124 | + - Architecture overview |
| 125 | + - Wire protocol reference |
| 126 | + - Example usage script (`demo.sh`) |
| 127 | + - Sanitizers in CI (ASan/UBSan) and a small test suite |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## ∷ C++ Module Layout (Stage-0) |
| 132 | + |
| 133 | +- `tskv.common.*` |
| 134 | + - Logging, basic metrics types, time helpers |
| 135 | + - Small ring buffers / utility containers |
| 136 | +- `tskv.net.*` |
| 137 | + - Socket wrapper (non-blocking) |
| 138 | + - Reactor (**epoll**, edge-triggered) |
| 139 | + - Connection and RPC framing |
| 140 | +- `tskv.window.*` |
| 141 | + - In-memory segments |
| 142 | + - Window management (retention, expiration) |
| 143 | + - Query execution (latest / range) |
| 144 | + - Simple per-segment indexing |
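
As a rough illustration of the per-segment index (`series_id -> offsets`) and the latest/range queries listed above, here is a minimal sketch. `IndexedPoint` and `IndexedSegment` are hypothetical names, and the choice of `std::unordered_map` with a vector of offsets is an assumption, not the project's decided data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct IndexedPoint {
    std::string  series_id;
    std::int64_t timestamp;
    double       value;
};

class IndexedSegment {
public:
    // Append a point and record its offset under its series id.
    void append(IndexedPoint p) {
        index_[p.series_id].push_back(points_.size());
        points_.push_back(std::move(p));
    }

    // Latest point for a series: highest timestamp among its offsets
    // (ties resolve to the most recently appended point).
    std::optional<IndexedPoint> latest(const std::string& series_id) const {
        const auto it = index_.find(series_id);
        if (it == index_.end()) return std::nullopt;
        const IndexedPoint* best = nullptr;
        for (const std::size_t off : it->second) {
            if (!best || points_[off].timestamp >= best->timestamp) {
                best = &points_[off];
            }
        }
        if (!best) return std::nullopt;
        return *best;
    }

    // Points of one series with from_ts <= timestamp <= to_ts, in append order.
    std::vector<IndexedPoint> range(const std::string& series_id,
                                    std::int64_t from_ts, std::int64_t to_ts) const {
        std::vector<IndexedPoint> out;
        const auto it = index_.find(series_id);
        if (it == index_.end()) return out;
        for (const std::size_t off : it->second) {
            const IndexedPoint& p = points_[off];
            if (p.timestamp >= from_ts && p.timestamp <= to_ts) out.push_back(p);
        }
        return out;
    }

private:
    std::vector<IndexedPoint>                                 points_;
    std::unordered_map<std::string, std::vector<std::size_t>> index_;
};
```

Lookups touch only the offsets of the requested series instead of scanning every entry in the segment. The cost is one extra offset per point, which is the kind of trade-off the planned lookup-vs-scan microbenchmarks could quantify.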
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## ∑ Metrics (Stage-0, planned) |
| 149 | + |
| 150 | +- **net:** |
| 151 | + - `net_connections_open` |
| 152 | + - `net_rx_bytes_total` |
| 153 | + - `net_tx_bytes_total` |
| 154 | + |
| 155 | +- **rpc:** |
| 156 | + - `rpc_ping_total` |
| 157 | + - `ts_put_total` |
| 158 | + - `ts_get_latest_total` |
| 159 | + - `ts_range_total` |
| 160 | + - `rpc_errors_total` |
| 161 | + |
| 162 | +- **window:** |
| 163 | + - `window_segments` |
| 164 | + - `window_points_approx` |
| 165 | + - `window_series_approx` |
| 166 | + |
| 167 | +- **latency (sample-based, coarse):** |
| 168 | + - p50 / p95 / p99 for: |
| 169 | + - `PUT_TS` |
| 170 | + - `GET_TS_LATEST` |
| 171 | + - `RANGE` |
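
For the sample-based latency figures, one simple option is a nearest-rank percentile over a batch of recorded samples. The sketch below assumes that approach. The function name `percentile` and the use of microsecond integers are illustrative choices, and the README does not state which estimator the project will use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over a batch of latency samples.
// Assumptions: samples are microseconds, p is in (0, 100].
// Returns 0 for an empty batch. Takes the vector by value because
// std::nth_element reorders it.
inline std::uint64_t percentile(std::vector<std::uint64_t> samples, double p) {
    if (samples.empty()) return 0;
    const std::size_t n = samples.size();
    auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(n) / 100.0));
    rank = std::clamp<std::size_t>(rank, 1, n);  // 1-based rank
    const auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

`std::nth_element` runs in linear time on average, so computing p50, p95, and p99 from a bounded sample buffer stays cheap relative to a full sort.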
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +## After v1.0 (ideas, not committed) |
| 176 | + |
| 177 | +These are potential extensions beyond the stage-0 scope: |
| 178 | + |
| 179 | +- **Durability for the hot window** |
| 180 | + - WAL segments per time-partitioned segment |
| 181 | + - Startup replay to reconstruct the last N minutes after a crash |
| 182 | + |
| 183 | +- **Multi-threading** |
| 184 | + - Split I/O and engine into separate threads |
| 185 | + - Shard data across multiple engine threads by series id |
| 186 | + |
| 187 | +- **Richer time-series semantics** |
| 188 | + - Per-series TTL overrides |
| 189 | + - Server-side aggregates: |
| 190 | + - Windowed `SUM` / `AVG` / `MIN` / `MAX` RPCs |
| 191 | + |
| 192 | +- **Advanced observability** |
| 193 | + - More detailed metrics (per-series or per-connection) |
| 194 | + - Debug RPCs to inspect segment contents, hot keys, etc. |
| 195 | + |
| 196 | +- **Replication / HA experiments** |
| 197 | + - Simple follower replication for the time window |
| 198 | + - Eventually-consistent read replicas |
| 199 | + |
| 200 | +Stage-0 (`v1.0`) stays deliberately small: a single-node, in-memory time-window cache with a clear contract and straightforward implementation. Everything else can grow out of that foundation. |