diff --git a/documentation/concepts/deep-dive/indexes.md b/documentation/concepts/deep-dive/indexes.md
index 68a865032..7f2b7d861 100644
--- a/documentation/concepts/deep-dive/indexes.md
+++ b/documentation/concepts/deep-dive/indexes.md
@@ -14,6 +14,16 @@ Indexing is available for [symbol](/docs/concepts/symbol/) columns in both table
 and [materialized views](/docs/concepts/materialized-views). Index support for
 other types will be added over time.
 
+QuestDB supports two index types:
+
+| Index type | Syntax | Covering support | Best for |
+|------------|--------|-----------------|----------|
+| **Bitmap** (default) | `INDEX` or `INDEX TYPE BITMAP` | No | General-purpose, low write overhead |
+| **Posting** | `INDEX TYPE POSTING` | Yes (via `INCLUDE`) | Read-heavy workloads, selective queries, wide tables |
+
+See [Posting index and covering index](/docs/concepts/deep-dive/posting-index/)
+for the detailed guide on the posting index and its covering query capabilities.
+
 ## Index creation and deletion
 
 The following are ways to index a `symbol` column:
@@ -97,6 +107,9 @@ Consider the following query applied to the above table
 
 :::warning
 
+Index capacity applies to **bitmap indexes only**. Posting indexes manage
+their own storage layout and do not use this setting.
+
 We strongly recommend to rely on the default index capacity. Misconfiguring this
 property might lead to worse performance and increased disk usage.
 
@@ -114,8 +127,8 @@ When in doubt, reach out via the QuestDB support channels for advice.
 :::
 
-When a symbol column is indexed, an additional **index capacity** can be defined
-to specify how many row IDs to store in a single storage block on disk:
+When a symbol column has a bitmap index, an additional **index capacity** can be
+defined to specify how many row IDs to store in a single storage block on disk:
 
 - Server-wide setting: `cairo.index.value.block.size` with a default of `256`
 - Column-wide setting: The
diff --git a/documentation/concepts/deep-dive/posting-index.md b/documentation/concepts/deep-dive/posting-index.md
new file mode 100644
index 000000000..cb4580d07
--- /dev/null
+++ b/documentation/concepts/deep-dive/posting-index.md
@@ -0,0 +1,524 @@
+---
+title: Posting index and covering index
+sidebar_label: Posting index
+description:
+  The posting index is a compact, high-performance index for symbol columns
+  that supports covering queries. Learn how it works, when to use it, and
+  how to optimize queries with INCLUDE columns.
+---
+
+The **posting index** is an advanced index type for
+[symbol](/docs/concepts/symbol/) columns that provides better compression,
+faster reads, and **covering index** support compared to the default bitmap
+index.
+
+A **covering index** stores additional column values alongside the index
+entries, so queries that only need those columns can be answered entirely from
+the index without reading the main column files.
+
+## When to use the posting index
+
+Use the posting index when:
+
+- You frequently filter on a symbol column (`WHERE symbol = 'X'`)
+- Your queries select a small set of columns alongside the symbol filter
+- You want to reduce I/O by reading from compact sidecar files instead of
+  full column files
+- You need efficient `DISTINCT` queries on a symbol column
+- You need efficient `LATEST ON` queries partitioned by a symbol column
+
+The posting index is especially effective for high-cardinality symbol columns
+(hundreds to thousands of distinct values) and wide tables where reading full
+column files is expensive.
+
+## Creating a posting index
+
+### At table creation
+
+Inline syntax (index defined alongside the column):
+
+```questdb-sql
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL INDEX TYPE POSTING,
+    exchange SYMBOL,
+    price DOUBLE,
+    quantity DOUBLE
+) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
+Out-of-line syntax (index defined separately):
+
+```questdb-sql
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL,
+    exchange SYMBOL,
+    price DOUBLE,
+    quantity DOUBLE
+), INDEX(symbol TYPE POSTING)
+TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
+### With covering columns (INCLUDE)
+
+```questdb-sql
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL INDEX TYPE POSTING INCLUDE (exchange, price),
+    exchange SYMBOL,
+    price DOUBLE,
+    quantity DOUBLE
+) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
+The `INCLUDE` clause specifies which columns are stored in the index sidecar
+files. Queries that only read these columns plus the indexed symbol column
+can be served entirely from the index.
+
+:::tip
+
+The designated timestamp column is automatically included in the covering
+index — even when no explicit `INCLUDE` clause is given. So a bare
+`INDEX TYPE POSTING` already covers `SELECT timestamp, sym FROM t WHERE
+sym = 'X'`.
+The expanded list is what `SHOW CREATE TABLE` round-trips, so
+`INCLUDE (exchange, price)` renders back as
+`INCLUDE (exchange, price, timestamp)` after creation. Controlled by the
+`cairo.posting.index.auto.include.timestamp` server property
+(default `true`).
+
+:::
+
+:::note
+
+The `INCLUDE` clause is only supported with inline column syntax and
+`ALTER TABLE`. The out-of-line `INDEX(col TYPE POSTING)` syntax does not
+support `INCLUDE`.
+
+Writing `INDEX INCLUDE (...)` (no explicit `TYPE`) is also accepted and
+implicitly creates a posting index — `INCLUDE` is only valid with
+`POSTING`, so the parser promotes the type for you.
+
+:::
+
+### On an existing table
+
+```questdb-sql
+ALTER TABLE trades
+  ALTER COLUMN symbol ADD INDEX TYPE POSTING INCLUDE (exchange, price);
+```
+
+### Encoding options
+
+The posting index supports three row ID encoding options with different
+compression and query performance characteristics:
+
+| Syntax | Encoding | Notes |
+|--------|----------|-------|
+| `INDEX TYPE POSTING` | Adaptive (default) | Trials delta + Frame-of-Reference and Elias-Fano per key per stride and keeps the smaller output |
+| `INDEX TYPE POSTING EF` | Elias-Fano only | Forces Elias-Fano even when delta + FoR would be smaller — useful for benchmarking |
+| `INDEX TYPE POSTING DELTA` | Delta + Frame-of-Reference only | Forces delta + FoR even when Elias-Fano would be smaller — useful for benchmarking |
+
+**Delta + Frame-of-Reference encoding** stores each key's row IDs as
+per-key deltas, split into blocks of 64 with per-block Frame-of-Reference
+bitpacking. Round-robin or periodic distributions produce constant
+deltas (bitwidth 0), so this mode compresses them to near-zero. The
+trade-off is a per-key block-header overhead that hurts low-cardinality
+keys.
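+
+As a rough worked example of the constant-delta case, assuming the usual
+Frame-of-Reference definition (subtract the block minimum, then bitpack the
+residuals) and made-up row IDs for a key that appears on every fourth row:
+
+$$
+\begin{aligned}
+\text{row IDs} &= (3,\ 7,\ 11,\ 15,\ 19) \\
+\text{deltas} &= (4,\ 4,\ 4,\ 4) \\
+\text{reference} &= \min(\text{deltas}) = 4 \\
+\text{residuals} &= (0,\ 0,\ 0,\ 0) \\
+\text{bitwidth} &= \lceil \log_2(\max(\text{residuals}) + 1) \rceil = 0
+\end{aligned}
+$$
+
+With a bitwidth of 0 the residuals occupy no bits, which is consistent with
+the near-zero size described above. The exact on-disk header layout is not
+shown here.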
+
+**Elias-Fano (EF) encoding** is a classic monotonic-sequence encoding:
+each key's sorted row IDs are split into low and high bit halves, with
+the high half stored as a unary-coded bit array and the low half as a
+fixed-width packed array. This typically produces denser output for
+keys with few values per stride and avoids the block-header overhead.
+
+The **adaptive (default)** encoding trial-encodes each key with both
+delta + Frame-of-Reference and Elias-Fano per stride and picks whichever
+produces the smaller output. This is the right choice for almost all
+workloads — the explicit `DELTA` / `EF` variants exist mainly for
+benchmarking.
+
+```questdb-sql
+-- Default adaptive encoding (recommended for most workloads)
+CREATE TABLE t1 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING)
+    TIMESTAMP(ts) PARTITION BY DAY WAL;
+
+-- Force Elias-Fano only (benchmarking)
+CREATE TABLE t2 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING EF)
+    TIMESTAMP(ts) PARTITION BY DAY WAL;
+
+-- Force delta + Frame-of-Reference only (benchmarking)
+CREATE TABLE t3 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING DELTA)
+    TIMESTAMP(ts) PARTITION BY DAY WAL;
+```
+
+:::note
+
+`CAPACITY` is only supported for bitmap indexes. Using `CAPACITY` with a
+posting index will produce an error.
+
+:::
+
+## Covering index
+
+The covering index is the most powerful feature of the posting index. When all
+columns in a query's `SELECT` list are either:
+
+- The indexed symbol column itself (from the `WHERE` clause)
+- Listed in the `INCLUDE` clause
+
+...the query engine reads data directly from the index sidecar files, bypassing
+the main column files entirely. This is significantly faster for selective
+queries on wide tables.
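+
+As a rough sketch, assuming the `trades` table declared earlier with
+`INCLUDE (exchange, price)`, the first query below should be eligible for
+the covering path, while the second selects `quantity`, which is not
+covered, and is expected to read from the column files:
+
+```questdb-sql
+-- Covered: timestamp (auto-included), exchange and price are all in the sidecar
+SELECT timestamp, exchange, price FROM trades WHERE symbol = 'AAPL';
+
+-- Not covered: quantity is not in the INCLUDE list
+SELECT timestamp, price, quantity FROM trades WHERE symbol = 'AAPL';
+```
+
+Running each statement under `EXPLAIN` is the way to confirm which path the
+engine actually picks.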
+
+### Supported column types in INCLUDE
+
+All column types except the indexed symbol column itself can be included:
+
+| Type | Compression | Notes |
+|------|-------------|-------|
+| BOOLEAN, BYTE, GEOBYTE, DECIMAL8 | Frame-of-Reference bitpacking | ≤1 byte per value (worst case) |
+| SHORT, CHAR, GEOSHORT, DECIMAL16 | Frame-of-Reference bitpacking | ≤2 bytes per value |
+| INT, IPv4, GEOINT, DECIMAL32 | Frame-of-Reference bitpacking | ≤4 bytes per value |
+| FLOAT | ALP (Adaptive Lossless floating-Point) | Lossless float compression |
+| LONG, DATE, GEOLONG, DECIMAL64 | Frame-of-Reference bitpacking | ≤8 bytes per value |
+| TIMESTAMP | Linear-prediction + Frame-of-Reference | Designed for monotonic timestamps |
+| DOUBLE | ALP (Adaptive Lossless floating-Point) | Lossless float compression |
+| SYMBOL | Frame-of-Reference bitpacking | Stored as integer key, resolved at query time |
+| UUID, DECIMAL128 | Raw copy | 16 bytes per value |
+| LONG256, DECIMAL256 | Raw copy | 32 bytes per value |
+| VARCHAR, STRING | FSST compressed (≥4 KB strides) | Typically 2–5× compression on repetitive text |
+| BINARY | Length-prefixed raw bytes | Variable-width, no compression |
+| Arrays (DOUBLE[], INT[], etc.) | Length-prefixed raw bytes | Variable-width, no compression |
+
+### How to choose INCLUDE columns
+
+Include columns that you frequently select together with the indexed symbol:
+
+```questdb-sql
+-- If your typical queries look like this:
+SELECT timestamp, price, quantity FROM trades WHERE symbol = 'AAPL';
+
+-- Then include those columns (timestamp is auto-included as designated timestamp):
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL INDEX TYPE POSTING INCLUDE (price, quantity),
+    exchange SYMBOL,
+    price DOUBLE,
+    quantity DOUBLE,
+    -- other columns not needed in hot queries
+    raw_data VARCHAR,
+    metadata VARCHAR
+) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
+:::tip
+
+Only include columns that appear in your most frequent queries.
+Each included
+column adds storage overhead and slows down writes slightly. Columns not in
+the `INCLUDE` list can still be queried — they just won't benefit from the
+covering optimization and will be read from column files.
+
+:::
+
+### Inspecting indexes with SHOW COLUMNS
+
+`SHOW COLUMNS` displays index metadata for each column, including the index
+type and covered columns:
+
+```questdb-sql
+SHOW COLUMNS FROM trades;
+```
+
+| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | symbolTableSize | designated | upsertKey | indexType | indexInclude |
+|-----------|-----------|---------|--------------------|--------------|----------------|-----------------|------------|-----------|-----------|---------------------------|
+| timestamp | TIMESTAMP | false | 0 | false | 0 | 0 | true | false | | |
+| symbol | SYMBOL | true | 256 | true | 256 | 0 | false | false | POSTING | exchange,price,timestamp |
+| exchange | SYMBOL | false | 256 | true | 256 | 0 | false | false | | |
+| price | DOUBLE | false | 0 | false | 0 | 0 | false | false | | |
+| quantity | DOUBLE | false | 0 | false | 0 | 0 | false | false | | |
+
+The `indexType` column shows `POSTING`, `POSTING DELTA`, `POSTING EF`,
+`BITMAP`, or is empty for non-indexed columns. The `indexInclude` column
+lists covered column names — note the auto-included designated timestamp.
+
+### Verifying covering index usage
+
+Use `EXPLAIN` to verify that a query uses the covering index:
+
+```questdb-sql
+EXPLAIN SELECT timestamp, price FROM trades WHERE symbol = 'AAPL';
+```
+
+If the covering index is used, the plan shows `CoveringIndex`:
+
+```
+SelectedRecord
+    CoveringIndex on: symbol with: timestamp, price
+      filter: symbol='AAPL'
+```
+
+`IN`-list filters render as `filter: symbol IN ['AAPL','GOOGL','MSFT']`.
+`LATEST ON` queries that hit the covering path show an `op: latest`
+annotation and have no `SelectedRecord` wrapper:
+
+```
+CoveringIndex op: latest on: symbol with: timestamp, price
+  filter: symbol='AAPL'
+```
+
+`SELECT DISTINCT` does not need to read covered values, so it shows up as
+`PostingIndex op: distinct` rather than `CoveringIndex`:
+
+```
+PostingIndex op: distinct on: symbol
+    Frame forward scan on: trades
+```
+
+When you add a filter on a covered column, an `Async Filter` is layered
+above the covering index — the predicate values are read from the sidecar,
+not the column file:
+
+```
+SelectedRecord
+    Async Filter workers: N
+      filter: 100 100;
+```
+
+### IN-list queries
+
+```questdb-sql
+-- Multiple keys, still uses covering index
+SELECT price FROM trades WHERE symbol IN ('AAPL', 'GOOGL', 'MSFT');
+```
+
+### LATEST ON queries
+
+```questdb-sql
+-- Latest row per symbol, reads from sidecar
+SELECT timestamp, symbol, price
+FROM trades
+WHERE symbol = 'AAPL'
+LATEST ON timestamp PARTITION BY symbol;
+```
+
+### DISTINCT queries
+
+```questdb-sql
+-- Enumerates keys from index metadata, O(keys x partitions) instead of full scan
+SELECT DISTINCT symbol FROM trades;
+
+-- Also works with timestamp filters
+SELECT DISTINCT symbol FROM trades WHERE timestamp > '2024-01-01';
+```
+
+### COUNT queries
+
+```questdb-sql
+-- Plan: Count over CoveringIndex, no column data read
+SELECT COUNT(*) FROM trades WHERE symbol = 'AAPL';
+```
+
+### Aggregate queries on covered columns
+
+```questdb-sql
+-- Aggregates over a covered column read from the sidecar instead of
+-- the column file
+SELECT count(*), min(price), max(price)
+FROM trades
+WHERE symbol = 'AAPL';
+```
+
+## SQL optimizer hints
+
+Two hints control index usage:
+
+### no_covering
+
+Forces the query to read from column files instead of the covering index
+sidecar. Useful for benchmarking or when the covering path has an issue.
+
+```questdb-sql
+SELECT /*+ no_covering */ price FROM trades WHERE symbol = 'AAPL';
+```
+
+### no_index
+
+Completely disables index usage, falling back to a full table scan with
+filter. Also implies `no_covering`.
+
+```questdb-sql
+SELECT /*+ no_index */ price FROM trades WHERE symbol = 'AAPL';
+```
+
+## Trade-offs
+
+### Storage
+
+The posting index itself is very compact (~1 byte per indexed value, vs.
+~15 bytes per value for the bitmap index). The covering sidecar adds
+storage proportional to the included columns:
+
+- **DOUBLE, FLOAT**: ALP (Adaptive Lossless floating-Point), backed by
+  Frame-of-Reference bitpacking with an exception list for outliers.
+- **TIMESTAMP**: linear-prediction header with Frame-of-Reference residual
+  bitpacking — designed for monotonic timestamp data.
+- **Other fixed-width integer types** (BOOLEAN, BYTE, SHORT, CHAR, INT,
+  LONG, DATE, IPv4, GEO\*, DECIMAL8–DECIMAL64, SYMBOL keys):
+  Frame-of-Reference bitpacking sized to the column's natural width, so
+  the worst case is the column-file byte size and typical case is much
+  smaller.
+- **UUID, LONG256, DECIMAL128, DECIMAL256**: stored raw at full width
+  with a small count header.
+- **VARCHAR, STRING**: FSST-compressed once a stride exceeds 4 KB of raw
+  data; typically 2–5× smaller than the column file.
+- **BINARY and arrays**: length-prefixed raw bytes (no compression).
+
+### Write performance
+
+Write overhead depends on the number and type of `INCLUDE` columns:
+
+- **Posting index without INCLUDE**: ~9% slower than the bitmap index for
+  the index path itself (delta + Frame-of-Reference encoding vs. simple
+  append).
+- **Posting index with fixed-width INCLUDE**: additional sidecar write cost
+  proportional to the number of columns; values are batched and compressed
+  at seal time.
+- **Posting index with VARCHAR / STRING / BINARY / ARRAY INCLUDE**: pays
+  the full variable-width copy cost per row plus an FSST symbol-table
+  rebuild per seal for VARCHAR / STRING.
+
+Query performance improvements typically far outweigh the write cost for
+read-heavy workloads.
+
+### Memory
+
+The posting index uses native memory for encoding/decoding buffers. Each
+FSST-compressed `VARCHAR` or `STRING` column carries a ~2.3 KB symbol
+table that is loaded alongside the sidecar at read time and easily fits
+in L1 cache; per-reader decompression buffers are also small.
+
+## Architecture
+
+The posting index stores data in three file types per partition:
+
+- **`.pk`** — Key file: double-buffered metadata pages with the per-key
+  generation directory; readers see consistent snapshots via a seqlock
+  protocol.
+- **`.pv`** — Value file: row IDs encoded as either delta +
+  Frame-of-Reference bitpacking or Elias-Fano (depending on the index's
+  encoding variant), organised into stride-indexed generations.
+- **`.pci` + `.pc0`, `.pc1`, …** — Sidecar files: covered column values
+  stored alongside the posting list. The single `.pci` header lists the
+  covered columns by writer index (`PCI1` magic, plus the `coverCount`
+  used by readers to size their sidecar mappings). Each `.pcN` (with
+  txn-segment suffix on disk, e.g. `s.pc0.0.0`) holds the encoded data
+  for one `INCLUDE` column. The auto-included designated timestamp
+  counts as one of the covered columns and gets its own `.pcN` file.
+
+### Generations and sealing
+
+Data is written incrementally as **generations** (one per commit). Each
+generation contains a sparse block of key→rowID mappings. Periodically,
+generations are **sealed** into a single dense generation with stride-indexed
+layout for optimal read performance.
+
+Sealing happens automatically when the active generation count reaches a
+threshold (`cairo.posting.seal.gen.threshold`, default 16) or when a
+partition is closed. Sealed data is written stride-by-stride (256 keys per
+stride).
+Within the delta + Frame-of-Reference family, the writer
+trial-encodes each stride in two sub-layouts and keeps whichever produces
+fewer bytes:
+
+- **Delta sub-layout** — per-key delta encoding, then per-block
+  Frame-of-Reference bitpacking. Wins when there are roughly ten or more
+  values per key, where the delta distribution lets each block use a
+  small bitwidth.
+- **Flat sub-layout** — stride-wide Frame-of-Reference with a single base
+  and bitwidth, plus a prefix-count array for per-key slicing. Wins when
+  keys are sparse (roughly eight or fewer values per key) by eliminating
+  per-key metadata.
+
+These are internal to delta + Frame-of-Reference and are independent of the
+SQL `DELTA` / `EF` encoding variants described above. When the resulting
+bitwidth is 8, 16, or 32, decoding uses a native AVX2 fast path; other
+bit widths fall back to a Java decoder.
+
+### FSST compression for strings
+
+VARCHAR and STRING columns in the INCLUDE list are compressed using FSST
+(Fast Static Symbol Table) compression during sealing once a stride exceeds
+4 KB of raw data. FSST replaces frequently occurring 1–8 byte patterns
+with single-byte codes, typically achieving 2–5× compression on string data
+with repetitive patterns. The 2.3 KB symbol table fits in L1 cache and
+gives stateless O(1) per-value decode.
+
+The FSST symbol table is trained per seal and stored inline in the sidecar
+file. Decompression is transparent to the query engine.
+
+## Limitations
+
+:::warning
+
+- `INCLUDE` is only supported for the posting index type (not bitmap).
+  Writing `INDEX TYPE BITMAP INCLUDE (...)` errors with
+  `INCLUDE is only supported for POSTING index type`.
+- `INCLUDE` cannot list the indexed symbol column itself.
+- `INCLUDE` is not supported with out-of-line `INDEX(col ...)` syntax —
+  use inline column syntax or `ALTER TABLE` instead.
+- `CAPACITY` is not supported for posting indexes (bitmap only).
+- The covering path engages only when the query filters on the indexed
+  symbol (single key, `IN`-list, or bind variable). Queries without such
+  a filter — including unfiltered `LATEST ON … PARTITION BY sym`,
+  unfiltered `SAMPLE BY`, and unfiltered `GROUP BY` — fall back to a
+  regular page-frame scan.
+- `REINDEX` on WAL tables requires dropping and re-adding the index
+  (this applies to all index types, not just posting).
+
+:::
diff --git a/documentation/concepts/deep-dive/sql-optimizer-hints.md b/documentation/concepts/deep-dive/sql-optimizer-hints.md
index 93f207598..7a989a54b 100644
--- a/documentation/concepts/deep-dive/sql-optimizer-hints.md
+++ b/documentation/concepts/deep-dive/sql-optimizer-hints.md
@@ -358,3 +358,37 @@ your symbol set is high-cardinality.
   - superseded by `asof_index`
 - `asof_memoized_search`
   - superseded by `asof_memoized`
+
+-----
+
+## Index hints
+
+These hints control whether the query optimizer uses indexes (bitmap or posting)
+for symbol column lookups.
+
+### no_covering
+
+Disables the [covering index](/docs/concepts/deep-dive/posting-index/)
+optimization, forcing the query to read from column files instead of the
+index sidecar. The index is still used for row ID lookup, but column values
+are read from the main column files.
+
+```questdb-sql
+SELECT /*+ no_covering */ price FROM trades WHERE symbol = 'AAPL';
+```
+
+This is useful for benchmarking covering index performance or working around
+a specific issue with the covering path.
+
+### no_index
+
+Completely disables all index usage for the query, including bitmap index,
+posting index, and covering index. The query falls back to a full table scan
+with a filter applied to every row. Also implies `no_covering`.
+
+```questdb-sql
+SELECT /*+ no_index */ price FROM trades WHERE symbol = 'AAPL';
+```
+
+This is useful for benchmarking index effectiveness or forcing a table scan
+when you know the filter selectivity is low (many rows match).
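+
+As a hedged sketch (assuming hints are honored inside `EXPLAIN`, which is
+not confirmed here, and that `price` is in the index's `INCLUDE` list),
+prefixing the same query with `EXPLAIN` under each hint gives a quick way
+to compare the three access paths:
+
+```questdb-sql
+-- Default: the covering path is expected when price is covered
+EXPLAIN SELECT price FROM trades WHERE symbol = 'AAPL';
+
+-- Index used for row ID lookup, column values read from column files
+EXPLAIN SELECT /*+ no_covering */ price FROM trades WHERE symbol = 'AAPL';
+
+-- No index at all: full table scan with a filter
+EXPLAIN SELECT /*+ no_index */ price FROM trades WHERE symbol = 'AAPL';
+```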
diff --git a/documentation/concepts/symbol.md b/documentation/concepts/symbol.md
index 2e5bf740f..bfd7b0be5 100755
--- a/documentation/concepts/symbol.md
+++ b/documentation/concepts/symbol.md
@@ -117,6 +117,7 @@ ALTER TABLE trades ALTER COLUMN client_id CACHE;
 For columns frequently used in `WHERE` clauses, add an index:
 
 ```questdb-sql
+-- Bitmap index (default) — low overhead, good for most cases
 CREATE TABLE trades (
     timestamp TIMESTAMP,
     symbol SYMBOL INDEX,
@@ -124,10 +125,23 @@ CREATE TABLE trades (
 ) TIMESTAMP(timestamp) PARTITION BY DAY;
 ```
 
+For read-heavy workloads, a [posting index](/docs/concepts/deep-dive/posting-index/)
+offers better compression and supports covering queries:
+
+```questdb-sql
+-- Posting index with covering columns — reads from compact sidecar files
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL INDEX TYPE POSTING INCLUDE (price),
+    price DOUBLE
+) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
 Or add an index later:
 
 ```questdb-sql
 ALTER TABLE trades ALTER COLUMN symbol ADD INDEX;
+-- or: ALTER TABLE trades ALTER COLUMN symbol ADD INDEX TYPE POSTING;
 ```
 
 See [Indexes](/docs/concepts/deep-dive/indexes/) for more information.
diff --git a/documentation/configuration/cairo-engine.md b/documentation/configuration/cairo-engine.md
index b7f1ba9de..23a98b8e1 100644
--- a/documentation/configuration/cairo-engine.md
+++ b/documentation/configuration/cairo-engine.md
@@ -275,7 +275,8 @@ causes performance degradation. Must be a power of 2.
 - **Reloadable**: no
 
 Approximation of the number of rows for a single index key. Must be a power
-of 2.
+of 2. Applies to bitmap indexes only; posting indexes manage their own block
+layout.
 
 ### cairo.parallel.index.threshold
 
@@ -292,12 +293,43 @@ Minimum number of rows before parallel indexation is used.
 Enables parallel indexation. Works in conjunction with
 `cairo.parallel.index.threshold`.
 
+### cairo.posting.index.auto.include.timestamp
+
+- **Default**: `true`
+- **Reloadable**: no
+
+When `true`, the designated timestamp column is automatically added to the
+covering index for any
+[posting index](/docs/concepts/deep-dive/posting-index/), including bare
+`INDEX TYPE POSTING` declarations with no `INCLUDE` clause.
+
+### cairo.posting.index.row.id.encoding
+
+- **Default**: `adaptive`
+- **Reloadable**: no
+
+Default row ID encoding for posting indexes when no encoding variant is
+specified. Valid values: `adaptive` (trial-encodes both delta +
+Frame-of-Reference and Elias-Fano per stride and picks the smaller), `delta`
+(delta + Frame-of-Reference only), `ef` (Elias-Fano only).
+
+### cairo.posting.seal.gen.threshold
+
+- **Default**: `16`
+- **Reloadable**: no
+
+Maximum number of unsealed generations per partition before
+[posting index](/docs/concepts/deep-dive/posting-index/) sealing is triggered.
+Sealing compacts active generations into a single dense generation with a
+stride-indexed layout.
+
 ### cairo.spin.lock.timeout
 
 - **Default**: `1000`
 - **Reloadable**: no
 
-Timeout in milliseconds when attempting to acquire BitmapIndexReaders.
+Timeout in milliseconds when attempting to acquire index readers (bitmap and
+posting).
 ### cairo.work.steal.timeout.nanos
diff --git a/documentation/query/functions/meta.md b/documentation/query/functions/meta.md
index dae212033..e02f30f6b 100644
--- a/documentation/query/functions/meta.md
+++ b/documentation/query/functions/meta.md
@@ -355,14 +355,20 @@ Returns a `table` with the following columns:
 - `type` - type of the column
 - `indexed` - if indexing is applied to this column
 - `indexBlockCapacity` - how many row IDs to store in a single storage block on
-  disk
+  disk (bitmap indexes only)
 - `symbolCached` - whether this `symbol` column is cached
 - `symbolCapacity` - how many distinct values this column of `symbol` type is
   expected to have
+- `symbolTableSize` - current number of distinct values stored in this
+  `symbol` column's table
 - `designated` - if this is set as the designated timestamp column for this
   table
 - `upsertKey` - if this column is a part of UPSERT KEYS list for table
   [deduplication](/docs/concepts/deduplication)
+- `indexType` - the [index type](/docs/concepts/deep-dive/indexes/)
+  (`POSTING`, `POSTING DELTA`, `POSTING EF`, `BITMAP`, or empty)
+- `indexInclude` - comma-separated names of columns included in a
+  [posting index's](/docs/concepts/deep-dive/posting-index/) covering sidecar
 
 For more details on the meaning and use of these values, see the
 [CREATE TABLE](/docs/query/sql/create-table/) documentation.
@@ -373,12 +379,12 @@ For more details on the meaning and use of these values, see the
 table_columns('my_table');
 ```
 
-| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | designated | upsertKey |
-| ------ | --------- | ------- | ------------------ | ------------ | -------------- | ---------- | --------- |
-| symb | SYMBOL | true | 1048576 | false | 256 | false | false |
-| price | DOUBLE | false | 0 | false | 0 | false | false |
-| ts | TIMESTAMP | false | 0 | false | 0 | true | false |
-| s | VARCHAR | false | 0 | false | 0 | false | false |
+| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | symbolTableSize | designated | upsertKey | indexType | indexInclude |
+| ------ | --------- | ------- | ------------------ | ------------ | -------------- | --------------- | ---------- | --------- | --------- | ------------ |
+| symb | SYMBOL | true | 1048576 | false | 256 | 0 | false | false | BITMAP | |
+| price | DOUBLE | false | 0 | false | 0 | 0 | false | false | | |
+| ts | TIMESTAMP | false | 0 | false | 0 | 0 | true | false | | |
+| s | VARCHAR | false | 0 | false | 0 | 0 | false | false | | |
 
 ```questdb-sql title="Get designated timestamp column"
 SELECT "column", type, designated FROM table_columns('my_table') WHERE designated = true;
diff --git a/documentation/query/sql/alter-mat-view-alter-column-add-index.md b/documentation/query/sql/alter-mat-view-alter-column-add-index.md
index d8b5d2b27..eb98d3629 100644
--- a/documentation/query/sql/alter-mat-view-alter-column-add-index.md
+++ b/documentation/query/sql/alter-mat-view-alter-column-add-index.md
@@ -12,6 +12,7 @@ query performance for filtered lookups.
 ```
 ALTER MATERIALIZED VIEW viewName ALTER COLUMN columnName ADD INDEX [ CAPACITY n ]
+ALTER MATERIALIZED VIEW viewName ALTER COLUMN columnName ADD INDEX TYPE POSTING
 ```
 
 ## Parameters
@@ -20,7 +21,8 @@ ALTER MATERIALIZED VIEW viewName ALTER COLUMN columnName ADD INDEX [ CAPACITY n
 | --------- | ----------- |
 | `viewName` | Name of the materialized view |
 | `columnName` | Name of the `SYMBOL` column to index |
-| `CAPACITY` | Optional index capacity (advanced; use default unless you understand implications) |
+| `CAPACITY` | Optional index capacity for bitmap indexes (advanced; use default unless you understand implications) |
+| `TYPE POSTING` | Use a [posting index](/docs/concepts/deep-dive/posting-index/) instead of the default bitmap index |
 
 ## When to use
 
@@ -30,13 +32,33 @@ Add an index when:
 - The column has high cardinality (many distinct values)
 - Query performance on the materialized view needs improvement
 
-## Example
+## Examples
 
-```questdb-sql title="Add index to symbol column"
+### Adding a bitmap index (default)
+
+```questdb-sql title="Add bitmap index to symbol column"
 ALTER MATERIALIZED VIEW trades_hourly
   ALTER COLUMN symbol ADD INDEX;
 ```
 
+### Adding a posting index
+
+```questdb-sql title="Add posting index to symbol column"
+ALTER MATERIALIZED VIEW trades_hourly
+  ALTER COLUMN symbol ADD INDEX TYPE POSTING;
+```
+
+:::note
+
+An explicit `INCLUDE` clause for covering indexes is not currently
+accepted on materialized views — the parser rejects it. The view's
+designated timestamp is still auto-added, so `INDEX TYPE POSTING` on a
+view's symbol column produces a covering index over the timestamp,
+which is enough to accelerate `WHERE symbol = … LATEST ON ts` and
+similar timestamp-only covering queries.
+
+:::
+
 ## Behavior
 
 | Aspect | Description |
diff --git a/documentation/query/sql/alter-table-alter-column-add-index.md b/documentation/query/sql/alter-table-alter-column-add-index.md
index b8e6505b9..2df5159f5 100644
--- a/documentation/query/sql/alter-table-alter-column-add-index.md
+++ b/documentation/query/sql/alter-table-alter-column-add-index.md
@@ -12,13 +12,57 @@ Indexes an existing [`symbol`](/docs/concepts/symbol/) column.
 ALTER TABLE tableName ALTER COLUMN columnName ADD INDEX;
 ```
 
-
 Adding an [index](/docs/concepts/deep-dive/indexes/) is an atomic, non-blocking,
 and non-waiting operation. Once complete, the SQL optimizer will start using the
 new index for SQL executions.
 
-## Example
+## Examples
+
+### Adding a bitmap index (default)
 
-```questdb-sql title="Adding an index"
+```questdb-sql
 ALTER TABLE trades ALTER COLUMN side ADD INDEX;
 ```
+
+### Adding a posting index
+
+```questdb-sql
+ALTER TABLE trades ALTER COLUMN instrument ADD INDEX TYPE POSTING;
+```
+
+The designated timestamp is auto-included as a covered column even when
+no explicit `INCLUDE` clause is given, so the bare form above already
+covers `SELECT timestamp, instrument FROM trades WHERE instrument = 'X'`.
+
+An encoding variant can also be forced:
+
+```questdb-sql
+-- Force delta + Frame-of-Reference (benchmarking)
+ALTER TABLE trades ALTER COLUMN instrument ADD INDEX TYPE POSTING DELTA;
+
+-- Force Elias-Fano (benchmarking)
+ALTER TABLE trades ALTER COLUMN instrument ADD INDEX TYPE POSTING EF;
+```
+
+### Adding a posting index with covering columns
+
+The `INCLUDE` clause stores additional column values in the index sidecar
+files, enabling covering queries that bypass column file reads:
+
+```questdb-sql
+ALTER TABLE trades
+  ALTER COLUMN symbol ADD INDEX TYPE POSTING INCLUDE (price, quantity);
+```
+
+The designated timestamp is appended to the `INCLUDE` list automatically.
+After this, queries that only select columns from the `INCLUDE` list (plus
+the indexed symbol column and designated timestamp) are served from the
+index sidecar:
+
+```questdb-sql
+-- This query reads from the index sidecar, not from column files
+SELECT timestamp, price FROM trades WHERE symbol = 'AAPL';
+```
+
+See [Posting index and covering index](/docs/concepts/deep-dive/posting-index/)
+for supported column types and performance details.
diff --git a/documentation/query/sql/create-table.md b/documentation/query/sql/create-table.md
index 3517d7287..a04398b27 100644
--- a/documentation/query/sql/create-table.md
+++ b/documentation/query/sql/create-table.md
@@ -641,6 +641,8 @@ must be of type [symbol](/docs/concepts/symbol/).
 INDEX (columnRef [CAPACITY valueBlockSize])
 ```
 
+### Bitmap index (default)
+
 ```questdb-sql
 CREATE TABLE trades (
     timestamp TIMESTAMP,
@@ -650,13 +652,78 @@ CREATE TABLE trades (
 ), INDEX(symbol) TIMESTAMP(timestamp);
 ```
 
+### Posting index
+
+The posting index offers better compression and read performance than the
+default bitmap index. Use `INDEX TYPE POSTING` with either inline or
+out-of-line syntax:
+
+```questdb-sql
+-- Inline syntax
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL INDEX TYPE POSTING,
+    price DOUBLE,
+    amount DOUBLE
+) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+
+-- Out-of-line syntax
+CREATE TABLE trades (
+    timestamp TIMESTAMP,
+    symbol SYMBOL,
+    price DOUBLE,
+    amount DOUBLE
+), INDEX(symbol TYPE POSTING)
+TIMESTAMP(timestamp) PARTITION BY DAY WAL;
+```
+
+### Posting index with covering columns (INCLUDE)
+
+The `INCLUDE` clause stores additional column values in the index sidecar
+files.
Queries that only need these columns plus the indexed symbol can be +served entirely from the index, bypassing column files: + +```questdb-sql +CREATE TABLE trades ( + timestamp TIMESTAMP, + symbol SYMBOL INDEX TYPE POSTING INCLUDE (price, exchange), + exchange SYMBOL, + price DOUBLE, + amount DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY WAL; +``` + +The designated timestamp column is automatically included — you do not need +to list it in the `INCLUDE` clause. With this schema, the following query +reads only from the index sidecar: + +```questdb-sql +SELECT timestamp, price FROM trades WHERE symbol = 'AAPL'; +``` + +:::note + +`INCLUDE` is only supported with inline column syntax (not out-of-line +`INDEX(col ...)`). Use `ALTER TABLE` to add covering columns to an existing +table. + +::: + +See [Posting index and covering index](/docs/concepts/deep-dive/posting-index/) +for a comprehensive guide including supported column types, query patterns, +and performance characteristics. + :::warning - The **index capacity** and [**symbol capacity**](/docs/concepts/symbol/) are different settings. - The index capacity value should not be changed, unless a user is aware of all - the implications. ::: + the implications. +- `CAPACITY` is only supported for bitmap indexes — it cannot be used with + posting indexes. + +::: See the [Index concept](/docs/concepts/deep-dive/indexes/#how-indexes-work) for more information about indexes. diff --git a/documentation/query/sql/explain.md b/documentation/query/sql/explain.md index c738e4153..2c6aa154e 100644 --- a/documentation/query/sql/explain.md +++ b/documentation/query/sql/explain.md @@ -78,6 +78,12 @@ The following list contains some plan node types: `INTERSECT`). - `Index forward/backward scan` - scans all row ids associated with a given `symbol` value from start to finish or vice versa. 
+- `CoveringIndex` - reads data from a + [posting index's](/docs/concepts/deep-dive/posting-index/) covering sidecar + files instead of main column files. Appears when all selected columns are + covered by the `INCLUDE` clause. +- `PostingIndex` - uses a posting index for accelerated operations such as + `DISTINCT` on a symbol column. - `Limit` - standalone node implementing the `LIMIT` keyword. Other nodes can implement `LIMIT` internally, e.g. the `Sort` node. - `Row forward/backward scan` - scans data frame (usually partitioned) records diff --git a/documentation/query/sql/show.md b/documentation/query/sql/show.md index 6976ce129..b71252116 100644 --- a/documentation/query/sql/show.md +++ b/documentation/query/sql/show.md @@ -71,13 +71,19 @@ SHOW TABLES; SHOW COLUMNS FROM trades; ``` -| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | symbolTableSize | designated | upsertKey | -| --------- | --------- | ------- | ------------------ | ------------ | -------------- | --------------- | ---------- | --------- | -| symbol | SYMBOL | false | 0 | true | 256 | 42 | false | false | -| side | SYMBOL | false | 0 | true | 256 | 2 | false | false | -| price | DOUBLE | false | 0 | false | 0 | 0 | false | false | -| amount | DOUBLE | false | 0 | false | 0 | 0 | false | false | -| timestamp | TIMESTAMP | false | 0 | false | 0 | 0 | true | false | +| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | symbolTableSize | designated | upsertKey | indexType | indexInclude | +| --------- | --------- | ------- | ------------------ | ------------ | -------------- | --------------- | ---------- | --------- | --------- | ------------ | +| symbol | SYMBOL | false | 0 | true | 256 | 42 | false | false | | | +| side | SYMBOL | false | 0 | true | 256 | 2 | false | false | | | +| price | DOUBLE | false | 0 | false | 0 | 0 | false | false | | | +| amount | DOUBLE | false | 0 | false | 0 | 0 | false | false | | | +| timestamp | TIMESTAMP 
| false | 0 | false | 0 | 0 | true | false | | | + +The `indexType` column shows the index type (`POSTING`, `POSTING DELTA`, +`POSTING EF`, `BITMAP`, or empty for non-indexed columns). The +`indexInclude` column lists the names of columns included in a +[posting index's](/docs/concepts/deep-dive/posting-index/) covering +sidecar, as a comma-separated string. ### SHOW CREATE TABLE @@ -102,6 +108,25 @@ CREATE TABLE trades ( WITH maxUncommittedRows=500000, o3MaxLag=600000000us; ``` +#### Posting index with covering columns + +When a symbol column has a posting index with `INCLUDE`, the DDL reflects +the index type and covered columns. The designated timestamp is appended +to the `INCLUDE` list automatically, so a table created with +`INCLUDE (price, exchange)` round-trips as +`INCLUDE (price, exchange, timestamp)`: + +```questdb-sql +CREATE TABLE trades ( + symbol SYMBOL CAPACITY 256 CACHE INDEX TYPE POSTING INCLUDE (price, exchange, timestamp), + exchange SYMBOL CAPACITY 256 CACHE, + price DOUBLE, + amount DOUBLE, + timestamp TIMESTAMP +) timestamp(timestamp) PARTITION BY DAY WAL +WITH maxUncommittedRows=500000, o3MaxLag=600000000us; +``` + #### Per-column Parquet encoding When columns have per-column Parquet encoding or compression overrides, they diff --git a/documentation/schema-design-essentials.md b/documentation/schema-design-essentials.md index 592e9d09f..b88a6c929 100644 --- a/documentation/schema-design-essentials.md +++ b/documentation/schema-design-essentials.md @@ -75,6 +75,47 @@ TIMESTAMP(ts) PARTITION BY MONTH; See [Partitions](/docs/concepts/partitions/) for details. +## Indexing + +Index your primary filter columns to speed up `WHERE` clause queries. 
QuestDB +supports two index types for SYMBOL columns: + +```questdb-sql +-- Default bitmap index — low overhead, good for most cases +CREATE TABLE trades ( + ts TIMESTAMP, + symbol SYMBOL INDEX, + price DOUBLE +) TIMESTAMP(ts) PARTITION BY DAY WAL; + +-- Posting index with covering columns — best for read-heavy, selective queries +CREATE TABLE trades ( + ts TIMESTAMP, + symbol SYMBOL INDEX TYPE POSTING INCLUDE (price), + price DOUBLE, + raw_data VARCHAR -- not in INCLUDE, read from column files +) TIMESTAMP(ts) PARTITION BY DAY WAL; +-- The designated timestamp (ts) is automatically included in the covering index. +``` + +**When to choose each:** + +| Scenario | Recommendation | +|----------|---------------| +| General purpose, write-heavy | Bitmap index (`INDEX`) | +| Read-heavy, filtering on symbol | Posting index (`INDEX TYPE POSTING`) | +| Frequent queries on a few columns | Posting with `INCLUDE` | +| Wide table, queries select subset | Posting with `INCLUDE` — biggest win | + +The covering index (`INCLUDE`) lets queries that only select covered columns +read from compact sidecar files instead of full column files. The designated +timestamp is automatically included, so timestamp-filtered queries benefit +without explicit listing. Use `EXPLAIN` to verify your queries use the +`CoveringIndex` plan. + +See [Indexes](/docs/concepts/deep-dive/indexes/) and +[Posting index](/docs/concepts/deep-dive/posting-index/) for details. + ## Data types ### SYMBOL vs VARCHAR diff --git a/documentation/sidebars.js b/documentation/sidebars.js index ea45c1353..f4b5fcd67 100644 --- a/documentation/sidebars.js +++ b/documentation/sidebars.js @@ -544,6 +544,7 @@ module.exports = { collapsed: true, items: [ "concepts/deep-dive/indexes", + "concepts/deep-dive/posting-index", "concepts/deep-dive/interval-scan", "concepts/deep-dive/jit-compiler", "concepts/deep-dive/query-tracing",