The canonical reference for the SQL surface SQLRite implements today. Parsing is delegated to sqlparser using the SQLite dialect, so tokens and grammar follow SQLite — execution only implements the subset below, and anything else is rejected with a typed NotImplemented error rather than silently partial behavior.
If you're looking for how to use SQLRite (REPL flow, meta-commands, history, embedding), see Using SQLRite. This document is the strict reference for what statements execute and what semantics they carry.
| Statement | Supported today |
|---|---|
| `CREATE TABLE` | Columns with `PRIMARY KEY` / `UNIQUE` / `NOT NULL` / `DEFAULT <literal>`; typed columns; auto-indexes on constrained columns |
| `CREATE [UNIQUE] INDEX` | Single-column named indexes, `IF NOT EXISTS`, persisted as cell-based B-Trees |
| `INSERT INTO` | Auto-ROWID, UNIQUE/PK enforcement, clean type errors, NULL/DEFAULT padding |
| `SELECT` | `*` or column list, `WHERE`, single-column `ORDER BY`, `LIMIT`; index probing on `col = literal` |
| `UPDATE` | Multi-column `SET`, `WHERE`, arithmetic RHS, type + UNIQUE enforcement |
| `DELETE` | `WHERE` predicate or whole-table |
| `ALTER TABLE` | `RENAME TO`, `RENAME COLUMN`, `ADD COLUMN`, `DROP COLUMN` (one operation per statement) |
| `DROP TABLE` / `DROP INDEX` | `IF EXISTS`; single target; auto-indexes refused for `DROP INDEX` |
| `BEGIN` / `COMMIT` / `ROLLBACK` | Snapshot-based; single-level; WAL-backed commit; auto-rollback on COMMIT disk failure |
| `VACUUM` | Compacts the file: rewrites every live B-Tree contiguously from page 1 and clears the freelist. Bare `VACUUM;` only — no modifiers. |
Statements the parser accepts (because sqlparser understands them in the SQLite dialect) but SQLRite doesn't execute yet return `SQL Statement not supported yet.` The Not yet supported section below enumerates the common ones.
Every statement above accepts ? placeholders anywhere a value literal is allowed (WHERE, ORDER BY, INSERT VALUES, …). Bind via the public Rust API:
```rust
use sqlrite::{Connection, Value};

let mut conn = Connection::open_in_memory()?;
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")?;

let mut ins = conn.prepare_cached("INSERT INTO users (name, age) VALUES (?, ?)")?;
ins.execute_with_params(&[Value::Text("alice".into()), Value::Integer(30)])?;
ins.execute_with_params(&[Value::Text("bob".into()), Value::Integer(25)])?;

let stmt = conn.prepare_cached("SELECT name FROM users WHERE age > ?")?;
let rows = stmt
    .query_with_params(&[Value::Integer(26)])?
    .collect_all()?;
# Ok::<(), sqlrite::SQLRiteError>(())
```

- Positional only. `?` placeholders are bound by source-order index (`params[0]` = first `?`, etc.). Named placeholders (`:foo`, `$1`) are not yet supported.
- Strict arity. The slice length must match the placeholder count or `query_with_params` / `execute_with_params` returns a typed error.
- Vector binding. `Value::Vector(Vec<f32>)` binds where a bracket-array literal would normally appear — including the second arg of `vec_distance_*` inside an HNSW-eligible `ORDER BY`. The optimizer recognizes the bound shape, so the graph shortcut still fires.
- Plan cache. `Connection::prepare_cached` keeps a per-connection LRU (default cap 16; tune via `set_prepared_cache_capacity`) so a hot SQL string parses exactly once across the connection's lifetime. `Connection::prepare` always re-parses.
```sql
CREATE TABLE <name> (<col> <type> [column_constraint]* [, ...]);
```

| Keyword(s) | Storage class | Notes |
|---|---|---|
| `INTEGER`, `INT`, `BIGINT`, `SMALLINT` | Integer (`i64`) | All four alias to the same 64-bit signed storage class |
| `TEXT`, `VARCHAR` | Text (`String`) | UTF-8; no length limit enforced (`VARCHAR`'s `(n)` is parsed and ignored) |
| `REAL`, `FLOAT`, `DOUBLE`, `DECIMAL` | Real (`f64`) | Double-precision; `DECIMAL(p,s)` precision/scale parsed and ignored |
| `BOOLEAN` | Boolean | Stored compactly in the null bitmap's sibling bits; accepts `TRUE` / `FALSE` |
| `VECTOR(N)` | Vector (`Vec<f32>`, fixed dim N) | Phase 7a. Dense f32 array of fixed dimension. N is required and must be ≥ 1. Inserted as bracket-array literals `[0.1, 0.2, ...]`. Dimension is enforced at INSERT/UPDATE; mismatched-length values are rejected. Distance functions and ANN indexing land in 7b–7d. |
| `JSON`, `JSONB` | Text (canonical JSON) | Phase 7e. JSON document stored as canonical UTF-8 text — same as SQLite's JSON1 extension (Q3 scope correction since bincode was removed in Phase 3c). INSERT/UPDATE values are validated via `serde_json::from_str`; malformed JSON is rejected with a typed error and no row is written. `JSONB` is accepted as an alias for `JSON` (PostgreSQL convention; both store as text in our case). Path-style read access via the `json_extract` / `json_type` / `json_array_length` / `json_object_keys` functions below. |
- `PRIMARY KEY` — one column per table; the column must be `INTEGER` and gets auto-ROWID behavior (omitted on INSERT → auto-assigned). Auto-creates an index named `sqlrite_autoindex_<table>_<column>`.
- `UNIQUE` — enforced at INSERT/UPDATE time. Auto-creates an index with the same naming scheme.
- `NOT NULL` — rejects NULL at INSERT/UPDATE. Omitted columns on INSERT are NULL by default (or pick up the column's `DEFAULT`, if any), so a `NOT NULL` column without an INSERT-time value or DEFAULT is an error.
- `DEFAULT <literal>` — value substituted when the column is omitted from an INSERT. Accepts integer / real / text / boolean / NULL literals (and unary `+`/`-` on numerics). Function-call defaults like `CURRENT_TIMESTAMP` and other non-literal expressions are rejected at CREATE TABLE time. An explicit `INSERT ... VALUES (..., NULL, ...)` is preserved as NULL — the default only fires for omitted columns (matches SQLite).
- Table-level constraints (`PRIMARY KEY (col1, col2)`, `FOREIGN KEY`, `CHECK`, `UNIQUE (col1, col2)`) are parsed but ignored.
- Multi-column `PRIMARY KEY` — only single-column PKs work; a composite PK is accepted by the parser but treated as no PK.
- `Table 'foo' already exists.` — duplicate `CREATE TABLE`.
- `'sqlrite_master' is a reserved name used by the internal schema catalog` — you tried to shadow the catalog table.
- `Column 'foo' appears more than once in the table definition` — duplicate column names.
- `PRIMARY KEY column must be INTEGER` — PK on a non-integer column.
```sql
CREATE [UNIQUE] INDEX [IF NOT EXISTS] <name> ON <table> (<column>);
```

- Single-column only. Composite indexes (`CREATE INDEX ... ON t (a, b)`) are parsed but rejected at execution.
- The index name is required. Anonymous (`CREATE INDEX ON t (col)`) is rejected with `anonymous indexes are not supported`.
- Supported column types: `INTEGER`, `TEXT`. `REAL` and `BOOLEAN` columns cannot be indexed yet.
- `CREATE UNIQUE INDEX` on a column whose existing rows already carry duplicate values is rejected before any change is made — the table + other indexes stay untouched.
- `IF NOT EXISTS` — skips the create if an index with that name already exists. No-op return value in that case.
- Indexes persist as their own cell-based B-Trees (see Storage model).
Every PRIMARY KEY and every UNIQUE column gets an auto-index at CREATE TABLE time:
`sqlrite_autoindex_<table>_<column>`

These are first-class indexes — they're visible via `.tables`-adjacent catalog queries (once those land), persist across saves, and accelerate equality probes. You don't need to `CREATE INDEX` them yourself.
```sql
CREATE INDEX <name> ON <table> USING hnsw (<vector_column>)
    [WITH (metric = '<l2|cosine|dot>')];
```

Builds an HNSW approximate-nearest-neighbor index over a `VECTOR(N)` column. The query optimizer recognizes `ORDER BY vec_distance_l2(col, literal) LIMIT k` (or the cosine / dot variants) on an HNSW-indexed column whose metric matches the query's distance function, and probes the graph instead of full-scanning. SQLR-23 — the second arg can be either an inline `[...]` literal or a bound `Value::Vector(...)` parameter via `Statement::query_with_params`; the optimizer recognizes both, so prepared-statement KNN queries still take the graph shortcut.
The `WITH (metric = '…')` clause picks the distance the graph is built for. Three values are recognized: `'l2'` (Euclidean — the default, also accepts `'euclidean'`), `'cosine'`, and `'dot'` (negated dot-product — also accepts `'inner_product'` / `'ip'`). Omitting the clause is equivalent to `metric = 'l2'`, so pre-SQLR-28 catalogs round-trip unchanged. The metric is not a query-time choice — the graph topology depends on the metric used during INSERT (neighbour pruning is metric-specific), so a query whose `vec_distance_*` function doesn't match the index's metric falls through to brute-force rather than getting a wrong answer back from the graph. If you need both L2 and cosine probes on the same column, create two indexes.
- Recall@10 ≥ 0.95 at default parameters (`M=16`, `ef_construction=200`, `ef_search=50`). The `M`/`ef_*` knobs aren't tunable from SQL yet — see Q2 of `docs/phase-7-plan.md`.
- The index is built incrementally on `INSERT`. `DELETE`/`UPDATE` mark the index `needs_rebuild`; the next save rebuilds from current rows under the same metric.
- Persisted as a `KIND_HNSW` cell tree alongside the regular page hierarchy — the open path loads the graph bit-for-bit, no algorithm runs. The metric travels through the synthesized CREATE INDEX SQL in `sqlrite_master`; no file-format bump.
- Without an HNSW index — or with a metric mismatch — the same `ORDER BY vec_distance_… LIMIT k` query still works; it just brute-force-scans every row (Phase 7c's bounded-heap top-k optimization keeps the memory footprint to O(k)).
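The brute-force fallback's bounded-heap shape is easy to model. A minimal sketch (Python for illustration, not the Rust implementation; the function name and arguments are invented), keeping a max-heap capped at k entries:

```python
import heapq

def knn_bruteforce(rows, query, dist, k):
    """Top-k scan with a bounded max-heap: O(k) memory, not O(rows)."""
    heap = []  # holds (-distance, row_id); the worst kept distance is at heap[0]
    for row_id, vec in rows:
        d = dist(vec, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, row_id))
        elif -d > heap[0][0]:                 # d beats the current worst
            heapq.heapreplace(heap, (-d, row_id))
    # sort by negated distance descending = distance ascending (nearest first)
    return [row_id for _, row_id in sorted(heap, reverse=True)]
```

Capping the heap at k entries is what bounds memory to O(k) regardless of table size; collecting and fully sorting every row would cost O(rows) memory.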
```sql
CREATE INDEX <name> ON <table> USING fts (<text_column>);
```

Builds an FTS5-style inverted index with BM25 ranking over a TEXT column. Pairs with the `fts_match` and `bm25_score` functions for keyword retrieval, and composes with `vec_distance_*` for hybrid retrieval (see `docs/fts.md`).
- TEXT columns only. `INTEGER`, `REAL`, `BOOLEAN`, `VECTOR`, `JSON` columns are rejected at CREATE-INDEX time.
- Single-column only. Multi-column FTS is deferred to Phase 8.1.
- `UNIQUE` is rejected — UNIQUE has no meaning for an inverted index.
- The index is built incrementally on `INSERT`. `DELETE`/`UPDATE` mark the index `needs_rebuild`; the next save rebuilds from current rows.
- Persisted as a `KIND_FTS_POSTING` cell tree alongside the regular page hierarchy — the open path loads posting lists bit-for-bit, no re-tokenization.
- The first save of a database with at least one FTS index promotes the file format from v4 to v5 (on-demand bump per Phase 8 plan Q10).
- Tokenizer is ASCII MVP per Q3: `[^A-Za-z0-9]+` split, lowercased, no stemming, no stop list.
- BM25 parameters are fixed at SQLite FTS5's defaults (`k1 = 1.5`, `b = 0.75`).
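The tokenizer and scoring rules above are compact enough to model directly. A hedged sketch (Python for illustration, not the Rust implementation; only `k1`, `b`, and the tokenizer come from the text above, and the `log(... + 1)` IDF variant below is an assumption since the exact IDF form isn't stated here):

```python
import math
import re

K1, B = 1.5, 0.75  # fixed FTS5-default parameters

def tokenize(text):
    """ASCII MVP tokenizer: split on [^A-Za-z0-9]+, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def bm25_scores(query, docs):
    """Score every doc against `query`; higher is more relevant."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    scores = []
    for dt in toks:
        score = 0.0
        for term in set(tokenize(query)):          # any-term (OR) semantics
            df = sum(1 for t in toks if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = dt.count(term)
            score += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(dt) / avgdl))
        scores.append(score)
    return scores
```

With the two example documents used later in this reference, the shorter doc that contains both query terms outranks the longer doc that contains only one.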
```sql
INSERT INTO <name> (col1, col2, ...) VALUES (v1, v2, ...)
    [, (v1, v2, ...) ...];
```

- Explicit column list is required. Value-list-only inserts (`INSERT INTO t VALUES (...)`) are not supported yet.
- `INTEGER PRIMARY KEY` auto-ROWID — omit the PK column and a ROWID is auto-assigned (max existing + 1, starting at 1).
- Multi-row inserts — the parser accepts `VALUES (...), (...), (...)`, and SQLRite runs each row through the type + UNIQUE checks in order. A failure mid-batch leaves the already-inserted rows in place.
- NULL padding — columns not named in the column list default to NULL. `NOT NULL` columns must appear in the list (or be the omitted PK).
- Type validation happens at INSERT time. A mismatched literal (`INSERT INTO t (age) VALUES ('not-a-number')` where `age` is `INTEGER`) is rejected with a typed error — no panic, no partial write.
- UNIQUE enforcement runs before any row insert so a failing batch doesn't leave partial state.
| Literal | Example |
|---|---|
| Integer | 42, -5, 0 |
| Real | 3.14, -0.001, 1e10 |
| Text | 'single-quoted' — doubled single quotes escape: 'it''s' |
| Boolean | TRUE, FALSE (case-insensitive) |
| NULL | NULL (case-insensitive) |
| Vector | [0.1, 0.2, 0.3] — JSON-style bracket-array; integer elements widen to f32 ([1, 2, 3] is valid). For VECTOR(N) columns; dimension must match the declared N. (Phase 7a) |
Hex literals, blob literals, and date/time functions are not supported.
```sql
SELECT [DISTINCT] {* | <projection_item>[, <projection_item>, ...]}
FROM <table> [AS <alias>]
[{INNER | LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <table> [AS <alias>] ON <expr>]*
[WHERE <expr>]
[GROUP BY <col>[, <col>, ...]]
[ORDER BY <expr> [ASC|DESC]]
[LIMIT <non-negative-integer>];
```

`<projection_item>` is one of:

```sql
<column>                           -- bare column reference
COUNT(*)                           -- counts every row, including all-NULL ones
COUNT([DISTINCT] <column>)         -- counts non-NULL values, optionally deduping
{SUM | AVG | MIN | MAX}(<column>)  -- aggregate over a single column
<projection_item> AS <alias>       -- optional column alias
```
- Projection: `*` (all columns in declaration order), a bare column list, or an explicit list mixing bare columns and aggregate calls. Each item can carry an optional `AS alias` (the alias becomes the output column header and is recognized by `ORDER BY`).
- `WHERE`: any expression. Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes `IS NULL` / `IS NOT NULL` for explicit null tests, `LIKE` / `NOT LIKE` / `ILIKE` for pattern matching, and `IN (list)` / `NOT IN (list)` for set-membership against literal lists.
- `DISTINCT`: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic.
- `GROUP BY`: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY <col>` without any aggregate behaves like an implicit `DISTINCT <col>`.
- Aggregates (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN`/`MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)`/`COUNT(col)` and `NULL` for the rest.
- `ORDER BY`: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end (Phase 7b). For aggregating queries the key resolves against the output row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows.
- `LIMIT`: non-negative integer literal. `LIMIT 0` is valid (returns zero rows). When `DISTINCT` is in play, `LIMIT` is applied after deduplication so it counts unique rows.
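The `SUM` promotion rule is worth a worked model. A minimal sketch (Python for illustration; `sql_sum` is an invented name and `None` stands in for SQL NULL), assuming promotion happens exactly once, on the first real input or i64 overflow:

```python
I64_MAX, I64_MIN = 2**63 - 1, -(2**63)

def sql_sum(values):
    """SUM with the documented typing: NULLs skipped, integer result
    until a real input or i64 overflow forces promotion to float."""
    acc, is_real, seen = 0, False, False
    for v in values:
        if v is None:                 # NULL inputs are skipped
            continue
        seen = True
        if isinstance(v, float):
            is_real = True
        acc = acc + v
        if not is_real and not (I64_MIN <= acc <= I64_MAX):
            is_real = True            # one-time promotion on overflow
    if not seen:
        return None                   # empty / all-NULL group sums to NULL
    return float(acc) if is_real else acc
```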
Four flavors are supported, all with explicit ON conditions:
| Flavor | Keeps unmatched rows from… |
|---|---|
| `INNER JOIN` | …neither side. Only ON-matched pairs survive. |
| `LEFT [OUTER] JOIN` | …the left side; right-side columns become NULL for unmatched left rows. |
| `RIGHT [OUTER] JOIN` | …the right side; left-side columns become NULL for unmatched right rows. |
| `FULL [OUTER] JOIN` | …both sides, NULL-padded on the unmatched side. |
- Engine choice: SQLite ships only `INNER` and `LEFT OUTER`. SQLRite implements all four because the per-flavor differences boil down to NULL-padding policy on top of one shared nested-loop driver — adding `RIGHT`/`FULL` was effectively free once the executor had a multi-table scope. See `docs/design-decisions.md` for the rationale.
- Aliases: `FROM customers AS c INNER JOIN orders AS o ON c.id = o.customer_id`. When an alias is supplied the original table name leaves scope (SQL standard) — qualifier resolution uses the alias.
- Qualified column references: `<table>.<col>` and `<alias>.<col>` resolve to that specific side. Bare `<col>` references must resolve to exactly one in-scope table; ambiguous references error with a "qualify it as `<table>.col`" hint.
- Output of `SELECT *` over a join is every column of every in-scope table, in source order. Duplicate header names are permitted (SQLite-style). Disambiguate with explicit `SELECT t.col AS t_col, u.col AS u_col`.
- Multi-join chains left-fold: `A JOIN B ON ... JOIN C ON ...` evaluates as `(A ⨝ B) ⨝ C`. Each new clause sees every prior alias / table in its `ON` expression.
- Self-joins require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous.
- `WHERE` runs after joins. A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom).
- `ORDER BY` and `LIMIT` apply to the fully joined row stream.
- Algorithm: plain nested-loop join, O(N×M) per join level. Adequate for an embedded learning database; hash / merge joins on equi-join shapes are a future optimization.
- `JOIN ... USING (col)` and `NATURAL JOIN` — explicit `ON` only. (Both are deferred — `USING` is straightforward but adds a column-resolution rule we haven't needed yet.)
- `CROSS JOIN` (write `INNER JOIN ... ON true` instead) and comma-separated FROM lists.
- Aggregates / `GROUP BY` / `DISTINCT` over a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time.
- `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side.
The executor includes a tiny optimizer: if the WHERE is exactly `<indexed_col> = <literal>` or `<literal> = <indexed_col>`, it probes the index and scans only matching rows. Mixed predicates (`WHERE a = 1 AND b > 2`), range predicates (`WHERE a > 1`), and OR-combined predicates fall back to a full table scan. Aggregating queries (`GROUP BY` / aggregate functions) skip the rowid-shape optimizations (HNSW / FTS / bounded-heap top-k) since every matching row contributes to its group.
- `%` matches any (possibly empty) char sequence; `_` matches exactly one char. `\` escapes the next character so `\%` matches a literal percent. Outside `\%` / `\_` / `\\`, a backslash is itself a literal — matching SQLite's loose default.
- Case folding is ASCII-only and on by default, mirroring SQLite's default `PRAGMA case_sensitive_like = OFF`. `LIKE 'a%'` matches both `Apple` and `apple`. Non-ASCII characters compare by code point (no Unicode case folding).
- `LIKE … ESCAPE '<char>'` is not supported. `LIKE ANY (...)` is not supported.
- `NULL LIKE 'pattern'` evaluates to `NULL`; in a `WHERE` that excludes the row.
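A hedged model of those wildcard rules, translating a LIKE pattern to an anchored regex (Python for illustration; the engine does not literally compile regexes, `sql_like` is an invented name, and Python's `lower()` folds slightly more than ASCII):

```python
import re

def like_to_regex(pattern):
    """% -> .*, _ -> ., backslash escapes %, _, and itself;
    any other backslash is a plain literal (SQLite's loose default)."""
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_\\":
            out.append(re.escape(pattern[i + 1]))  # escaped wildcard is literal
            i += 2
            continue
        if c == "%":
            out.append(".*")
        elif c == "_":
            out.append(".")
        else:
            out.append(re.escape(c))
        i += 1
    return "^" + "".join(out) + "$"

def sql_like(value, pattern):
    if value is None:
        return None  # NULL LIKE anything -> NULL (row excluded by WHERE)
    # case-insensitive by default, mirroring case_sensitive_like = OFF
    return re.match(like_to_regex(pattern.lower()), value.lower()) is not None
```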
- Only the literal-list form is supported: `WHERE x IN (1, 2, 3)` and `WHERE x NOT IN (...)`.
- Three-valued logic: if the LHS is `NULL`, the result is `NULL`; if the RHS list contains a `NULL` and no other entry matches, the result is `NULL`. In a `WHERE` both cases collapse to "row excluded", matching SQLite.
- `IN (subquery)`, `IN UNNEST(...)`, and `BETWEEN` are not supported yet.
- `CROSS JOIN`, comma-separated FROM lists, `NATURAL JOIN`, `JOIN ... USING (col)` — explicit `INNER`/`LEFT`/`RIGHT`/`FULL OUTER JOIN ... ON ...` only (see JOIN semantics)
- Aggregates / `GROUP BY` / `DISTINCT` over a JOIN — pipe through a subquery once subqueries land
- Subqueries, CTEs (`WITH`), views
- `HAVING` — pre-aggregation `WHERE` works; post-aggregation filtering does not yet
- `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX` (only `COUNT(DISTINCT col)` is supported)
- `GROUP BY` on expressions — bare column names only in v1
- `LIKE … ESCAPE '<char>'`, `IN (subquery)`, `BETWEEN`, `GLOB`, `REGEXP`
- Expressions in the projection list beyond aggregate calls (`SELECT age + 1 FROM users` is still rejected; aggregates are the one allowed expression form)
- Multi-column `ORDER BY`, `NULLS FIRST`/`LAST` (single sort key only)
- `OFFSET`
- Window functions (`OVER (...)`, `FILTER (WHERE ...)`, `WITHIN GROUP`)
Any of the above reaches the executor as a parsed AST node that execution doesn't handle, producing either `NotImplemented` or a more specific error (e.g., `joins are not supported`).
```sql
UPDATE <table> SET col1 = <expr> [, col2 = <expr>]* [WHERE <expr>];
```

- Multi-column `SET` — separate assignments with commas.
- RHS is a full expression — it can reference other columns of the same row: `UPDATE users SET age = age + 1, updated_at = 'now' WHERE id = 42;`
- Type enforcement — the declared column type of each target is checked against the assigned expression's result. Mismatch is a clean error; the row (and all other rows that would have been updated by the same statement) stays untouched.
- UNIQUE enforcement — if the update would collide with another row's value on a UNIQUE / PRIMARY KEY column, the whole statement is rejected before any write. No partial updates.
- NULL assignments respect `NOT NULL` — `SET col = NULL` on a `NOT NULL` column errors.
```sql
DELETE FROM <table> [WHERE <expr>];
```

- No `WHERE` deletes every row (tables and indexes are preserved; only row data is removed).
- `WHERE` uses the same expression evaluator as `SELECT`.
- Secondary indexes are updated alongside the row deletes so a subsequent `WHERE col = ...` doesn't return stale hits.
```sql
ALTER TABLE [IF EXISTS] <table> RENAME TO <new_table>;
ALTER TABLE [IF EXISTS] <table> RENAME COLUMN <old_col> TO <new_col>;
ALTER TABLE [IF EXISTS] <table> ADD COLUMN <col_def>;
ALTER TABLE [IF EXISTS] <table> DROP COLUMN <col>;
```

One operation per statement (SQLite-style). `ALTER TABLE foo RENAME TO bar, ADD COLUMN x ...` is rejected — issue separate statements instead.
- Reserved-name rejection: cannot rename to `sqlrite_master`.
- Errors if the target name is already a table.
- Auto-indexes whose names embed the old table name (`sqlrite_autoindex_<old>_<col>`) are renamed in lockstep so the schema catalog stays consistent. Explicit indexes carry their user-given name unchanged.
- Errors if the old column doesn't exist or the new name already exists in the table.
- Re-keys the row storage and updates every dependent index (auto + explicit, secondary / HNSW / FTS) — including auto-index name regeneration.
- Renaming the PRIMARY KEY column is allowed; the table's `primary_key` pointer follows the new name.
- The column definition reuses the same parser that handles CREATE TABLE columns: same types, same `NOT NULL`/`DEFAULT` semantics.
- Rejected: `PRIMARY KEY` and `UNIQUE` constraints on the added column. Both would require backfilling the column under uniqueness constraints against existing rows; that path will land alongside multi-column UNIQUE.
- `NOT NULL` on a non-empty table requires `DEFAULT`. Without one there's no value to backfill existing rowids with. Same rule SQLite applies.
- With a `DEFAULT`, every existing rowid is backfilled with the default value at ADD COLUMN time. Without a `DEFAULT`, existing rowids read as NULL for the new column.
- Rejected: dropping the PRIMARY KEY column.
- Rejected: dropping the only remaining column (degenerate table).
- Cascades to every dependent index (auto + explicit, secondary / HNSW / FTS) on the dropped column.
- `CASCADE`/`RESTRICT` modifiers are accepted by the parser and ignored — SQLite has no real distinction here either.
```sql
DROP TABLE [IF EXISTS] <table>;
```

- Single target per statement. `DROP TABLE a, b, c;` is parsed but rejected with a `NotImplemented` error.
- Reserved-name rejection: `DROP TABLE sqlrite_master` errors with the same message `CREATE TABLE` uses.
- All indexes attached to the table (auto, explicit, HNSW, FTS) disappear with the table — they live inside the `Table` struct and ride along.
- Without `IF EXISTS`, dropping a table that doesn't exist errors. With it, that's a benign 0-tables-dropped no-op.
- Disk pages move onto the freelist. Pages the dropped table occupied are pushed onto a persisted free-page list (SQLR-6) so subsequent `CREATE TABLE` or inserts can reuse them. The file shrinks automatically when the freelist crosses 25% of `page_count` (SQLR-10 auto-VACUUM, default-on); embedders that need the prior "manual `VACUUM;` only" behavior can call `Connection::set_auto_vacuum_threshold(None)` at open time.
```sql
DROP INDEX [IF EXISTS] <index_name>;
```

- Single target per statement.
- Walks every table searching for an index with the given name across the secondary B-Tree, HNSW, and FTS index families.
- Refuses to drop auto-indexes. `sqlrite_autoindex_*` names are constraint-bound to the column they index — the only way to remove them is to drop the underlying column or table. Same rule SQLite enforces for its `sqlite_autoindex_*` indexes.
- `IF EXISTS` makes a missing index a benign no-op.
Expressions work inside WHERE (in SELECT, UPDATE, and DELETE) and on the right-hand side of UPDATE's SET.
| Category | Operators |
|---|---|
| Comparison | =, <>, <, <=, >, >= |
| Null tests | IS NULL, IS NOT NULL |
| Pattern | LIKE, NOT LIKE, ILIKE (%, _, \-escape; case-insensitive ASCII) |
| Set | IN (list), NOT IN (list) (literal lists only) |
| Logical | AND, OR, NOT |
| Arithmetic | +, -, *, /, % |
| String | || (concatenation) |
| Unary | +, - |
| Grouping | Parentheses |
Same set accepted by INSERT (see Value literals accepted).
| Function | Returns | Notes |
|---|---|---|
| `vec_distance_l2(a, b)` | Real (`f64`) | Euclidean distance √Σ(aᵢ−bᵢ)². Smaller is closer. (Phase 7b) |
| `vec_distance_cosine(a, b)` | Real (`f64`) | Cosine distance 1 − (a·b) / (‖a‖·‖b‖). Errors on zero-magnitude vectors (cosine is undefined). Smaller is closer; identical vectors return 0.0, orthogonal vectors return 1.0. (Phase 7b) |
| `vec_distance_dot(a, b)` | Real (`f64`) | Negated dot product −(a·b). Negation makes "smaller is closer" consistent with the others. For unit-norm vectors equals `vec_distance_cosine(a, b) - 1`. (Phase 7b) |
| `json_extract(json, path)` | Depends on the resolved node | Walks `path` over `json` and returns the resolved value coerced to the closest SQL type — JSON strings → TEXT, numbers → INTEGER / REAL, booleans → BOOLEAN, null → NULL, and composites (object / array) → their canonical JSON-text serialization. Path defaults to `$` when only one argument is supplied. A path that doesn't resolve returns NULL. (Phase 7e) |
| `json_type(json[, path])` | Text | One of `'object'`, `'array'`, `'string'`, `'integer'`, `'real'`, `'true'`, `'false'`, `'null'`. Path defaults to `$`. (Phase 7e) |
| `json_array_length(json[, path])` | Integer | Number of elements in the JSON array at `path`. Errors if the resolved node is not an array. Path defaults to `$`. (Phase 7e) |
| `json_object_keys(json[, path])` | Text (JSON-array string) | Returns the object's keys as JSON-array text in insertion order — e.g. `'["a","b","c"]'`. Path defaults to `$`. Diverges from SQLite, which exposes keys as a table-valued function (one row per key). SQLRite has no set-returning functions yet, so we return the keys as a JSON array and let callers parse if needed. (Phase 7e) |
| `fts_match(col, 'q')` | Bool | True iff the row contains at least one tokenized query term in `col`. Requires an FTS index on `col`; errors otherwise. Tokenization rules: ASCII split + lowercase, no stemming, no stop-list. Multi-token queries use any-term (OR) semantics. (Phase 8b) |
| `bm25_score(col, 'q')` | Real (`f64`) | Per-row BM25 relevance score for the given query. Higher is more relevant. Requires an FTS index on `col`. Pairs with `fts_match` in the canonical `WHERE … ORDER BY bm25_score(...) DESC LIMIT k` shape, which the optimizer probes via the inverted index instead of scanning rows. (Phase 8b) |
All three vector-distance functions take exactly two arguments, both of which must be vectors of the same dimension. Either argument can be a column reference (`embedding`), a bracket-array literal (`[0.1, 0.2, 0.3]`), or any sub-expression that evaluates to a vector. Mismatched dimensions error with `vector dimensions don't match (lhs=N, rhs=M)`.
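A reference model of the three distance functions as defined in the table above (Python for illustration, not the Rust implementation; the names mirror the SQL functions and the error strings follow the documented messages):

```python
import math

def _check(a, b):
    if len(a) != len(b):
        raise ValueError(f"vector dimensions don't match (lhs={len(a)}, rhs={len(b)})")

def vec_distance_l2(a, b):
    """Euclidean distance; smaller is closer."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vec_distance_cosine(a, b):
    """1 - cos(a, b); errors on zero-magnitude input."""
    _check(a, b)
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine distance undefined for zero-magnitude vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def vec_distance_dot(a, b):
    """Negated dot product, so smaller is closer like the others."""
    _check(a, b)
    return -sum(x * y for x, y in zip(a, b))
```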
The KNN ranking pattern that motivates this set:

```sql
SELECT id, title FROM docs
ORDER BY vec_distance_l2(embedding, [0.1, 0.2, ..., 0.0])
LIMIT 10;
```

Operator forms (`<->`, `<=>`, `<#>`) are not supported yet. They're the de facto pgvector convention but blocked on a sqlparser limitation — they will land as a Phase 7b.1 follow-up. Use the function-call form for now.
The `json_*` functions accept a string path argument with a small subset of JSONPath:

| Token | Meaning |
|---|---|
| `$` | Root of the document (default if path is omitted). |
| `.key` | Object member access. Bare keys only — no quoted-string variant yet. |
| `[N]` | Array index (0-based). Negative indices are not supported. |
Tokens chain naturally: `$.user.tags[0]`, `$[2].name`, `$.matrix[1][0]`. A malformed path (unbalanced brackets, missing `$`) errors at runtime with a typed message; a well-formed path that simply doesn't resolve returns NULL.
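The path grammar is small enough to model end to end. A sketch of the extraction walk (Python for illustration; unlike the real `json_extract` it returns parsed Python values rather than SQL-typed ones, so composites come back as objects, not canonical JSON text):

```python
import json
import re

TOKEN = re.compile(r"\.([A-Za-z0-9_]+)|\[(\d+)\]")

def json_extract(doc, path="$"):
    """Walk a $.key[N] path; unresolved paths return None (SQL NULL),
    malformed paths raise (the typed-error case)."""
    if not path.startswith("$"):
        raise ValueError("malformed path: must start with $")
    node, pos = json.loads(doc), 1
    for m in TOKEN.finditer(path, 1):
        if m.start() != pos:                # garbage between tokens
            raise ValueError("malformed path")
        pos = m.end()
        key, idx = m.group(1), m.group(2)
        try:
            node = node[key] if key is not None else node[int(idx)]
        except (KeyError, IndexError, TypeError):
            return None                     # well-formed but unresolved
    if pos != len(path):                    # trailing garbage
        raise ValueError("malformed path")
    return node
```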
```sql
CREATE TABLE events (id INTEGER PRIMARY KEY, payload JSON);
INSERT INTO events (payload) VALUES
    ('{"user": {"name": "alice", "tags": ["admin", "ops"]}, "score": 42}'),
    ('{"user": {"name": "bob", "tags": []}, "score": 7}');

SELECT id,
       json_extract(payload, '$.user.name') AS name,
       json_extract(payload, '$.user.tags[0]') AS first_tag,
       json_array_length(payload, '$.user.tags') AS tag_count,
       json_type(payload, '$.score') AS score_type
FROM events
WHERE json_extract(payload, '$.user.name') = 'alice';
```

```sql
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
INSERT INTO docs (body) VALUES ('rust embedded database');
INSERT INTO docs (body) VALUES ('postgres relational database server');
CREATE INDEX docs_fts ON docs USING fts (body);

-- Lexical filter (Boolean predicate).
SELECT id FROM docs WHERE fts_match(body, 'database');

-- Top-k by BM25 relevance — the optimizer probes the inverted index
-- when WHERE / ORDER BY share the same query string and direction.
SELECT id FROM docs
WHERE fts_match(body, 'embedded database')
ORDER BY bm25_score(body, 'embedded database') DESC
LIMIT 5;
```
LIMIT 5;See docs/fts.md for the canonical FTS reference (tokenizer rules, BM25 parameters, persistence, hybrid retrieval with vec_distance_*).
- Integer-only ops stay integer. `1 + 2` → `3` (Integer).
- Any `REAL` operand promotes to `f64`. `1 + 2.0` → `3.0` (Real).
- Divide/modulo by zero returns a typed runtime error rather than panicking: `division by zero` for `/` and `%`.
- `TEXT` in arithmetic context errors — `'hello' + 1` is not silently coerced.
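Those coercion rules, modeled in one function (Python for illustration; `sql_arith` is an invented name, `None` models NULL, and truncating integer division is an assumption carried over from SQLite that this document doesn't spell out):

```python
def sql_arith(op, lhs, rhs):
    """NULL propagates, any REAL operand promotes the result,
    TEXT errors, and /0 or %0 is a typed error (no panic)."""
    if lhs is None or rhs is None:
        return None                              # NULL propagates
    if isinstance(lhs, str) or isinstance(rhs, str):
        raise TypeError("TEXT in arithmetic context")
    if op in ("/", "%") and rhs == 0:
        raise ZeroDivisionError("division by zero")
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    if op == "*":
        return lhs * rhs
    if op == "/":
        if isinstance(lhs, int) and isinstance(rhs, int):
            return int(lhs / rhs)                # assumed: truncate toward zero
        return lhs / rhs
    if op == "%":
        return lhs % rhs
    raise ValueError(f"unknown operator {op!r}")
```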
SQLRite follows standard SQL three-valued logic:

- Comparisons involving NULL (`NULL = 1`, `1 < NULL`) evaluate to unknown, which behaves as `false` inside `WHERE`. Neither the `NULL = NULL` equality nor the `NULL <> NULL` inequality is true — use `IS NULL` / `IS NOT NULL` for explicit null tests (`SELECT … WHERE col IS NULL`). NULLs are not stored in secondary, HNSW, or FTS indexes, so `IS NULL` always falls through to a full scan; that's correct, just not as fast as an indexed equality probe.
- Logical operators with NULL: `NULL AND false` → `false`, `NULL AND true` → `NULL`, `NULL OR true` → `true`, `NOT NULL` → `NULL`. The short-circuit rules prevent NULL from propagating when one operand already decides the result.
- Arithmetic with NULL: any operand NULL → result NULL. `NULL + 1` → `NULL`.
- String concat with NULL: `'foo' || NULL` → `NULL` (same propagation as arithmetic).
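The logical-operator rules above, as a three-valued truth model (Python for illustration; `None` stands in for NULL/unknown):

```python
def sql_and(a, b):
    """Three-valued AND: a false side decides the result by itself."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    """Three-valued OR: a true side decides the result by itself."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def sql_not(a):
    """NOT NULL stays NULL; otherwise ordinary negation."""
    return None if a is None else not a
```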
- Keywords (`SELECT`, `FROM`, `AND`, `TRUE`, `NULL`, …) are case-insensitive. `select`, `SELECT`, `SeLeCt` all parse.
- Identifiers (table names, column names) are case-sensitive — no normalization is applied at definition or lookup time. `CREATE TABLE Users (…)` followed by `SELECT * FROM users` fails with `Table doesn't exist`. (This is the opposite of SQLite's default; we'll revisit once the cursor refactor in Phase 5 lands.)
- String literals preserve case: `'Alice'` stays `Alice`.
```sql
BEGIN;
INSERT INTO users (name) VALUES ('alice');
UPDATE counters SET n = n + 1 WHERE name = 'signups';
COMMIT;
```

Or:

```sql
BEGIN;
DELETE FROM users WHERE banned = TRUE;
ROLLBACK; -- nothing was actually deleted
```

- `BEGIN` deep-clones the in-memory database into a snapshot held on `db.txn`. Auto-save is suppressed while the transaction is open — mutations accumulate in memory.
- `COMMIT` flushes every accumulated change to the WAL in one atomic commit frame and drops the snapshot. Readers of the file after COMMIT see all of the transaction's changes at once.
- `ROLLBACK` replaces the live state with the snapshot and drops the snapshot. Nothing hits disk.
- Nested `BEGIN` is rejected with `a transaction is already open`. No savepoints yet.
- `BEGIN` on a read-only database (`sqlrite --readonly foo.sqlrite`) is rejected with `cannot execute: database is opened read-only`.
- Runtime errors mid-transaction do NOT auto-rollback. If an `INSERT` fails inside a transaction (UNIQUE violation, type mismatch, bad syntax), the transaction stays open. The caller decides whether to `ROLLBACK` or `COMMIT` whatever succeeded before the failure.
- `COMMIT`'s disk write failing DOES auto-rollback. If the save at COMMIT time errors (disk full, permission denied, checksum mismatch), SQLRite restores the pre-BEGIN snapshot and surfaces `COMMIT failed — transaction rolled back: <underlying error>`. Leaving in-flight mutations live after a failed COMMIT would be unsafe — any subsequent non-transactional statement's auto-save would silently publish partial work.
- Cost: `BEGIN` is O(N) in the total size of the in-memory database because of the snapshot clone. On a huge database, opening a transaction just to run a single read-only query is wasteful — use a plain `SELECT` instead.
- Visibility to other processes: with POSIX file locks (Phase 4a–4e), a writer excludes all concurrent readers anyway, so "uncommitted transaction state leaking to a concurrent reader" isn't a concern — no concurrent reader exists during an open transaction.
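The snapshot-clone lifecycle above can be sketched in a few lines — illustrative types only (`Db`, `Session` are not SQLRite's real structs), but the semantics match the description:

```rust
// BEGIN deep-clones the live state; ROLLBACK restores the clone;
// COMMIT just drops it, keeping the accumulated mutations live.
#[derive(Clone)]
struct Db {
    rows: Vec<String>,
}

struct Session {
    live: Db,
    txn: Option<Db>, // snapshot held while a transaction is open
}

impl Session {
    fn begin(&mut self) -> Result<(), &'static str> {
        if self.txn.is_some() {
            return Err("a transaction is already open"); // nested BEGIN rejected
        }
        self.txn = Some(self.live.clone()); // the O(N) deep clone
        Ok(())
    }
    fn commit(&mut self) {
        self.txn = None; // mutations since BEGIN stay live
    }
    fn rollback(&mut self) {
        if let Some(snapshot) = self.txn.take() {
            self.live = snapshot; // discard everything since BEGIN
        }
    }
}

fn main() {
    let mut s = Session { live: Db { rows: vec![] }, txn: None };
    s.begin().unwrap();
    s.live.rows.push("alice".into());
    assert_eq!(s.begin(), Err("a transaction is already open"));
    s.rollback();
    assert!(s.live.rows.is_empty()); // nothing was actually inserted
}
```

Note how `rollback` is O(1) once the snapshot exists — the whole cost is paid up front at `begin`.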
```sql
VACUUM;
```

Compacts the database file: rewrites every live table, index, HNSW graph, FTS posting tree, and `sqlrite_master` itself contiguously from page 1, drops the freelist, and lets the next checkpoint truncate the tail.
- Bare `VACUUM;` only. Modifiers — `VACUUM FULL`, `VACUUM REINDEX`, table targets, `TO ... PERCENT`, `BOOST` — are parsed (sqlparser supports them) but rejected at execution with `VACUUM modifiers (FULL, REINDEX, table targets, etc.) are not supported`.
- Refused inside a transaction. `BEGIN; VACUUM;` errors with `VACUUM cannot run inside a transaction. Use COMMIT; first, then VACUUM;`.
- No-op on in-memory databases. Returns a `VACUUM is a no-op for in-memory databases` status string and does nothing — there's no file to compact.
- Status string carries pages and bytes reclaimed: `VACUUM completed. <N> pages reclaimed (<B> bytes).`
- Format-version side effect. A v4/v5 file that has been promoted to v6 by an earlier drop stays at v6 after VACUUM (v6 is a strict superset; we don't downgrade). A file that's already at v4/v5 because no drop ever happened on it doesn't get bumped by VACUUM.
When to run it: any time after a string of `DROP TABLE` / `DROP INDEX` / `ALTER TABLE DROP COLUMN` operations if you care about file size. SQLRite reuses freelist pages on subsequent inserts, so a write-heavy workload may not need VACUUM at all — its main use is reclaiming space when you don't expect to grow back.
Manual `VACUUM;` is rarely needed in practice: by default, every page-releasing DDL (`DROP TABLE`, `DROP INDEX`, `ALTER TABLE DROP COLUMN`) checks the freelist after committing and runs `vacuum_database` automatically when the freelist exceeds 25% of `page_count` (SQLite parity). The trigger:
- skips databases under 16 pages (64 KiB) so tiny files don't churn,
- skips inside an explicit transaction (the freelist isn't accurate until `COMMIT`),
- skips on in-memory and read-only databases.
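The trigger conditions compose into a single predicate — a sketch under assumed names (`DbState`, `should_auto_vacuum` are illustrative, not SQLRite's API):

```rust
// Auto-vacuum fires only when: a threshold is armed, the database is a
// real on-disk read-write file of at least 16 pages, no transaction is
// open, and the freelist exceeds threshold * page_count.
struct DbState {
    page_count: u64,
    freelist_len: u64,
    in_transaction: bool,
    in_memory: bool,
    read_only: bool,
}

fn should_auto_vacuum(db: &DbState, threshold: Option<f64>) -> bool {
    let Some(t) = threshold else { return false }; // None = disabled
    if db.page_count < 16 || db.in_transaction || db.in_memory || db.read_only {
        return false; // tiny files don't churn; freelist not trustworthy mid-txn
    }
    db.freelist_len as f64 > t * db.page_count as f64
}

fn main() {
    let db = DbState {
        page_count: 100,
        freelist_len: 30,
        in_transaction: false,
        in_memory: false,
        read_only: false,
    };
    assert!(should_auto_vacuum(&db, Some(0.25))); // 30 > 25 → compact
    assert!(!should_auto_vacuum(&db, None));      // disabled → never fires
}
```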
The threshold is tunable per-connection from Rust:
```rust
let mut conn = Connection::open("db.sqlrite")?;
conn.set_auto_vacuum_threshold(Some(0.5))?; // fire only when freelist > 50%
conn.set_auto_vacuum_threshold(None)?;      // disable entirely (manual VACUUM only)
```

…or via SQL (SQLR-13), which is the path SDK / FFI / MCP consumers reach for since they can't call the Rust setter directly:
```sql
PRAGMA auto_vacuum;        -- read; renders a single-row result set
PRAGMA auto_vacuum = 0.5;  -- arm the trigger at 50%
PRAGMA auto_vacuum = 0;    -- arm at 0% (compact on any released page)
PRAGMA auto_vacuum = OFF;  -- disable; equivalents: NONE, 'OFF', 'NONE'
```

Out-of-range values (anything outside `0.0..=1.0`, NaN, ±∞) and unknown identifiers like `WAL` / `FULL` are rejected with a typed error — the trigger never silently saturates or falls back to a default. The setting is per-`Connection` runtime state — it's not persisted in the file header, so every reopen starts at the default `Some(0.25)`.
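The validation rules reduce to a small parser — a sketch with an assumed function name (`parse_auto_vacuum` is illustrative; the real dispatch lives in `execute_pragma`):

```rust
// Maps a PRAGMA auto_vacuum value to Some(threshold) or None (disabled).
// Quoted values are accepted; non-finite or out-of-range numbers and
// unknown identifiers are typed errors, never silently clamped.
fn parse_auto_vacuum(value: &str) -> Result<Option<f64>, String> {
    let v = value.trim().trim_matches('\''); // 'OFF' and OFF both work
    if v.eq_ignore_ascii_case("off") || v.eq_ignore_ascii_case("none") {
        return Ok(None);
    }
    match v.parse::<f64>() {
        Ok(t) if t.is_finite() && (0.0..=1.0).contains(&t) => Ok(Some(t)),
        Ok(_) => Err(format!("auto_vacuum threshold out of range: {v}")),
        Err(_) => Err(format!("unknown auto_vacuum value: {v}")),
    }
}

fn main() {
    assert_eq!(parse_auto_vacuum("0.5"), Ok(Some(0.5)));
    assert_eq!(parse_auto_vacuum("'OFF'"), Ok(None));
    assert!(parse_auto_vacuum("2.0").is_err()); // no saturation to 1.0
    assert!(parse_auto_vacuum("NaN").is_err());
    assert!(parse_auto_vacuum("WAL").is_err()); // no fallback to a default
}
```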
The full Phase 11 user-facing reference — conceptual model, embedding API, SDK error mapping, REPL meta-commands, durability story, limitations — lives at `docs/concurrent-writes.md`. This section is the SQL-syntax reference.
Selects the per-database concurrency model. `wal` (default) is the legacy WAL-backed pager every pre-Phase-11 build used; `mvcc` opts the database into multi-version concurrency control (Phase 11 — concurrent writes via `BEGIN CONCURRENT`).
```sql
PRAGMA journal_mode;        -- read; renders a single-row "wal" or "mvcc"
PRAGMA journal_mode = mvcc; -- opt into MVCC for this database
PRAGMA journal_mode = wal;  -- switch back (rejected if the MvStore
                            -- already carries committed versions)
```

Case-insensitive on both the pragma name and the value. Quoted values (`'mvcc'`) work; numeric values are rejected (the field is enum-shaped). Unknown modes return a typed error and don't disturb the existing setting.
The setting is per-database — every `Connection::connect` sibling sees the same value. Reachable through the public API as `Connection::journal_mode() -> JournalMode`.
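The enum-shaped parsing described above can be sketched as follows (the `JournalMode` variant names match the public API; the parser itself is illustrative):

```rust
// Case-insensitive, quote-tolerant journal_mode parsing. Numbers fall
// through to the error arm because the field is enum-shaped.
#[derive(Debug, PartialEq, Clone, Copy)]
enum JournalMode {
    Wal,
    Mvcc,
}

fn parse_journal_mode(value: &str) -> Result<JournalMode, String> {
    match value.trim().trim_matches('\'').to_ascii_lowercase().as_str() {
        "wal" => Ok(JournalMode::Wal),
        "mvcc" => Ok(JournalMode::Mvcc),
        other => Err(format!("unknown journal_mode: {other}")), // setting untouched
    }
}

fn main() {
    assert_eq!(parse_journal_mode("MVCC"), Ok(JournalMode::Mvcc));
    assert_eq!(parse_journal_mode("'mvcc'"), Ok(JournalMode::Mvcc));
    assert!(parse_journal_mode("3").is_err()); // numeric values rejected
}
```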
For the conceptual walkthrough (version chains, snapshot-isolation visibility, the WAL log-record durability story, REPL `.spawn` demos), see `docs/concurrent-writes.md`. This section is the SQL-syntax reference.
Opens a transaction that doesn't acquire the engine's single-writer lock — multiple `BEGIN CONCURRENT` transactions can coexist, on the same `Connection` or across sibling `Connection::connect` handles. Writes accumulate against a per-transaction snapshot; at COMMIT, the engine validates the write-set against any versions that committed after the transaction's `begin_ts` and aborts with `SQLRiteError::Busy` if some other transaction superseded a row.
```sql
PRAGMA journal_mode = mvcc; -- opt the database into MVCC
BEGIN CONCURRENT;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
UPDATE accounts SET balance = balance + 50 WHERE id = 2;
COMMIT; -- may return Busy → caller retries
```

Retry shape (Rust):
```rust
loop {
    conn.execute("BEGIN CONCURRENT")?;
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")?;
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")?;
    match conn.execute("COMMIT") {
        Ok(_) => break,
        Err(e) if e.is_retryable() => continue,
        Err(e) => return Err(e),
    }
}
```

`SQLRiteError::is_retryable()` covers both `Busy` and `BusySnapshot`. Use it in retry helpers rather than matching the variants individually so adding a third retryable variant later doesn't break callers.
Requirements + restrictions (v0):
- Database must be in `journal_mode = mvcc` first. Plain `BEGIN CONCURRENT` against a `Wal`-mode database returns a typed error.
- DDL (`CREATE TABLE` / `CREATE INDEX` / `DROP TABLE` / `DROP INDEX` / `ALTER TABLE` / `VACUUM`) is rejected inside `BEGIN CONCURRENT` — the typed error keeps the transaction open so the caller can `ROLLBACK`.
- Nested `BEGIN CONCURRENT` (or plain `BEGIN` inside an open `BEGIN CONCURRENT`) is rejected with a typed error.
- Reads inside the transaction see the BEGIN-time snapshot through every public read path: `Connection::execute("SELECT …")` and `Statement::query()` / `Statement::query_with_params()` (the prepared-statement path). Phase 11.5 closed the prepare/query gap by routing the read side through a per-connection `Mutex<Option<ConcurrentTx>>` + `with_snapshot_read` helper.
- Tables touched by writes inside `BEGIN CONCURRENT` should not carry FTS / HNSW indexes — the per-row commit-apply path only maintains B-tree secondary indexes today. Plain `WHERE col = literal` index probing still works on the post-commit live database.
- `AUTOINCREMENT`-bearing INSERTs are not specifically guarded; two concurrent INSERTs that each allocate the same rowid surface as a `Busy` at the second commit. The plan's "reject AUTOINCREMENT under MVCC" gate is a clean follow-up.
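The COMMIT-time validation is first-committer-wins: compare the write-set against the timestamp of the latest committed version of each row. A toy model under assumed names (`MvStore`, `try_commit`, and the string row keys are all illustrative):

```rust
use std::collections::HashMap;

// A transaction carries its begin_ts and a write-set of row keys.
// Commit fails with Busy if any written row was superseded by a commit
// that happened after begin_ts; otherwise it installs new versions.
struct MvStore {
    commit_ts: u64,
    last_write: HashMap<&'static str, u64>, // row key → ts of latest committed write
}

impl MvStore {
    fn try_commit(
        &mut self,
        begin_ts: u64,
        write_set: &[&'static str],
    ) -> Result<u64, &'static str> {
        for &key in write_set {
            if let Some(&ts) = self.last_write.get(key) {
                if ts > begin_ts {
                    return Err("Busy"); // another transaction superseded this row
                }
            }
        }
        self.commit_ts += 1;
        for &key in write_set {
            self.last_write.insert(key, self.commit_ts);
        }
        Ok(self.commit_ts)
    }
}

fn main() {
    let mut store = MvStore { commit_ts: 0, last_write: HashMap::new() };
    let (t1_begin, t2_begin) = (0, 0); // both snapshots taken before any commit
    assert!(store.try_commit(t1_begin, &["accounts:1"]).is_ok());
    // T2 wrote the same row but began before T1 committed → Busy; caller retries
    assert_eq!(store.try_commit(t2_begin, &["accounts:1"]), Err("Busy"));
    // Disjoint write-sets commit freely — that's the point of BEGIN CONCURRENT
    assert!(store.try_commit(t2_begin, &["accounts:2"]).is_ok());
}
```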
A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_database_read_only(path, name)` programmatically) takes a shared POSIX advisory lock instead of an exclusive one. In that mode:
- `SELECT` works normally.
- Every write statement (`INSERT`, `UPDATE`, `DELETE`, `CREATE TABLE`, `CREATE INDEX`) is rejected before touching memory with `cannot execute: database is opened read-only`. The in-memory state never diverges from disk.
- `BEGIN` is rejected.
- Multiple read-only openers of the same file coexist (shared flock). Any read-write opener blocks all read-only openers and vice versa — POSIX's "many readers OR one writer, not both" semantics.
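The lock-compatibility rule above is the classic readers-writer matrix — a toy model (the `Lock` enum and `can_acquire` helper are illustrative; the real mechanism is a POSIX advisory file lock):

```rust
// Many shared (read-only) holders OR one exclusive (read-write) holder.
#[derive(PartialEq)]
enum Lock {
    Shared,
    Exclusive,
}

fn can_acquire(existing: &[Lock], requested: &Lock) -> bool {
    match requested {
        Lock::Shared => existing.iter().all(|l| *l == Lock::Shared),
        Lock::Exclusive => existing.is_empty(),
    }
}

fn main() {
    assert!(can_acquire(&[Lock::Shared, Lock::Shared], &Lock::Shared)); // readers coexist
    assert!(!can_acquire(&[Lock::Shared], &Lock::Exclusive)); // writer blocked by readers
    assert!(!can_acquire(&[Lock::Exclusive], &Lock::Shared)); // reader blocked by writer
}
```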
- One statement per call — `process_command` / `Connection::execute` expects a single statement. Multi-statement strings (`"INSERT …; INSERT …;"`) are rejected with `Expected a single query statement, but there are N`. For multi-statement execution, use the SDK's `executescript` / `execute_batch` helpers (Phases 5c/5d).
- Trailing semicolons are optional. Both `SELECT 1` and `SELECT 1;` parse.
- Empty / comment-only input is a benign no-op — no error, no auto-save triggered.
- Multi-line statements work. The REPL (via rustyline) buffers continuation lines until a terminating semicolon is seen.
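The single-statement gate can be approximated by counting statements while ignoring semicolons inside string literals — a rough sketch only (SQLRite actually counts statements parsed by sqlparser, not raw semicolons):

```rust
// Toy statement counter: ';' outside a string literal ends a statement;
// a trailing semicolon is optional; blank input counts as zero.
fn statement_count(sql: &str) -> usize {
    let mut count = 0;
    let mut in_string = false;
    let mut has_tokens = false;
    for ch in sql.chars() {
        match ch {
            '\'' => {
                in_string = !in_string;
                has_tokens = true;
            }
            ';' if !in_string => {
                if has_tokens {
                    count += 1;
                }
                has_tokens = false;
            }
            c if !c.is_whitespace() => has_tokens = true,
            _ => {}
        }
    }
    if has_tokens {
        count += 1; // statement without a trailing semicolon
    }
    count
}

fn main() {
    assert_eq!(statement_count("SELECT 1"), 1);
    assert_eq!(statement_count("SELECT 1;"), 1);
    assert_eq!(statement_count("INSERT INTO t VALUES ('a;b');"), 1); // ';' in string ignored
    assert_eq!(statement_count("SELECT 1; SELECT 2;"), 2); // rejected by execute
    assert_eq!(statement_count("   "), 0); // benign no-op
}
```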
For context when you hit `NotImplemented`; see Roadmap for when these land:
- `CROSS JOIN`, comma joins, `NATURAL JOIN`, `JOIN ... USING` — explicit `INNER` / `LEFT` / `RIGHT` / `FULL OUTER JOIN ... ON ...` works (SQLR-5); the others don't
- Aggregates / `GROUP BY` / `DISTINCT` over a JOIN — pipe through a subquery once subqueries land
- `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today
- Subqueries (scalar, `IN (SELECT ...)`, correlated)
- CTEs (`WITH`), recursive CTEs
- Views (`CREATE VIEW`)
- `HAVING` — pre-aggregation `WHERE` works; post-aggregation filtering doesn't yet
- `DISTINCT` on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported)
- `GROUP BY` on expressions — bare column names only
- Other aggregate functions (`GROUP_CONCAT`, `STRING_AGG`, …) — only `COUNT` / `SUM` / `AVG` / `MIN` / `MAX` are wired
- `GLOB`, `REGEXP`
- `BETWEEN`
- `LIKE … ESCAPE '<char>'` — bare `LIKE` / `NOT LIKE` / `ILIKE` work; the explicit-escape form doesn't
- `IN (subquery)`, `IN UNNEST(...)` — only literal lists are supported
- `CASE WHEN ... THEN ... END`
- Expressions in the `SELECT` projection list — aggregate calls are the one allowed expression form (`SELECT age + 1 FROM users` is rejected)
- Built-in functions (`LENGTH`, `UPPER`, `LOWER`, `COALESCE`, `IFNULL`, date/time, `printf`, …)
- `ALTER TABLE` extras: multi-operation (`ALTER TABLE foo RENAME TO bar, ADD COLUMN x ...`), `ALTER COLUMN ... SET / DROP DEFAULT`, `ALTER COLUMN ... TYPE`
- `ADD COLUMN` constraint extras: `PRIMARY KEY` and `UNIQUE` on the added column (would need backfill + uniqueness against existing rows)
- `DROP TABLE` / `DROP INDEX` extras: multi-target (`DROP TABLE a, b, c;`)
- `CREATE VIEW`, `CREATE TRIGGER`
- Table-level constraints (composite PK, composite UNIQUE, `FOREIGN KEY`, `CHECK`)
- Non-literal `DEFAULT` expressions (`CURRENT_TIMESTAMP`, function calls, column references)
- Composite / multi-column indexes
- Savepoints (`SAVEPOINT`, `RELEASE SAVEPOINT`, `ROLLBACK TO SAVEPOINT`)
- Isolation-level control (`BEGIN IMMEDIATE`, `BEGIN EXCLUSIVE`)
- `OFFSET`
- Multi-column `ORDER BY`, `NULLS FIRST/LAST`
- `UNION`, `INTERSECT`, `EXCEPT`
- `INSERT ... SELECT`
- `UPDATE ... FROM`, `DELETE ... USING`
- Window functions (`OVER (...)`, `FILTER (WHERE ...)`, `WITHIN GROUP`)
- Named placeholders (`:foo`, `$1`, `@x`) — only positional `?` is supported (SQLR-23)
- Multiple attached databases (`ATTACH DATABASE`, `DETACH DATABASE`)
- `PRAGMA` statements other than `auto_vacuum` (SQLR-13) and `journal_mode` (SQLR-22 / Phase 11.3). The dispatcher is in place — adding a pragma is a single arm in `execute_pragma`. `synchronous`, `cache_size`, etc. are not yet wired up
- `REPLACE INTO`, `INSERT OR IGNORE`, `INSERT OR REPLACE` (conflict-resolution clauses)
- Using SQLRite — REPL flow, meta-commands, history, read-only mode
- Embedding — the `Connection` / `Statement` / `Rows` API surfacing the same SQL
- Storage model — how columns, rows, and indexes live in memory and on disk
- SQL engine — how a query flows from tokens to executor to rows
- Roadmap — when each Not yet supported entry lands