All notable changes to BoilStream will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Re-release of 0.10.28 with the multi-tenant bootstrap fix actually compiled into the binary. The 0.10.28 images on Docker Hub (`x64-linux-0.10.28`, `aarch64-linux-0.10.28`) were built against the AMI's pre-baked `libduckdb_static.a`, which `scripts/build_linux_release.sh` deliberately keeps from re-extracting on every build (`# duckdb is intentionally NOT in this list ... Re-bake the AMI to update DuckDB.`). The `pragma_functions.cpp` fix lives in DuckDB C++ source, so it never made it into the 0.10.28 binary, and staging continued to fail with the original `Secret with name "<catalog>_adm_postgres" not found` signature after rollout. The 0.10.29 build re-bakes the build AMIs first so the static library matches the source tarball, then ships the same fix.
If you pulled 0.10.28, upgrade to 0.10.29 — the fix is identical, but only 0.10.29 actually contains it. (Mirrors the 0.10.25 → 0.10.26 retraction we did earlier this week for an analogous "stale source on EC2" pipeline gap.)
- Chart version 0.3.39 tracks appVersion 0.10.29.
- ARM64 (`aarch64-linux-0.10.29`) and x86_64 (`x64-linux-0.10.29`) Docker images built on freshly-baked AMIs.
- Retraction note: the 0.10.28 build reused `libduckdb_static.a` from the AMI rather than recompiling it from the (correct) source tarball it had just uploaded. The 0.10.28 images therefore still hit the original bootstrap failure on staging. Use 0.10.29.
⚠️ Critical: multi-tenant secret bootstrap was broken in 0.10.27. Every freshly-registered user that connected via PGWire and tried to `ATTACH 'ducklake:<catalog>'` got `FATAL: Database '<catalog>' is not available: Secret with name "<catalog>_adm_postgres" not found` and the connection was killed during the eager-attach phase. Bootstrap was reading a stale path for the per-connection tenant identifier, so the `/secrets` fetch ran in single-tenant mode and cached the wrong (unprefixed) variants of each catalog's postgres-creds reference; the subsequent multi-tenant ATTACH then couldn't resolve them. Bootstrap now reads the tenant identifier through the supported settings path and passes its just-established session state directly to the secret fetch, so the multi-tenant secrets are cached before ATTACH runs. Reproduces locally in seconds with `matview_stress --smoke` against a freshly-registered user; verified with the full e2e suite (3728 tests, 0 failures), the DuckDB-isolated serial suite (15/15), and the matview/tantivy stress smokes.
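For illustration only, this is what the failure looked like from a freshly-registered PGWire user's session (the catalog name `analytics` is a placeholder; the secret name is derived from the catalog):

```sql
-- Placeholder catalog name; BoilStream derives the secret name "<catalog>_adm_postgres".
ATTACH 'ducklake:analytics';
-- 0.10.27: FATAL: Database 'analytics' is not available:
--          Secret with name "analytics_adm_postgres" not found
-- 0.10.28 fix (shipped in 0.10.29): the multi-tenant secret is cached during
-- bootstrap, so the eager attach succeeds and the session stays open.
```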
- Chart version 0.3.38 tracks appVersion 0.10.28.
- ARM64 (`aarch64-linux-0.10.28`) and x86_64 (`x64-linux-0.10.28`) Docker images built on AWS EC2 (Graviton 2 / Intel Xeon).
- SPA `POST /auth/api/credentials` regular-user path was returning bare `:5432`. 0.10.26's SPA round-robin landed on the superadmin branch only — the regular-user branch in `dashboard.rs::generate_credentials` kept the pre-fix `host: state.pgwire_host.clone()` / `port: state.pgwire_port`, so every `user_*` cred handed to the SPA pointed at `app.boilstream.com:5432`, the gateway listener the chart had just removed. The two `Json(CredentialsResponse {…})` blocks differed only in indentation (12 spaces in the superadmin nested block, 8 spaces at the function tail), which is why the original `replace_all` edit only matched one of them. Caught when matview_stress (which registers fresh regular users) failed with `Connection refused (os error 61)` on Phase 2; the existing staging guard authenticates as superadmin so it didn't see it. Added `auth::spa_credentials_distribution_test::test_spa_credentials_for_regular_user_returns_per_pod_port` to lock both paths down.
- `make ec2-release` now hard-prereqs `make sync-s3` so the EC2 build host can't compile against a stale source tarball — the failure mode that caused the 0.10.25 retraction. `SKIP_SYNC_S3=1` bypasses it for retries against an already-uploaded tarball.
- Test-fixture `dan@boilingdata.com` renamed to `integration-test-user@example.com`. `scripts/reset_staging_test_users.sh` deletes every entry on every staging-test run; the previous address was real-looking enough to shadow operator inboxes. The reset list now carries a "RULE: only @example.com" comment to keep future maintainers from putting plausibly-real addresses on the auto-delete list.
- `matview_stress` and `tantivy_stress` both extracted `host` from the SPA cred response but ignored `port`, so `--pgwire-port` (default 5432) was silently used. Both now read `creds["port"]`, with the CLI flag as the local-dev fallback.
- Chart version 0.3.37 tracks appVersion 0.10.27.
- Single ARM64 Docker image (`aarch64-linux-0.10.27`) plus `x64-linux-0.10.27`, both built on AWS EC2 (Graviton 2 / Intel Xeon).
Re-release of 0.10.25. The 0.10.25 images on Docker Hub were built from a stale source tarball (the EC2 build host pulls source from `s3://boilingdata-demo/source-packages/` and that prefix was a day behind `main`), so the binaries shipped under that tag are pre-fix code — broker entries don't carry `public_tcp_port`, the SPA still vends `port: 5432`, and the bare-`:5432` removal in this release set is not effective on a cluster running 0.10.25. Use 0.10.26 instead. Same scope as the original 0.10.25 entry below; no new product changes.
Operational note added to the release pipeline: `make sync-s3` is now a hard prereq for `make ec2-release`, so subsequent EC2 builds can't compile against a stale tarball.
- Chart version 0.3.36 tracks appVersion 0.10.26.
Do not use the published 0.10.25 Docker images — `boilinginsights/boilstream:aarch64-linux-0.10.25` and `boilinginsights/boilstream:x64-linux-0.10.25` were built from a stale source tarball (the S3 source-packages prefix was one day behind `main`) and ship pre-fix code under the new version label. The originally-intended changes are released as 0.10.26 instead; upgrade to that tag.
- Bare `:5432` Gateway listener removed. DuckLake catalog token claims live only on the catalog's master pod (no cross-pod replication), so any L4-round-robin or SNI-mismatched client landing on the wrong pod failed with `DuckLake session metadata not found`. The chart no longer exposes `:5432` at all. The only public PGWire entry points are now the per-pod TCPRoute listeners on `pgwire.publicTcpPortBase + pod_index` (default 15432, 15433, …) — pure L4 passthrough, so plain `sslmode=require` works for any libpq, including DuckDB's bundled libpq < 17. `boilstream-admin catalog credentials` and the Web Auth Console both already vend per-pod ports; clients pasting those connection strings need no changes. Operators with custom dashboards or scripts hard-coded to `:5432` must update them to use the per-pod port from the vended credentials.
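As a client-side sketch (host, port, database, and credentials below are placeholders; in practice they come straight from the vended connection string), a DuckDB client can reach a per-pod listener with nothing more than `sslmode=require`:

```sql
-- DuckDB's stock postgres extension (libpq < 17) connecting through a per-pod
-- TCPRoute listener. The listener is pure L4 passthrough, so no direct-TLS or
-- ALPN support is needed on the client. All values are illustrative.
INSTALL postgres;
LOAD postgres;
ATTACH 'host=boilstream-0.example.com port=15432 dbname=my_catalog user=user_abc password=*** sslmode=require'
    AS my_catalog (TYPE postgres);
SELECT 1;
```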
- SPA `POST /auth/api/credentials` round-robins across pods. Previously the Web Auth Console handed every user the static `config.pgwire.port` (5432 — the listener we just removed). It now resolves `(host, port)` by round-robining `coord.broker_registry().cached_brokers()`, filtering out brokers without `public_tcp_port` so `:5432` is never returned. Successive vend calls land on different pgwire pods — the horizontal scale-out path for SPA-vended `user_*` creds (workers fetch the SCRAM hash from the leader at connect time, so any pod can authenticate them).
- `BrokerInfo.public_tcp_port` published in the S3 broker registry. New optional field (`#[serde(default)]` for back-compat with pre-0.10.25 brokers) plus an `external_pgwire_port()` helper mirroring `NodeInfo`. The leader uses this to discover where each peer pod is reachable from outside the cluster.
- New staging guard `auth::spa_credentials_distribution_test::test_spa_credentials_distribute_with_working_ports` — vends 12 SPA credentials, asserts none point at `:5432`, opens a real psql connection against each unique `(host, port)` and runs `SELECT 1`, and asserts ≥2 distinct pods on multi-replica clusters. Added to `tests/staging-green.txt`.
- Chart version 0.3.35 tracks appVersion 0.10.25.
- Single ARM64 Docker image (`aarch64-linux-0.10.25`) — built on AWS EC2 Graviton 2 (`c6gd.8xlarge` = Neoverse N1) so the same artefact runs on AWS Graviton 2/3/4, Hetzner cax*, Oracle Ampere A1, and Apple Silicon under Docker. The historical `aarch64-generic-linux-` infix is dropped — there is no longer a separate Graviton-tuned variant.
- License-binding prefix mismatch on multi-pod clusters. The runtime `ClusterBinding` was built from the cluster_state directory prefix (e.g. `cluster/cluster_state/`) while licenses are signed against, and `boilstream-admin license request` reports, the parent prefix (`cluster/`). On 0.10.23 every valid license was rejected with `WrongBinding` and the cluster fell back to community caps. Fixed by trimming the trailing `cluster_state/` segment in `ClusterConfig::license_binding_prefix()` before the binding comparison.
- Automatic `cluster_secret.json` bootstrap on first leader election. Previously, `boilstream-admin license install` would fail its pre-flight binding check on a brand-new cluster because nothing wrote `cluster_state/cluster_secret.json` until an operator did so by hand. The elected leader (both at initial startup and during failover promotion) now creates it via an atomic `PutMode::Create` on first leadership acquisition. The persisted bytes become the source of truth for `cluster_secret_hash` going forward, and the file is available before the first license refresh runs.
  - Race-safe across multi-pod startup — when several pods race for leadership and call the bootstrap simultaneously, exactly one wins the conditional PUT; the loser falls through to GET and continues with the winner's secret.
  - Pre-existing secrets are never overwritten — operators migrating from a previous deployment can pre-seed `cluster_secret.json`; the bootstrap detects the existing object and adopts it.
- Chart version 0.3.31 tracks appVersion 0.10.24.
License enforcement gains four layers of defence against binary tampering and signed-license replay. No customer-visible API changes.
- Embedded public-key hash-pin. The license module's primary and secondary ed25519 PEMs are SHA-256 hashed at build time and re-checked at every load. A binary patcher who swaps the embedded PEM bytes must also find and patch the (separately-located) hex digest constants.
- Memory integrity checker. The leader heartbeat re-hashes both PEMs in-memory each cycle and trips a sticky `tampered` flag on drift. Once set, all data-plane cap checks (PGWire, Flight, Kafka) deny new connections until the operator restarts the process.
- M-of-2 dual-key signature. Operators can mint licenses dual-signed with two offline private keys. Both signatures must verify; either side's failure trips the same `LicenseError::Signature` as a wholly-invalid token.
- Replay watermark persistence. The highest accepted `iat` is now persisted to `cluster_state/license_min_iat.json` alongside the license file. After a leader restart, an attacker who tries to reinstall an older signed license trips the watermark.
- Runtime cap re-check at four sites — broker register, PGWire connection accept, Flight `do_get`/`do_put`, and Kafka producer connect all consult the cached license + cached cluster usage. A binary patcher who removes one site still trips on the others.
- Audit-grep startup line — `License enforcement: vCPU cap=N pod cap=M tier=T (in_grace=…, tampered=…)` emitted at process start.
- Chart version 0.3.30 tracks appVersion 0.10.23.
BoilStream now enforces cluster-wide vCPU and pod caps from a signed, ed25519-verified license. Without a license the cluster runs on the free tier (8 vCPU / 2 pods cluster-wide); a valid license unlocks higher caps. The upgrade path for paying customers is a single S3 file drop — no restart, no helm upgrade, no environment variable changes.
- `boilstream-admin license install <path-to-license.jwt>` — locally validates the signature, the bucket/prefix binding, and the cluster-secret hash, then PUTs the license to `s3://<bucket>/<prefix>cluster_state/license.json`. The leader picks it up within 30 s without a restart.
- `boilstream-admin license status` — prints the installed license (tier, expiry, caps) and the cluster's current usage in vCPU and active pods.
- `boilstream-admin license request` — prints the bucket / prefix / `cluster_secret_hash` triple a customer pastes into an email to licenses@boilstream.com to receive a signed license.
- `boilstream-admin license remove` — deletes the license file (interactive confirm). The cluster reverts to free tier on the next heartbeat.
- Each broker reports its physical-vCPU count to S3 in its registration record. The leader sums vCPU and pod count across non-stale brokers and rejects new registrations that would exceed the effective cap with a structured error: `License cap exceeded: cluster currently uses N vCPU / M pods, license allows X vCPU / Y pods, this pod would push to N+k / M+1`.
- Existing pods (heartbeat refresh, leader↔worker flip, restart) are never rejected — only NEW registrations are gated. Failover semantics preserved.
- 14-day grace period after a license expires (configurable). During grace, the leader logs WARN per heartbeat but keeps running the licensed tier. Past grace, the cluster falls back to free tier.
- ±300 s clock-skew tolerance for `iat` and `exp`.
- License binding to bucket + prefix + `sha256(cluster_secret.json)` prevents cross-cluster license reuse.
| Tier | vCPU cap | Pod cap |
|---|---|---|
| community | 8 | 2 |
- Chart version 0.3.28 tracks appVersion 0.10.22.
- License enforcement only activates when the cluster coordinator is built with a non-empty bucket name (i.e. real cluster mode). Single-node / standalone deployments are unaffected.
- PGWire stability for clients that auto-name prepared statements. Connection-pool reuse no longer leaks server-side prepared statements between successive PGWire clients, which previously surfaced as intermittent `prepared statement "sN" already exists` errors against any libpq / tokio-postgres / pgjdbc workload that allocates statement names automatically (DBeaver, Tableau ODBC, Power BI, JDBC apps, psycopg, DuckDB's bundled `postgres` extension). 31 binary-parameter / parameter-types / boolean-encoding / prepared-statement test cases now pass deterministically against staging where they previously flaked. No client-side changes required.
- `values-hetzner-example.yaml` now selects nodes via `karpenter.sh/nodepool: boilstream-arm64-hel` instead of the CFKE-template label `boilstream.com/nodepool: arm64-hel`. The Karpenter-managed label is auto-applied to every node Karpenter provisions, so freshly-spun-up nodes are picked up without any extra requirement on the NodePool spec — and CFKE forbids customers from modifying NodePool requirements directly.
- Chart catches up to the per-pod TCPRoute work shipped in 0.10.17 / 0.10.18: `pgwire.publicTcpPortBase` (default 15432), the `__POD_TCP_PORT__` substitution in the rendered config, and the per-pod `TCPRoute` listeners are now on the public chart. Customers running the public chart no longer need to pin to a pre-0.10.17 image.
- Chart version 0.3.27 tracks appVersion 0.10.21.
- Skip 0.10.20 — same area, partial fix; superseded by 0.10.21.
- DuckLake catalog spec 1.0. Catalogs created by BoilStream now carry the spec 1.0 stability marker, matching what the bundled DuckLake extension in stock DuckDB ≥ 1.5.2 expects. Existing 0.4 catalogs auto-upgrade in place on the first ATTACH that opts in:

  ```sql
  ATTACH 'ducklake:postgres:host=…;port=…;…' AS cat (TYPE ducklake, AUTOMATIC_MIGRATION true);
  ```

  No data migration runs — 0.4 → 1.0 is a no-schema-change marker. After this release, anybody on stock DuckDB / pip `duckdb` / Homebrew `duckdb` can connect without a custom build.
- Chart version 0.3.25 tracks appVersion 0.10.19.
- Vended DuckLake credentials default to the per-pod TCP path. When BoilStream hands a remote DuckDB / psql / DBeaver client a connection string, the host + port now point at the dedicated per-pod gateway listener introduced in 0.10.17 instead of the SNI-multiplexed `:5432`. Result: the connection string copied out of the auth GUI or the admin CLI just works with traditional `sslmode=require` and any libpq version, including the older one bundled inside DuckDB's stock `postgres` extension.
- Chart version 0.3.24 tracks appVersion 0.10.18.
- Per-pod TCP listener for PGWire — direct-TLS-free connectivity for any libpq. The Helm chart now provisions one extra Envoy `TCPRoute` per BoilStream pod on `pgwire.publicTcpPortBase + pod_index` (default 15432, 15433, …). These listeners are pure L4 passthrough — Envoy does not parse TLS — so the pod handles the full Postgres SSL handshake itself. Clients can therefore use the traditional `sslmode=require` (no `sslnegotiation=direct`, no ALPN) and BoilStream is reachable from:
  - DBeaver (any pgjdbc bundle), Tableau, PowerBI ODBC, Metabase
  - DuckDB's stock `postgres` extension (libpq < 17)
  - Python `psycopg`, Go `lib/pq` / `pgx`, Node.js `pg`, Ruby `pg`
  - Anything else that speaks libpq ≥ 9
- `pgwire.publicTcpPortBase` chart value (default 15432). Operators can shift the port range if it conflicts with an existing service.
- The classical `:5432` SNI-routed listener is unchanged; clients with libpq 17 / psql 17 / pgjdbc 42.7+ that send `sslnegotiation=direct` continue to use it for SNI-based pod selection.
- Chart version 0.3.23 tracks appVersion 0.10.17.
- Vended DuckLake credentials carry public DNS names. The `host` field in vended credentials is now the externally-resolvable per-pod FQDN (`boilstream-N.<your domain>`) rather than the in-cluster `*.svc.cluster.local`. Remote DuckDB / BI tools running outside the cluster can now use the credentials directly without DNS rewrites or a kubectl port-forward.
- Chart version 0.3.22 tracks appVersion 0.10.16.
- `boilstream-admin` pins its session to whichever pod served the login. When a login follows a leader-redirect across pod subdomains, the CLI now records the final endpoint and uses it for every subsequent admin call in the same session. No more "session token not found" surprises after a redirect on a multi-pod deployment.
- Chart version 0.3.21 tracks appVersion 0.10.15.
- Session cookies scoped to the configured `app_domain`. Cookies issued at login now span the bare domain and every per-pod subdomain. Browsers and CLI clients that follow a leader-redirect (`https://app.example.com/` → `https://boilstream-1.app.example.com:8443/`) keep the same authenticated session — no re-login, no broken Set-Cookie scope.
- SAML logout cookie scope aligned with the same domain rule.
- Chart version 0.3.20 tracks appVersion 0.10.14.
- DuckLake catalogs self-heal on first credential vend. Catalogs that didn't fully complete the two-phase initialization (DuckDB metadata row + Postgres role) now repair themselves the next time someone asks for credentials, instead of returning an opaque "missing role" error. A new `boilstream-admin catalog repair <id>` command exposes the same logic for operators who want to run it explicitly. The healer holds a per-catalog mutex so concurrent vends don't race on `CREATE ROLE`.
- `boilstream-admin` honours `mfa_secret_path` from the active profile config. Previously the path was only loaded for some commands; now the resolution is consistent across login, vend, and admin operations, and falls back to the legacy `~/.boilstream/sessions/<profile>_mfa.txt` location when the profile config doesn't set it.
- Chart version 0.3.19 tracks appVersion 0.10.13.
- Helm chart auto-generates the superadmin TOTP secret. First install creates `boilstream-superadmin-mfa` as a Kubernetes Secret with `helm.sh/resource-policy: keep`, so subsequent `helm upgrade` runs preserve the value. Operators can pre-create the Secret to bring their own MFA secret instead. Removes the manual `kubectl create secret` step from the Hetzner / EKS bring-up flow.
- Documented end-to-end customer setup flow for k8s deployments — see the new "Kubernetes / Helm deployments" section in the admin CLI guide.
- Staging E2E test harness now uses the real customer path (login through admin CLI, then vend through CLI). Removes a test-only HTTP shortcut that diverged from production behaviour.
- Chart version 0.3.18 tracks appVersion 0.10.12.
- Bootstrap-token generation works on any pod. Previously the `/auth/api/admin/bootstrap-token` endpoint only succeeded when called against the leader; non-leader pods returned an error. Workers now fetch the bootstrap context from the leader over the internal mTLS RPC channel introduced in 0.10.7, so the customer-onboarding flow is symmetric across all pods.
- Staging E2E test suite is now parameterizable: setting `AUTH_URL=https://app.<domain>` lets the same test binary drive a live cluster instead of a local dev server.
- Chart version 0.3.17 tracks appVersion 0.10.11.
- Cross-pod PGWire session rejection ("Could not determine tenant for user `user_…`. Connection rejected for security."): workers resolved the PGWire username → (internal_user_id, user_id, tenant_id) tuple against their local DuckDB. Both lookups are unsafe on a follower — `temp_credentials` is leader-exclusive (never replicated), and the `users.duckdb` snapshot restored on pod start can lag behind leader-side user creation. A user who signed up while the follower was already running was effectively invisible to that pod, so sticky source-IP load-balancing routing them to it produced a hard rejection after a successful SCRAM handshake. The 0.10.7 SCRAM-hash RPC closed only half of this gap.
  - Fix: new `GetPgwireUserContext(username)` RPC on the `ClusterSync` service. Workers atomically fetch the full tuple from the leader; a leader-confirmed not-found is authoritative (no fall-back to stale local state). The leader-side handler on `ClusterSyncState` reads both `oauth_store` and `users_store`, mirrors the prefix whitelist from `pgwire_server::get_user_ids_for_username`, and returns `(found, internal_user_id, user_id, tenant_id)`. Safe to run on a 0.10.10 leader with older workers — the RPC is additive; workers from 0.10.9 and earlier simply don't call it.
- MFA re-enrollment required after every pod restart: `/data` is an emptyDir volume; `users.duckdb` is restored at cold-start from the leader's last S3 snapshot. But the backup manager only ever uploaded when a handler explicitly called `trigger_backup()`, and `mfa_handlers.rs` never did. TOTP enrollment, passkey registration, passkey counter updates, backup-code consumption — none of them hit S3, so a helm upgrade wiped the user's MFA state and forced a re-enroll.
  - Fix: `trigger_users_backup(state)` is called after every MFA mutation (TOTP enroll, passkey add, passkey counter update on auth, TOTP/passkey delete, backup-code consumption, backup-code regeneration, `remove_mfa_method`, and the `mark_session_mfa_verified` paths).
  - Also fixed: `trigger_backup()` itself was partially synchronous — the first-ever trigger and interval-elapsed triggers awaited `perform_backup().await` inline, meaning the auth handler that just registered a passkey blocked on an S3 PUT before returning to the browser. Now `trigger_backup()` is a pure non-blocking flag-flip; the background task owns all upload execution and ticks every 1s on the leader. Burst triggers inside the `interval_seconds` window (default 60s) coalesce to one upload; triggers arriving during an in-flight upload re-arm `pending` for the next cycle so nothing is silently dropped.
  - Regression tests verify the three contract properties: non-blocking triggers, burst coalescing, and in-flight re-arming.
- Auth GUI connection-string missing direct-TLS hint: the database-credentials page rendered `postgresql://user:pass@host:5432/db` with no `sslmode`/`sslnegotiation` suffix, so psql users got "server closed the connection unexpectedly" on first paste. The Envoy Gateway TLSRoute-passthrough listener accepts raw TLS but not libpq's default plaintext `SSLRequest` preamble, so libpq needs `sslmode=require&sslnegotiation=direct` (libpq 17+) to speak ALPN-negotiated TLS from byte zero.
  - Fix: `static/auth/common.js` and `static/auth/db_credentials.html` now emit the full query string. `boilstream-admin` vend paths (superadmin + user vend, server-status output, and the spawned `psql` child via `PGSSLMODE=require` + `PGSSLNEGOTIATION=direct`) include the same suffix.
- Chart version 0.3.16 tracks appVersion 0.10.10.
- Broken auth landing page on `https://<domain>/` (introduced 0.10.7 and earlier): `https://app.boilstream.com/` served the auth HTML but the browser got a 404 for `/auth/boilstream-auth.min.js` and an empty-MIME response for `/auth/boilstream-auth.min.css`. Result: no layout (floated to the left), no JS (sign-up button dead, login submit dead). Root cause: `.gitignore` excluded `static/auth/*.min.{js,css}`. EC2 release builds use `git archive` on HEAD, so the minified GUI bundles never made it into the source tarball; `rust_embed` then had nothing to serve at those paths. The non-minified `auth.css` fallback via the HTML's `onerror` handler worked, but no such fallback existed for the JS bundle, so the page broke silently.
  - Fix: the minified bundles are now re-generated on every `cargo build` by `build.rs` (runs `npm run build` in `static/auth/` when `main.js` / `auth.css` / HTML / `package.json` change) and are never committed. The release build scripts install Node 20 on the EC2 AMI (`setup_ubuntu_vm.sh`) and run `npm ci && npm run build` before cargo as a belt-and-suspenders step. A regression test (`tests/auth_assets_embedded_test.rs`) asserts `boilstream-auth.min.{js,css}` and `index.html` are present in the final binary — CI fails if the gitignore bug ever recurs.
- `boilstream-admin user delete` 401 with empty body: the CLI loaded the superadmin MFA secret via a non-profile-aware path that probed `./superadmin_mfa_secret.txt` (CWD) before `~/.boilstream/<profile>/mfa_secret.txt`. On a dev-checkout running against a live cluster, a stale local-dev secret in the repo root silently signed the delete request with the wrong TOTP. The server correctly rejected with "Invalid fresh TOTP code". Login worked because login explicitly uses `load_mfa_secret_for_profile(profile)`. Fixed by threading the profile name through `delete_user`, `verify_superadmin_mfa`, and `verify_user_mfa`, and by making `load_mfa_secret_for_profile` check `~/.boilstream/<profile>/mfa_secret.txt` (the `scripts/boilstream-admin-k8s-setup.sh` convention) in addition to the older `~/.boilstream/sessions/<profile>_mfa.txt` layout.
- `boilstream-admin tenant list` misleading error: printed "Tenant list requires cluster connection. Use --server flag" where `--server` is not a valid argument. It now prints an honest "not yet implemented — no REST endpoint for catalog→node placement yet" and points operators at `catalog list` / `node list` / `tenant handover` as working alternatives.
- Chart version 0.3.14 tracks appVersion 0.10.8.
- Horizontal PGWire scaling with leader-exclusive auth state: non-leader pods now complete SCRAM-SHA-256 auth locally by fetching the password hash from the leader over the internal mTLS cluster API (`:8444`), rather than requiring cross-pod replication of `users.duckdb`. The `temp_credentials` table stays leader-exclusive — no new per-pod cred sync, no stale replicas, no extra storage — while every pod can accept PGWire connections and run the subsequent query session. One short gRPC round-trip at connect time on workers; queries themselves run locally.
  - Why it matters: `app.boilstream.com:5432` as a bare load-balanced entry point now just works. Clients don't need to parse the leader hostname out of a 307 redirect or pin to a pod-specific SNI; any pod accepts the connection and the auth path converges to a consistent hash on the leader. Fixes the ~50% auth-failure-rate observation in 2-pod Hetzner deploys under the `boilstream-any-*` TLSRoutes.
  - How it works: `OAuthCredentialStore::get_scram_hash()` now consults the cluster coordinator; on non-leader pods it opens a tonic mTLS channel to `{leader.host}:{internal_port}` and calls the new `ClusterSync.GetScramHash(username)` RPC. The leader answers from its local DuckDB `temp_credentials`. The RPC reuses the existing `ClusterTlsConfig` (cert_path / key_path / ca_cert_path) so no new cert material is required.
  - New proto: `GetScramHash(GetScramHashRequest) returns (GetScramHashResponse)` on the `ClusterSync` service. The request is a single `username` string; the response carries the SCRAM-SHA-256 hash plus a `found` flag to disambiguate not-found from genuinely-empty. Backwards compatible — workers connecting to a pre-0.10.7 leader just fall back to their local (empty) lookup and surface the standard "Invalid username or password" error.
- FlightRPC ingestion and FlightSQL consumer servers now serve TLS: the top-level `tls.disabled: false` block is now honored end-to-end. All three Flight servers (ingestion Thread 1 on `:50051`, admin, consumer FlightSQL on `:50250`) present the configured identity; the Envoy Gateway `TLSRoute` passthrough works through to the backend. Previously the Hetzner chart's `flight` listener targeted `:50050` (Thread 0) while clients and docs referenced `:50051` (Thread 1, the canonical entry point) — that's now aligned on 50051 in the chart, with a new `flightsql` listener for 50250.
- PGWire clients should use `sslnegotiation=direct` (libpq 17+). BoilStream's PGWire SSL handshake runs via ALPN and pure TLS (not the PG-classic plaintext `SSLRequest` → `S` negotiation); direct-TLS is what Envoy passes through. Older libpq (< 17) + `sslmode=require` without direct negotiation will fail with "received direct SSL connection request without ALPN protocol negotiation extension".
- Chart version 0.3.13 tracks appVersion 0.10.7. Contains the new `BackendTrafficPolicy` CRDs for source-IP consistent-hash LB on the `boilstream-any-*` TLSRoutes, so clients behind different source IPs stay on the same pod across auth + data-plane listeners (the SCRAM fallback above means this is no longer a correctness requirement, but it keeps per-connection latency predictable by avoiding the leader round-trip when the pod is already the leader).
- Flight TLS gating is driven by the `TLS_CERT_PATH` / `TLS_KEY_PATH` environment variables on non-free-tier builds (see `src/tls.rs::load_tls_config_from_env`). The Hetzner chart's StatefulSet now exports these to `/tls/tls.crt` and `/tls/tls.key` — the same wildcard cert that pgwire and kafka already mount. No new Secrets required.
- SIGILL on ARM64 hosts without SHA3 / SHA512 hardware (Ampere Altra / Neoverse N1: Hetzner cax*, Oracle Ampere A1, AWS Graviton 2): the `aarch64-generic-linux-0.10.5` binary shipped `sha512h`, `sha512su0`, `sha512su1`, `bcax`, `eor3`, `rax1`, `xar` instructions. `/proc/cpuinfo` on these CPUs correctly reports `aes pmull sha1 sha2 crc32 asimddp` without `sha3 sha512`, so every code path that hit those insns crashed with exit code 132 (SIGILL). Root cause: the previous "-generic" artefact was built on Apple Silicon under OrbStack, and that VM exposes `sha3` + `sha512` in `/proc/cpuinfo`; `aws-lc-sys`, `openssl-sys`, and `nettle-sys` each autodetect CPU features at build time and compile in hardware-accelerated paths when the build host supports them, regardless of `RUSTFLAGS` / `CFLAGS` overrides.
  - Fix: `aarch64-generic-linux-0.10.6` is now built on AWS EC2 `c6gd.8xlarge` (Graviton 2 = Neoverse N1) — same microarchitecture as Ampere Altra — so the C autoconf layer in `nettle-sys` / `openssl-sys` never sees SHA3/SHA512 in `/proc/cpuinfo` and never compiles unconditional (non-runtime-dispatched) calls into those libraries. `aws-lc-sys` still ships its own pre-generated SHA3/SHA512 assembly, but it is guarded by `HWCAP_SHA3` / `HWCAP_SHA512` runtime checks and takes a scalar fallback on Ampere, so those insns remain in the binary as dead code. Verified empirically: 60/60 OPAQUE signups (20 sequential + 2×20 parallel) against a 0.10.6 leader pod on Hetzner Ampere Altra (cax21) with zero pod restarts.
  - Impact: every OPAQUE-heavy auth path (`/auth/email/signup`, `/auth/email/login`, SCIM provisioning, TOTP enrollment) reliably crashed the pod on Ampere hosts. Kubelet restarted the container so the workload kept making forward progress, but roughly every successful request cost a restart cycle. 0.10.6 eliminates the crash entirely.
- Hetzner / CloudFleet clusters running 0.10.5 MUST upgrade to 0.10.6 to avoid the SIGILL. AWS Graviton 2+ customers using the `aarch64-generic-linux` image should also upgrade (they'd hit the same crash on Graviton 2 but are unaffected on Graviton 3+). The `aarch64-linux-0.10.5` (Graviton-tuned, separate artefact) is NOT affected — it's built on c7gd.8xlarge and intentionally targets Neoverse V1+ features.
- Only `aarch64-generic-linux` is rebuilt at 0.10.6. The other platforms (`darwin-aarch64`, `linux-aarch64` Graviton-tuned, `linux-x64`) never had the SIGILL — their build hosts matched their release targets — so they stay at 0.10.5. No action needed for users on those variants.
- `aarch64-linux-0.10.5` (AWS Graviton-tuned) is still built on `c7gd.8xlarge` (Graviton 3 = Neoverse V1) and keeps SHA3+SHA512 hardware paths — AWS Graviton 3/4 users should use that variant for the performance.
- Chart version 0.3.11 tracks appVersion 0.10.6.
- 3-way Karpenter/CloudFleet deadlock on `values-hetzner-example.yaml`: During the 0.10.5 rollout the Hetzner/CloudFleet cluster repeatedly wedged with pods stuck `Pending` and phantom `Unknown` NodeClaims piling up. Root cause: the example's hard pod anti-affinity (`requiredDuringSchedulingIgnoredDuringExecution`) combined with Karpenter's aggressive `WhenEmptyOrUnderutilized` consolidation (`consolidateAfter: 1m`) meant every scale or pod-kill event tainted existing nodes `karpenter.sh/disrupted:NoSchedule` to drain them, while CFKE's locked `NodePool.limits.cpu: "16"` blocked Karpenter from provisioning replacements. Result: no node could accept the pods (all tainted) and no new node could be launched (over cap). The fix ships two values-hetzner-example defaults: anti-affinity as `preferredDuringSchedulingIgnoredDuringExecution` (pods still prefer distinct nodes but may co-locate during churn), and a `karpenter.sh/disrupted:NoSchedule` toleration so pods can temporarily schedule on draining nodes. App code (0.10.5) unchanged.
- Auth-database backup/restore in K8s cluster mode: The pre-0.10.5 flow used `EXPORT DATABASE` / `IMPORT DATABASE` against an already-attached users / superadmin DuckDB database. On every pod restart after 0.10.3 enabled `/data`-emptyDir backup-to-S3, `IMPORT DATABASE` collided on pre-created objects (e.g. `duckdb_secrets_id_seq`) and the restore retried forever, so the emptyDir rollover effectively wiped the user table each time a pod was rescheduled. Replaced with a byte-level file snapshot (CHECKPOINT + `fs::copy` of the on-disk `.duckdb` file, gzip, S3 PUT) that the server atomically drops into place BEFORE the DuckDB ATTACH on the next boot. No schema replay, no sequence collisions, no retry loops.
- Leader-failover correctness in 2+ pod clusters: On leader failover, the newly-promoted worker's local auth DB was stale (the previous leader had been writing locally since this worker's last boot-time S3 restore) and the promotion path kept serving those stale reads as the new leader. 0.10.5 purges the local `/data/users.duckdb` + `data/superadmin.duckdb` and calls `std::process::exit(1)` on the worker→leader transition; Kubernetes restarts the container and the normal boot path pulls the latest S3 snapshot before ATTACH, then re-enters election with authoritative data.
- Non-leader shutdown no longer clobbers S3: `perform_shutdown_backup` and `perform_backup_with_retry` now check `is_leader()` before uploading. Previously a stale worker receiving SIGTERM would upload its local (potentially older) snapshot and overwrite the authoritative blob written by the live leader. Non-leaders on shutdown now log `not the cluster leader — worker state is not authoritative and must not overwrite the leader's S3 snapshot` and exit clean.
- Helm chart — optional `updateStrategy.rollingUpdate.partition` on the StatefulSet template, for explicit canary rollouts (update pod-1 first, verify, then lift the partition for pod-0).
- Helm chart — `users_backup_path` defaults to `auth/users.snapshot.duckdb.gz` (new format); `superadmin_backup_path` falls through to `backups/superadmin/superadmin.snapshot.duckdb.gz` via server-side defaults so no explicit config is needed.
- Chart version 0.3.9 / appVersion 0.10.5.
- 6 test users + superadmin persist across both `pod-0→pod-1` and `pod-1→pod-0` leader deletions.
- S3 snapshot stays consistent at 28,066,348 bytes across both pods post-failover.
- Promoted pod logs show the expected `💥 Promoted from worker to leader — purging local auth DBs...` → `Deleted local database: /data/users.duckdb` → `✅ Restored users database from S3 snapshot` sequence on the container restart.
- Any `auth/users.duckdb` (old `tar.gz` format from 0.10.3) left in S3 is orphaned by this release — safe to delete. The new backup writes to a fresh key, so there is no format-upgrade migration on the bucket; the old blob just becomes unreferenced.
- Rolling 0.10.3 → 0.10.5 through 0.10.4 is not supported: 0.10.4 was only ever an iterative Docker-tag build (never git-tagged) and had known promotion-staleness bugs in the `b1`/`b2` sub-builds. Roll directly from 0.10.3 to 0.10.5.
- WebAuthn passkey enrollment in K8s cluster mode: Broker→leader redirects land the browser on a per-pod subdomain (`boilstream-N.<domain>`) different from the configured `webauthn_rp_origin`, and webauthn-rs's strict origin check rejected the ceremony with `HTTP 400 – Registration failed - The clients relying party origin does not match our servers information`.
- `webauthn_additional_rp_origins` server config: Optional list of extra origins accepted during WebAuthn registration/authentication. `rp_id` stays as the base domain so credentials roam across all pods; `rp_origin` is the canonical primary URL; the new list covers the redirect-target subdomains.
- Helm chart: the ConfigMap overlay auto-renders both `:443` and `:8443` variants of every per-pod subdomain (`boilstream-0.<domain>`, …) into `webauthn_additional_rp_origins`. Chart version 0.3.5.
- Cluster admin redirects + responses use public hostname: Broker→leader redirects from the auth server's leader-check middleware, plus the `host` fields in `LeaderInfoResponse` / `BrokerInfoResponse`, now advertise an externally-reachable hostname when the new `cluster_mode.public_host` is set. Previously these paths only knew about `advertised_host` (in Kubernetes: the pod's internal headless-Service DNS, unreachable from outside the cluster), so `boilstream-admin` against a cluster's public endpoint could be redirected to a `*.svc.cluster.local` target or receive an internal hostname in the cluster status response.
- `cluster_mode.public_host` config: Optional per-node public hostname used for all client-facing paths — admin redirects, `boilstream-admin` cluster status responses, and future extensions for secret / bootstrap-URL vending. `advertised_host` remains the inside-the-cluster DNS name for pod-to-pod gRPC on `internal_api_port`, unchanged. Backward-compatible: missing `public_host` in S3 state (`leader.json`, `brokers/*.json`) deserialises cleanly and falls back to `host`.
- Helm chart: the `configmap.yaml` overlay auto-renders `public_host = {POD_NAME}.{values.domain}` into each pod's runtime config, kept in sync with the per-pod `TLSRoute` resources (`boilstream-N.<domain>`). Chart version 0.3.3.
- Official Kubernetes Helm Chart: Production-ready chart for multi-pod cluster deployments
  - StatefulSet with headless Service for stable per-pod DNS
  - Per-pod ClusterIP Services exposing PGWire, Kafka, FlightRPC, auth, and cluster ports
  - Envoy Gateway integration with TLS passthrough + SNI-based routing — one external LoadBalancer IP terminates all protocols across all pods
  - cert-manager wiring: public wildcard cert (Let's Encrypt ACME, DNS-01 or HTTP-01) and a separate internal CA for pod-to-pod mTLS
  - PodDisruptionBudget, standard `app.kubernetes.io/*` labels, configurable `affinity` / `nodeSelector` / `tolerations` / `topologySpreadConstraints`
  - `preStop` hook + `terminationGracePeriodSeconds` for graceful connection drain during rolling updates
  - IRSA / Pod Identity annotations for AWS; image pull secrets for private registries
  - Superadmin password and MFA secret sourced from pre-created Secrets — never committed to values
  - Example overlays: `values-eks-example.yaml` (AWS / NLB) and `values-hetzner-example.yaml` (CloudFleet / Hetzner ARM64)
  - K8s production-readiness test suite under `tests/k8s/` (pod health, leader election, broker registry, SNI routing, failover)
- Cluster-Mode mTLS: Pod-to-pod cluster coordination traffic can now be encrypted with mutual TLS
  - Separate trust root from the public-facing cert — the internal CA is isolated from browser trust
  - `cluster_mode.tls.{cert_path,key_path,ca_cert_path,require_client_cert}` configuration block
  - `require_client_cert` is now enforced at the TLS handshake (not just at the application layer)
  - Works out of the box with the chart's cert-manager `ClusterIssuer`
- PGWire Direct-TLS ALPN: Server now advertises the `postgresql` ALPN token during TLS negotiation, enabling libpq >= 17 direct-TLS clients to connect without a downgrade round-trip
- DuckDB 1.5.2: Upgraded from 1.4.4 LTS
  - New PostgreSQL type inference for `format_type(oid, typmod)` parameter binding — now infers `[INT8, INT4]` (was `[TEXT, INT4]`). BI tools that previously relied on the old inference may need to cast explicitly
  - Inherits all DuckDB 1.5.x improvements (planner, vector operations, extension loader)
- Leader heartbeat on non-AWS S3: Heartbeat now retries with a re-read confirmation and an unconditional PUT fallback when `If-Match` ETag comparisons fail. Fixes stalled leadership on S3 implementations where ETags don't round-trip identically between GET and PUT (Hetzner Object Storage, some MinIO configurations)
- Cluster-sync mTLS server on promotion: The internal coordination server now starts when a worker is promoted to leader mid-life (previously it only started at boot for pods that booted as leader — this caused failover gaps)
- Auth server loopback TLS: The auth server on `:8443` now presents a self-signed loopback certificate for `localhost` / `127.0.0.1` SNI connections and the public cert for its real hostname. Fixes in-pod TLS handshakes when the public cert doesn't cover loopback addresses (e.g. Let's Encrypt deployments)
- WebAuthn / RP config from single source: The Helm chart now propagates `values.domain` into both `webauthn_rp_id` and `webauthn_rp_origin`, removing one place where the external hostname could drift
- Helm chart de-localized: Example charts no longer embed localhost certificates or hard-coded dev hostnames — production deploys use real FQDNs end-to-end
- boilstream-admin wrapper for K8s: New `scripts/boilstream-admin-k8s.sh` reads the CA, superadmin password, and MFA secret live from Kubernetes `Secrets`; computes TOTP locally and execs the admin CLI against the in-cluster deployment
- `testMode.disableTurnstile` chart value: Lets CI/test clusters skip the Turnstile CAPTCHA on `/auth/email/signup` without rebuilding the image
- Portable ARM64 Linux variant (`-generic` / `aarch64-generic-linux-0.10.0`): The default `aarch64-linux-0.10.0` build is AWS Graviton-tuned and uses extensions not present on Ampere Altra / Oracle Ampere / Apple Silicon (when run in a Linux Docker container); the new `-generic` variant is built against the ARMv8-a baseline so it runs everywhere. Both are published side-by-side on S3 and Docker Hub.
-
Materialized Views (Windowed Aggregations): Tumbling and sliding window aggregations over streaming data
CREATE MATERIALIZED VIEW ... WITH (window_type, window_size, timestamp_column)DDL- Tumbling windows (non-overlapping) and sliding windows (overlapping with
slide_interval) - Ingestion timestamp mode: omit
timestamp_columnto window by server ingestion time (__boils_meta_timestamp) - Wall-clock aligned window boundaries at Unix epoch multiples
- Automatic
window_startandwindow_endcolumn injection for consumer-side deduplication - Crash recovery with PostgreSQL watermark persistence — no duplicate or skipped windows on restart
- Dual semaphore executor: fast views (< 60s) get priority, slow views (≥ 60s) capped at N-1 slots
- Per-view FIFO queue with round-robin dequeue — no dropped windows on backpressure
- All standard aggregations: COUNT, SUM, AVG, MIN, MAX, PERCENTILE, MEDIAN, MODE, approx_count_distinct
-
CREATE/DROP STREAMING VIEW DDL: Row-by-row derived topics with continuous SQL transformations
CREATE STREAMING VIEW name AS SELECT ... FROM source WHERE ...DROP STREAMING VIEW [IF EXISTS] name- Supports filtering (WHERE), projections, CASE expressions, scalar functions (UPPER, DATE_TRUNC, casts)
- Three-level view hierarchy:
CREATE VIEW(query-time) →CREATE STREAMING VIEW(continuous row-level) →CREATE MATERIALIZED VIEW(windowed aggregation)
-
Tantivy Full-Text Search: Per-table full-text search indexing with two-tier hot/cold architecture
- Enable via
ALTER TABLE ... SET (tantivy_enabled=true, tantivy_text_fields='col1,col2') - Tantivy-only mode: set
parquet_enabled=falsefor search-only tables without Parquet overhead - Hot tier: local disk indexes, searchable within seconds of ingestion
- Cold tier: segments packed into
.bundlefiles and uploaded to S3, registered in DuckLake - Shadow DuckLake table (
{table}__tantivy_idx) automatically tracks all cold tier bundles - Query with
multilake_search(catalog, shadow_table, query [, limit])— returns results with_scorerelevance column - Automatic Arrow-to-tantivy type mapping: TEXT (tokenized), STRING (exact-match), numeric/timestamp (range queries)
- Enable via
-
Tenant Management: Multi-tenant config schema, tenant admin API endpoints, dashboard, landing page, and member management
-
Auth Invite System: Auto-create tenant at signup with URL-based invite tokens
-
Playwright Smoke Tests: End-to-end browser smoke tests for auth flows
- Schema Registry cascade: Soft-delete schema_registry entries when topics are deleted, preventing stale topic_id references after DROP/CREATE cycles
- Schema re-registration: Clear `deleted_at` on schema re-registration to fix ghost soft-deleted entries after table recreation
- Matview persistence: Harden matview persistence load and window boundary alignment
- Tantivy S3 paths: Fixed S3 key prefix stripping, shadow table data_path handling, and double-slash prevention in upload paths
- Tantivy shutdown: Fixed shutdown hang and durability ack broadcasting in tantivy-only mode
- Tantivy shadow tables: Create via DuckDB DDL instead of direct SQL for proper catalog integration
- PgWire ATTACH performance: Fixed clean-data ATTACH hang where all connections timed out at 15s. Moved role/schema setup to normal user init path, skip redundant work on DuckLake self-connections, and spawn post-ATTACH index creation as background tasks. 30-user concurrent P99: 15s → 800ms
- PgWire ATTACH hang: Fixed DuckLake ATTACH hanging after raw bytes relay by resetting client state
- PgWire deadlock: Eliminated `get_duckdb_context()` deadlock in streaming INSERT detection
- Session init timeout: Added initialization timeout to prevent indefinite connection hangs
- DuckLake auto-attach: Fixed `memory` database context being lost after DuckLake auto-attach
- DuckLake table_id collision: Fixed CDC metadata query collision when DuckLake reuses table_id values
- Self-connection stability: Replaced connection abort with `pg_connection_limit=2` and idle timeout cleanup
- Auth dark mode: Corrected CSS variables across all UI files
- PgWire performance: Eliminated redundant SQL parsing in Extended Query protocol, added AST caching for parse_sql and detect_client_type
- PgWire refactor: Extracted shared cursor handlers, query classification, and streaming INSERT detection into reusable modules
- Matview executor: Redesigned to connect via pgwire as regular client with tenant isolation, replacing direct DuckDB access
-
SSE Consumer Endpoint: Real-time streaming consumer via Server-Sent Events (SSE)
GET /stream/{token}endpoint for browser and HTTP clients- Arrow IPC base64 encoding for efficient binary transport
- Heartbeat and schema change events
- PULL mode catchup via the `Last-Event-ID` header for resumable streams
- Shared token validation with configurable expiry
- JS consumer SDK for browser integration
-
FlightSQL Multi-Tenant Bootstrap: Full tenant isolation parity with pgwire
- Per-user session bootstrap with DuckLake catalog attachment
- Tenant-isolated metadata queries and prepared statements
-
DuckLake Parquet Statistics: Extract real min/max column statistics from Parquet files for DuckLake catalog registration, improving query planning and pruning
- Tenant isolation: Fixed DDL handler using shared processor instead of bootstrapped connection, ensuring proper tenant separation
- Postgres stability: Fixed pgwire ATTACH failures and CDC retry reliability
- Connection cache: Fixed TTL expiry, added per-query RBAC enforcement and VIEW metadata support
- PK/FK metadata APIs: Return empty results instead of unimplemented error for better client compatibility
- Kafka consumer: Fixed ListOffsets fallback, HotChunkManager initialization, and stderr pipe handling
- DuckLake column stats: Fixed stats registration and Kafka server TOCTOU race condition
- HotChunkManager startup reconstruction: Automatic recovery of in-flight data on restart with slow consumer PULL fallback
- Shared database utilities: Extracted
shared_db_utilsmodule for hot chunk database queries - SSE view cache: Cached view metadata for streaming DuckLake queries
- Boolean array encoding: Fixed boolean arrays returning corrupted values over the PostgreSQL wire protocol. The Arrow Int8-based boolean array encoder incorrectly treated scalar byte values as bit-packed data, causing all values after the first `false` to become `false`. Affects `SELECT $1::BOOLEAN[]` roundtrips and any query returning boolean arrays via DuckDB's `arrow_lossless_conversion`.
__streamsuffix (e.g.catalog__stream.schema.table) not being detected as streaming INSERTs when no session context was provided.
- S3 connectivity on non-EC2 environments: Fixed S3 storage backend failing to connect on Hetzner, bare-metal, and other non-AWS environments
- Explicitly enable virtual-hosted-style URLs for AWS S3 (required for buckets created after Sep 2020)
- Removed forced HTTP/2-only mode that caused connection failures in some network environments
- Added retry error logging to surface actual S3 errors instead of silent infinite retries
- Added credential source logging (`explicit` vs `from-environment`/IMDS) for easier debugging
-
boilstream-admin CLI Improvements: Enhanced CLI for scripting and AI agent integration
- `--json` flag: Shorthand for `--output json`
- `help-json` command: Machine-readable command structure for tooling / skill discovery
- `completions` command: Shell completions for bash, zsh, fish, powershell, elvish
- `BOILSTREAM_PROFILE` env var: Set the default profile without the `--profile` flag
- `--dry-run` flag: Preview destructive operations before execution (catalog/user/token delete, s3-state clear)
- Semantic exit codes: 0=success, 2=auth, 3=not found, 4=permission denied, 5=validation, 6=network
- Structured JSON error output: Error responses include `code`, `details`, and `exit_code` fields
-
Confluent Schema Registry (Read-Only Compatible): Production-ready schema registry for Kafka clients
- Full read API compliance:
/subjects,/schemas/ids/{id},/config,/compatibility - Confluent wire format: magic byte
0x00+ 4-byte global ID + Avro payload - All 7 compatibility levels supported (BACKWARD, FORWARD, FULL + TRANSITIVE variants)
- Bearer token authentication with tenant isolation
- Schemas auto-registered via DuckLake DDL (
CREATE TABLE→ schema,ALTER TABLE→ new version) - Subject naming:
{catalog_id}.{schema}.{table}-value(TopicNameStrategy)
- Full read API compliance:
-
Kafka Consumer Group Improvements
- Fixed
seekToBeginningsupport for re-reading from offset 0 - Member validation on OFFSET_COMMIT (Kafka protocol compliance)
- Consumer group offset state persisted in metadata database
- Fixed
- Linux stability on high-CPU machines: Fixed glibc malloc arena fragmentation causing "memory allocation failed" crashes on machines with 28+ vCPUs. Root cause: glibc creates up to `8 × num_cpus` arenas, fragmenting virtual address space so large allocations fail despite available physical memory. Fix: `mallopt(M_ARENA_MAX, 4)` at startup.
- DuckDB FFI data race: Fixed `global_call_count` race condition in the C++ FFI layer using `std::atomic`
- DuckDB v1.4.4 LTS: Rebased embedded DuckDB fork to v1.4.4 LTS
- C++ API only: Migrated all DuckDB FFI from mixed C/C++ API to C++ API only
- FFI safety test suites: Added ASAN (memory safety) and TSAN (thread safety) test suites for the C++ FFI boundary with Makefile build targets
- Test reorganization: Restructured test files under
tests/directory - Safety checks: Architecture-aware sanitizer builds supporting both x86_64 and aarch64
- Schema Registry API documentation
- Kafka interface consumer group semantics
- Multi-tenant DuckDB: Boilstream runs single DuckDB instance with tenant isolation security
- Secrets, ATTACHments, DuckLakes, filesystem (chroot like), etc. separation between tenants
- Tenants don't see each other, but share the same resources
- Preliminary metrics collection for fair scheduling/billing in the future when needed
- JIT Avro Decoder for Kafka Ingestion: New state-of-the-art just-in-time (JIT) compiled Avro decoder
- Achieving 3-5x faster performance compared to the Rust Apache Arrow decoder released Oct 2025
- Bounds checking in/out to protect against corrupted/malicious data
- All Avro types included and thoroughly tested, including complex/nested types, roundtrip and perf tests
- Embedded DuckLake PostgreSQL Catalog: Native pg_catalog support for DuckLake databases
- Automatic catalog backup/restore to S3 based on user login/logout and changes
- Seamless schema discovery with tools like DBeaver (ensure multidatabase support setting is on)
- DuckLake Data Inlining Support for Stream Ingested Data: Transactional batch commits to hot tier every 1s
- Automatic hot and cold tier DuckLake snapshots
- Realtime cpp appender data committed once per second, immediately visible for ducklake users
- DuckLake Vending Support: Unified data access across multiple client types.
- In-server queries with multi-tenant DuckDB attached DuckLake databases
- Remote native DuckDB clients with full PostgreSQL DuckLake catalog support, including the hot inline data
- DuckDB-WASM browser clients with cached/synced (1min) DuckDB catalogs on S3 (DuckDB-WASM lacks postgres scanner)
- Each client automatically vends correct temporary credentials per client type for each user
- Cold Tier Hydration API: Lift DuckLake tables from cold tier to hot tier
- DuckDB cpp level appender with >1GB/s hydration speed, prioritised with ingestion streams
- Entra ID SAML SSO and SCIM: Enterprise SSO integration with XML metadata file download/upload
- Download/upload XML files for easy setup
- SCIM User Synchronization for automatic user provisioning and deprovisioning via SCIM protocol
- If you enable SAML SSO, local users are disabled (except superadmin)
- Preliminary Horizontal Cluster Mode: Horizontal scaling support for distributed deployments
- Cluster leader for user management and metadata with S3 locking and heartbeats
- Users' DuckLake PG catalog leaders distributed over the cluster with on-demand backup/restore (login/logout/dirty)
- Control with boilstream-admin CLI tool
- boilstream-admin CLI: New command-line tool for managing and observing BoilStream clusters (uses admin API)
- Hydrating ducklake tables, demote/promote leader nodes
- Let AI manage and observe your boilstream clusters
- download as boilstream-admin-x.y.z matching with boilstream-x.y.z version, arch, and OS
- matching boilstream extension version: 0.5.0
- Use with native DuckDB clients as well as with DuckDB-WASM
- More details at https://github.com/dforsber/boilstream-extension
- Correct multi-database visibility over our 1st class Postgres interface, showing the "memory" database and any attached databases (like DuckLakes) as their own. DBeaver supports "multiple databases" and shows each database as a separate Database in the navigator
- Audit logging to separate logs folder on disk with partitioning
- Fixed CORS for auth server to work with boilstream duckdb wasm extension from browser
- Fixed the session timestamp in the OPAQUE PAKE login response
- Less bloated info logs
- Server does not try to encrypt empty response body, but sends HTTP 204 instead
- Session resumption support for Remote Secrets Store API, matches DuckDB boilstream extension v0.3.1
- Complete separation of Web Auth GUI sessions from OPAQUE login sessions
- Re-designed the DuckDB Secure Remote Secrets Store protocol around industry-standard building blocks (Facebook's OPAQUE PAKE, OAuth2, HKDF, SHA256, etc.). See the DuckDB client extension and its SECURITY_SPECIFICATION.md, which also includes a full conformance test suite with test vectors. We independently developed both the server (Rust) and the DuckDB extension against the specification and its conformance test suites to make them fully interoperable. Facebook's OPAQUE PAKE was audited by NCC back in 2021.
- Secrets Storage comms are integrity-protected inside the TLS channel, and secrets are encrypted inside the TLS channel with AEAD (i.e. application-level e2e protection). Mounting the Remote Secrets Storage happens with an anonymised one-time bootstrap token (privacy).
- Shutdown is more swift now (e.g. for rolling restarts/updates)
- Browser caching disabled with the Web Auth GUI
- Security improvement: secrets token vending starts with bootstrap token that is exchanged to session token with PKCE token exchange (anti-theft)
- Web GUI shows token status
- Matching DuckDB boilstream community extension version: v0.2.0
- DuckDB Secure Remote Secrets Storage REST API along with DuckDB Community Extension (https://github.com/dforsber/boilstream-extension)
- GDPR-compliant user management with non-repudiation/non-disputability: the user's email address (identity) is PGP-encrypted when the user is deleted, but only if a public PGP key is configured.
- Web tokens can be revoked like sessions. E.g. a revoked secrets scoped token used in the BoilStream DuckDB Extension does not have access to remote secrets storage anymore after revocation.
- Added a password verification field to manual user sign-up
- Clearing Web Auth portal password fields on timeout and tab change
- Added encryption key verification to the initial boilstream setup ceremony
- The superadmin ("boilstream") password now has similar strength requirements as the encryption key
- Previously, if the maximum number of sessions was reached, the user was blocked. Now the oldest session is revoked to allow the user to log in via the API / WebAuth console (authentication must still succeed).
- TOTP code cannot be reused
- Improved auth API input validations
- Web tokens are generated per purpose/scope (e.g. "secrets", "ingest") to adhere to the least-privilege security principle
- NEW: Web Portal GUI. Start boilstream and go to https://host:443/ to vend Postgres interface and HTTP ingestion token credentials. Social logins (GitHub, Google) and SAML-based SSO (e.g. AWS SSO as SP) are supported through the HTTPS auth server interface. Includes Cloudflare Turnstile captcha.
- MFA with TOTP and passkeys is supported. You can manage these on the auth portal and also revoke sessions, which also closes any established Postgres sessions using the respective credentials.
- BoilStream maintains an encrypted users DuckDB database, encrypted with a key passed during server start (or read from a file if configured). The key is memory-locked and zeroised immediately after use (once the databases have been opened). The encrypted databases are accessible to the auth server only. The database encryption is a new DuckDB v1.4 feature (see the sketch below). If the encryption key path is configured, the key is stored on disk and reused from there; otherwise it is asked from the user every time the server starts.
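As an illustration of the DuckDB v1.4 encryption feature mentioned above (not BoilStream's internal code), attaching an encrypted database looks roughly like this; the file name and key are placeholders, and the server's key handling (memory locking, zeroisation) is not shown.

```python
# Illustration of DuckDB v1.4 database encryption via the ENCRYPTION_KEY attach option.
import duckdb

con = duckdb.connect()
con.execute("ATTACH 'users.duckdb' AS users (ENCRYPTION_KEY 'placeholder-encryption-key');")
con.execute("CREATE TABLE IF NOT EXISTS users.accounts (name TEXT);")
con.execute("DETACH users;")
```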
- Proper implementation of Postgres SCRAM-SHA-256 based logins, with short-lived credentials vended via OAuth2/creds through the login page served by the server's auth HTTPS server (connection sketch below). Postgres md5 passwords are no longer supported. The server never stores users' salted passwords.
- The users' encrypted database is backed up on the selected backend. The system validates that the backend exists at startup, recovers the users database from backup if it is missing locally, and automatically backs up after user creation with configurable interval throttling.
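A minimal client-side sketch of the SCRAM login path above, assuming credentials vended from the auth portal and a hypothetical host; psycopg negotiates SCRAM-SHA-256 automatically via libpq.

```python
# Sketch: connect with vended short-lived credentials (all values are placeholders).
import psycopg

with psycopg.connect(
    host="boilstream.example.com",  # hypothetical host
    port=5432,
    dbname="memory",
    user="user_alice",              # vended username
    password="vended-password",     # vended short-lived password
    sslmode="require",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
```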
- The superadmin account ("boilstream") password is bootstrapped when the server starts for the first time and there is no encrypted superadmin.duckdb database yet. Using "boilstream" as the username and the associated password, the Postgres connection is established to a separate in-memory DuckDB instance that has the users database attached.
- The users.duckdb database is backed up on the primary backend storage
- Vend an HTTP ingestion token through the BoilStream auth portal and use it with audio-arrow-streamer.html to stream audio into BoilStream DuckDB and the Data Lake
- Derived views (materialised topics) were still using the old DuckDB-instance-per-view approach. The derived view processor now uses a single DuckDB instance for much improved scalability.
- DuckDB 1.4.0, extensions work again
- Arbitrary number of parameters supported (hard-coded max is 10k to avoid OOM)
- Parametrized INSERT/DELETE queries (see the sketch after this list)
- DuckDB Arrow lossless Boolean extension type was misinterpreted when returning multiple boolean values
- JSON array parameters: they were being quoted but must not be
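A minimal sketch of the parametrized INSERT/DELETE support over the Postgres interface; the table, connection details, and values are placeholders. psycopg sends the values as extended-protocol parameters rather than inlined SQL literals.

```python
# Sketch: parametrized INSERT/DELETE through the Postgres interface (placeholders only).
import psycopg

with psycopg.connect("host=localhost port=5432 dbname=memory user=user_alice password=pw") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS demo_events (id BIGINT, payload TEXT)")
        cur.execute("INSERT INTO demo_events VALUES (%s, %s)", (1, "hello"))
        cur.executemany("INSERT INTO demo_events VALUES (%s, %s)", [(2, "a"), (3, "b")])
        cur.execute("DELETE FROM demo_events WHERE id = %s", (1,))
```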
- NEW INTERFACE: HTTPS ingestion with Arrow payloads, e.g. from browsers with Flechette JS. >2 GB/s and tens of thousands of concurrent connections.
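A hypothetical sketch of the HTTPS Arrow ingestion path: the endpoint path, topic name, and auth header are assumptions; only the general shape (Arrow IPC stream bytes POSTed with a vended ingestion token) comes from the entry above.

```python
# Sketch: POST an Arrow IPC stream to a hypothetical ingestion endpoint.
import io
import pyarrow as pa
import requests

batch = pa.RecordBatch.from_pydict({"id": [1, 2, 3], "msg": ["a", "b", "c"]})
sink = io.BytesIO()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)

resp = requests.post(
    "https://boilstream.example.com/ingest/my_topic",  # hypothetical endpoint
    headers={
        "Authorization": "Bearer INGESTION_TOKEN",      # vended token (placeholder)
        "Content-Type": "application/vnd.apache.arrow.stream",
    },
    data=sink.getvalue(),
)
print(resp.status_code)
```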
- Configurable query/connection timeouts; default raised from 5 min to 30 min (pgwire.connection_timeout_seconds)
- True streaming through the Postgres interface with lazy fetching from DuckDB to minimise memory consumption; allows e.g. streaming tens of millions of rows concurrently to multiple clients without consuming much memory
- Allow streaming all rows, not just the first 1M; allows e.g. Power BI to download all data
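On the client side, the lazy streaming above pairs well with an incrementally consuming cursor; for example, psycopg 3's Cursor.stream() uses single-row mode so rows are processed as the server sends them instead of buffering the whole result. Connection details and the table name are placeholders.

```python
# Sketch: consume a large result incrementally instead of buffering it client-side.
import psycopg

with psycopg.connect("host=localhost port=5432 dbname=memory user=user_alice password=pw") as conn:
    with conn.cursor() as cur:
        for row in cur.stream("SELECT * FROM demo_events"):  # placeholder table
            pass  # process each row as it arrives
```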
- Fix "time with time zone", "timestamp with time zone", "uuid array", "boolean array" binary parameters handling for prepared queries
- PG type name mapping vs. native type naming fixed, allowing Power BI to detect all types properly
- First-class support for prepared statements, including binary parameter types (also arrays)
- Higher resiliency against attacks and hundreds of concurrent clients, including malicious ones
- Improved type compliance HTML report: https://boilstream.com/test_report.html
- Many PG catalog fixes to make type system more complete
- Postgres interface hardening in face of attacks and misbehaving clients
- Improved Postgres interface robustness and resource management (query timeouts, idle connection mgmt, etc.)
- Postgres interface result row record improvements and type modifiers, allowing Power BI to use proper query folding (query pushdown)
- Type compliance report: https://www.boilstream.com/type_coverage_report.md
- Grafana Dashboard updated with more metrics
- NEW: Preliminary Kafka interface with Avro and schema validation. boilstream.topic_schemas now also includes an avro_schema column that holds the schema for Kafka clients.
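A minimal sketch of reading the Avro schemas through the Postgres interface; the boilstream.topic_schemas table and its avro_schema column come from the entry above, while connection details are placeholders and no other columns are assumed.

```python
# Sketch: fetch the Avro schema published for Kafka clients.
import psycopg

with psycopg.connect("host=localhost port=5432 dbname=memory user=user_alice password=pw") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT avro_schema FROM boilstream.topic_schemas")
        for (avro_schema,) in cur.fetchall():
            print(avro_schema)
```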
- Storage backend now supports multiple Object Storage backends, not just e.g. S3 + filesystem
- DuckDB arrow_lossless_conversion = true by default (preserves e.g. time zone information with the "time with time zone" type). Both settings work.
- Full support for Tableau is in place. Tableau does not seem to complain about any types, so only a minor change was needed to make Tableau work.
- Extensive tests for various data types, plus special handling for Power BI, since its Npgsql version is outdated and can't handle TIME, TIMESTAMP, TIMESTAMPTZ, or ARRAYs with NULLs. We therefore convert them to TEXT (temporal) and JSON (ARRAY), but only for Power BI clients; other clients get these types without conversion. See the demo_database.sql that we used for testing with the Power BI Desktop client.
- Fixed more Power BI connection failures due to type mismatch.
- Using the object_store crate for generalised object store and filesystem support, e.g. AWS, GCP, and Azure object stores, and MinIO.
- Performance: Fixed serialised metadata envelope recycling causing some operations to be serialised
- Defect: Power BI connection failure due to type mismatch
- Flight SQL interface (e.g. with ADBC drivers; see the sketch below)
- Self and cross-BoilStream writes with Airport extension (pre-compiled downloadable)
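A minimal sketch of connecting to the Flight SQL interface with the ADBC Python driver; the endpoint, port, and credential option keys are assumptions, not documented BoilStream values.

```python
# Sketch: Flight SQL via ADBC (endpoint and credentials are placeholders/assumptions).
import adbc_driver_flightsql.dbapi as flightsql

conn = flightsql.connect(
    "grpc+tls://boilstream.example.com:50051",               # hypothetical endpoint
    db_kwargs={"username": "user_alice", "password": "pw"},  # assumed option keys
)
with conn.cursor() as cur:
    cur.execute("SELECT 42 AS answer")
    print(cur.fetchall())
conn.close()
```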
- Graceful shutdown sequence fixed to avoid data loss with derived view processor
- Derived topic id assignment and topic cache miss handling
- Improved BI Tool support: Power BI compatibility
- 1st tier derived topics (aka materialised views) support
- Support for recursive derived topics
- Data persistence layer tiered sticky load balancing for improved parquet locality
- The metadata.duckdb database catalog schema changed (keying by u64 not varchar)
- Improved memory management with vector recycling; also switched from jemalloc to mimalloc
- Embedded DuckDB now has more inbuilt core extensions
- Linux and OSX x64 builds
- Improved Flight RPC client communications with retries
- Derived view refresh: materialized views now automatically refresh within 1 second when created or dropped via SQL, eliminating the need to restart the agent (see the sketch after this list)
- View changes made through the boilstream.s3 schema are now immediately picked up by the streaming processor
- Added periodic cache invalidation (1s interval) to the derived view processor
- Improved cache consistency between SQL operations and stream processing
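A hypothetical end-to-end sketch of the refresh behaviour above: create a derived view through the Postgres interface and rely on the roughly 1-second cache invalidation for the streaming processor to pick it up. The schema, view, and table names and the DDL convention are assumptions, not the documented BoilStream API.

```python
# Sketch only: names and DDL are assumptions; the 1 s invalidation interval is from the entry above.
import time
import psycopg

with psycopg.connect("host=localhost port=5432 dbname=memory user=user_alice password=pw") as conn:
    conn.execute("CREATE VIEW boilstream.s3.my_derived AS SELECT * FROM source_topic")
    conn.commit()
    time.sleep(1.5)  # periodic cache invalidation runs on a ~1 s interval
    # The derived view processor should now see my_derived without an agent restart.
```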