perf: cache prepared statement IDs for projected system table queries (follow-up to #858) #863

@nikagra

Context

PR #858 introduced dynamic column projection for topology monitor queries: the first query uses SELECT * to discover available columns, then subsequent queries use a projected SELECT col1, col2, ... string. However, those subsequent projected queries are still sent as plain Query messages — the query string is re-parsed by the server on every call.

Proposed improvement

After the first SELECT * populates a column cache (e.g. localColumns, peersColumns, peersV2Columns), immediately issue a PREPARE for the resulting projected query string and cache the returned statementId/resultMetadataId alongside the column list. All subsequent queries then use Execute instead of Query, sending only the prepared statement ID (~16 bytes) rather than the full query text (~130–160 bytes), and skipping server-side re-parsing on every call.

Scope

The cleanest targets are the queries with no bind values, i.e. those with either no WHERE clause or a fixed one:

  • SELECT col1, col2, ... FROM system.local WHERE key='local' (fixed WHERE, prepare once)
  • SELECT col1, col2, ... FROM system.peers
  • SELECT col1, col2, ... FROM system.peers_v2

The WHERE-clause single-node queries in refreshNode() already use named bind parameters (:address, :port) and pass null columns (i.e. SELECT *). These could also be extended to use prepared+projected form with positional bind values, but that is a separate concern.
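As a rough illustration of the three targets listed above, the projected query strings could be built from the cached column lists along these lines. `ProjectedQueries` and `build` are hypothetical names used only for this sketch and are not part of the driver; the fallback to SELECT * when no columns are cached mirrors the behavior described for #858.

```java
import java.util.List;

// Hypothetical helper for illustration only; not an existing driver class.
final class ProjectedQueries {

  private ProjectedQueries() {}

  /**
   * Builds "SELECT col1, col2, ... FROM table [WHERE fixedWhere]".
   * Falls back to SELECT * while the column cache is still empty.
   */
  static String build(String table, List<String> columns, String fixedWhere) {
    String projection =
        (columns == null || columns.isEmpty()) ? "*" : String.join(", ", columns);
    StringBuilder sb =
        new StringBuilder("SELECT ").append(projection).append(" FROM ").append(table);
    if (fixedWhere != null) {
      // Only a literal, bind-free predicate is expected here (e.g. key='local').
      sb.append(" WHERE ").append(fixedWhere);
    }
    return sb.toString();
  }
}
```

Because none of these strings contain bind markers, each one only needs to be prepared once per connection and can then be executed with an empty value list.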

Implementation sketch

In DefaultTopologyMonitor:

  1. Add three additional volatile cache fields: localStatementId, peersStatementId, peersV2StatementId (type ByteBuffer, matching AdminRequestHandler's existing Prepared handler return type).
  2. Add a new AdminRequestHandler.prepare(channel, queryString) factory method (the infrastructure for handling Prepared responses already exists in AdminRequestHandler at line ~188).
  3. After each SELECT * populates a column cache, immediately chain a PREPARE call for the projected query string and store the returned ID.
  4. In the query-building step, if a statementId is available, use Execute instead of Query.
  5. Extend resetColumnCaches() to also clear the prepared IDs — so a reconnect re-issues SELECT * and re-prepares with whatever columns the new server exposes.
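The steps above can be sketched as a small standalone model of the per-table cache. This is a minimal sketch under stated assumptions, not the driver's code: `ProjectedStatementCache`, `QueryRequest`, `ExecuteRequest`, `onColumnsDiscovered`, `onPrepared` and `reset` are all hypothetical names, and the real implementation would send protocol messages through AdminRequestHandler rather than return plain objects.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical model of one cached table (e.g. system.peers); not driver code.
final class ProjectedStatementCache {

  // Stand-ins for the Query and Execute protocol messages.
  interface Request {}

  static final class QueryRequest implements Request {
    final String cql;
    QueryRequest(String cql) { this.cql = cql; }
  }

  static final class ExecuteRequest implements Request {
    final ByteBuffer statementId;
    ExecuteRequest(ByteBuffer statementId) { this.statementId = statementId; }
  }

  private final String table;
  private volatile List<String> columns;    // filled by the first SELECT *
  private volatile ByteBuffer statementId;  // filled by the PREPARE response

  ProjectedStatementCache(String table) { this.table = table; }

  // Step 3a: the first SELECT * has returned; remember the available columns.
  void onColumnsDiscovered(List<String> discovered) { this.columns = discovered; }

  // Step 3b: the PREPARE for the projected query has returned; remember its ID.
  void onPrepared(ByteBuffer id) { this.statementId = id; }

  // Step 4: prefer Execute when an ID is cached, else fall back to Query.
  Request nextRequest() {
    ByteBuffer id = statementId;  // read the volatile field once
    if (id != null) {
      return new ExecuteRequest(id.duplicate());
    }
    List<String> cols = columns;
    String projection = (cols == null || cols.isEmpty()) ? "*" : String.join(", ", cols);
    return new QueryRequest("SELECT " + projection + " FROM " + table);
  }

  // Step 5: on reconnect, forget both the columns and the prepared ID.
  void reset() {
    statementId = null;
    columns = null;
  }
}
```

One design point worth weighing: two separate volatile fields are not updated atomically, so a reader can briefly observe columns without an ID (which simply falls back to Query). If that window matters, holding both values in a single immutable object behind one volatile reference would be an alternative.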

Notes

  • Prepared statements on system tables are supported by both Scylla and Cassandra (confirmed by an existing test in PreparedStatementTest).
  • Prepared statements are registered per node, so an ID obtained on one control connection cannot be assumed valid after reconnecting, possibly to a different node. Clearing the IDs in resetColumnCaches() (already called by ControlConnection on reconnect) is sufficient.
  • The first query per connection still pays full SELECT * cost; this optimization only affects steady-state repeated queries.

Follows up on #858. Parent epic: DRIVER-274.
