
perf: inline row decoding and eliminate closures in recv_results_rows#765

Draft
mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/inline-row-decode

Conversation

@mykaul mykaul commented Mar 25, 2026

Summary

  • Split recv_results_rows into fast path (no column encryption) and slow path (CE enabled)
  • Eliminate per-call closure allocation and merge two-pass row processing into single-pass decoding

Details

Problem

The current recv_results_rows has three sources of overhead on every call:

  1. Two passes over row data: First recv_row reads all raw bytes into a list[list[bytes]], then decode_row iterates again to deserialize — doubling iteration and creating intermediate lists that are immediately discarded.

  2. Per-call closures: decode_val and decode_row are defined as closures inside recv_results_rows, meaning Python allocates new function objects on every result set.

  3. Unconditional ColDesc creation: ColDesc namedtuples are built for every column even when column encryption is not configured (the vast majority of deployments).
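The two-pass shape described in point 1 can be sketched as follows. This is a minimal illustration, not the driver's actual code: `read_value`, `two_pass_decode`, and the callable `decoders` are hypothetical stand-ins for `recv_row`, `decode_row`, and the per-column `from_binary()` calls.

```python
import io
import struct

def read_value(buf):
    """Read one length-prefixed value: 4-byte signed big-endian length,
    then that many payload bytes. A negative length means NULL."""
    (size,) = struct.unpack(">i", buf.read(4))
    return None if size < 0 else buf.read(size)

def two_pass_decode(buf, colcount, rowcount, decoders):
    # Pass 1: collect raw bytes into an intermediate list-of-lists
    # (the list[list[bytes]] the PR description mentions).
    raw_rows = [[read_value(buf) for _ in range(colcount)]
                for _ in range(rowcount)]
    # Pass 2: iterate again to deserialize; raw_rows is then discarded.
    return [[None if v is None else dec(v)
             for dec, v in zip(decoders, row)]
            for row in raw_rows]
```

Every element is touched twice and the intermediate `raw_rows` structure lives only long enough to be consumed by the second pass.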

Solution

Fast path (no column encryption — the common case):

  • _decode_row_inline(f, colcount, col_types, protocol_version) reads each column's size, reads the bytes, and immediately calls from_binary() — one pass, no intermediate list
  • ColDesc creation is skipped entirely
  • No closures allocated

Slow path (column encryption enabled):

  • Preserves the existing two-pass logic (needed because CE must decrypt before type decoding)
  • decode_val/decode_row moved to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Benchmark results

| Scenario         | Rows × Cols | Speedup |
|------------------|-------------|---------|
| Standard decode  | 10 × 5      | 1.19×   |
| Standard decode  | 100 × 5     | 1.12×   |
| Standard decode  | 1000 × 5    | 1.12×   |
| 50% NULL columns | 100 × 10    | 1.33×   |
| 50% NULL columns | 1000 × 10   | 1.26×   |

The speedup is higher with NULL-heavy workloads because the inline path short-circuits from_binary() for negative-length (NULL) columns.

Merge conflict note

⚠️ This PR modifies the same recv_results_rows method as PR #630, which also splits the method into CE/non-CE branches. If both PRs are accepted, there will be a merge conflict requiring manual resolution.

Testing

  • All 651 existing unit tests pass (16 pre-existing skips)
  • No new tests added (this is a pure refactor of the decode path; the existing test_protocol.py and test_response_future.py tests exercise recv_results_rows via ResultMessage.recv_body)

Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):

Fast path (common case):
  - Reads raw column bytes and decodes types in a single pass per row
    via _decode_row_inline(), eliminating the intermediate list-of-lists
  - Skips ColDesc namedtuple creation entirely (only needed for CE)
  - No closure allocation per call

Slow path (column encryption):
  - Preserves full CE logic with ColDesc creation
  - Moves decode_val/decode_row closures to module-level functions
    (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
@mykaul mykaul marked this pull request as draft March 25, 2026 20:33