Add FlatLayout range read for sub-segment IO #6974
jiaqizho wants to merge 1 commit into vortex-data:develop from
Note: If a segment contains a validity bitmap, it falls back to reading the entire segment.
Encodings with range read support
Encodings without range read support
I really like the idea of sub-segment reads; however, this must be added in an extensible way. @gatesn, any thoughts on where we should put this?
It seems to me that we want to allow arrays to specify how to slice their buffers.

I think this should be moved to a design discussion. Please open one so we can refine the design before implementing this feature.

A couple of immediate thoughts:
just as an aside, I do like the |
Merging this PR will improve performance by 11.12%
Force-pushed: 6081b5b to 533ce6a
@joseph-isaacs @gatesn @connortsui20 Thanks for the thorough review and great suggestions! I've force-pushed an update addressing the feedback. I have replaced the static match

Design discussion: Opened #6991 with the motivation, full design, benchmark data (in my case), and future directions (nullable support, BufferHandle-based approach, etc.).

Would love any further feedback on the design or implementation; happy to keep iterating!
@jiaqizho If you look at the CI actions, you'll see that this change doesn't pass a few checks (formatting and docs). Let us know when you've fixed it and we can run CI for you!
Force-pushed: 533ce6a to 458273c
@connortsui20 Thanks for the heads up! I've pushed fixes for the formatting and public API lock files. These were generated locally, so I'm not 100% sure they match your CI environment. Please let me know if anything still fails.
Seems like https://github.com/vortex-data/vortex/actions/runs/23193429228/job/67395598365?pr=6974 is still failing, just need to fix the doc links |
When a FlatLayout has its array_tree metadata inlined in the footer, we can figure out exactly which bytes of the segment are needed for a given row range without any IO. This lets us issue a single small read instead of fetching the entire segment, which is a big win for point lookups and narrow scans on wide tables.

The range read planner walks the encoding tree via `VTable::plan_range_read`, where each encoding (Primitive, Bool, BitPacked, Delta, FoR, ZigZag, ALP, ALPRD, Dict, FixedSizeList, Constant, Null, Sequence, ByteBool, DateTimeParts, DecimalByteParts) declares its own buffer sub-ranges and child recursion strategy. If the resulting byte range is less than 50% of the full segment, we issue the targeted read; otherwise we fall back to reading the whole segment.

To make Delta work with sub-ranged buffers, Delta::build() now derives child array lengths from `len + offset` instead of metadata.deltas_len. On disk, offset is always 0 so this is a no-op for the normal decode path, but it lets the range read pass a smaller decode_len without the decoder panicking on buffer size mismatch.

Also adds `request_range()` to the SegmentSource trait with a default fallback implementation, efficient overrides in FileSegmentSource and BufferSegmentSource, a `RangeReadEnabled` session flag, and `ScanBuilder::with_split_row_indices` to generate per-index tight ranges for point lookups.

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
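As an illustration of the "default fallback plus efficient override" shape described for `request_range()`, here is a minimal synchronous sketch. It is not the Vortex API: `RangeSource`, `WholeOnly`, and `InMemorySource` are hypothetical names, and the real SegmentSource trait presumably has different (likely asynchronous) signatures.

```rust
use std::ops::Range;

/// Hypothetical, synchronous stand-in for a segment source (assumption, not the Vortex trait).
pub trait RangeSource {
    /// Read a whole segment by id.
    fn request(&self, id: usize) -> Option<Vec<u8>>;

    /// Default fallback: fetch the full segment, then slice out the requested bytes.
    /// Returns None if the segment is missing or the range is out of bounds.
    fn request_range(&self, id: usize, range: Range<usize>) -> Option<Vec<u8>> {
        let full = self.request(id)?;
        full.get(range).map(|bytes| bytes.to_vec())
    }
}

/// A source that only knows how to read whole segments, so it relies on the default.
pub struct WholeOnly(pub Vec<u8>);

impl RangeSource for WholeOnly {
    fn request(&self, _id: usize) -> Option<Vec<u8>> {
        Some(self.0.clone())
    }
}

/// A source that can serve a sub-range directly, without copying the whole segment first.
pub struct InMemorySource {
    pub segments: Vec<Vec<u8>>,
}

impl RangeSource for InMemorySource {
    fn request(&self, id: usize) -> Option<Vec<u8>> {
        self.segments.get(id).cloned()
    }

    // Override: copy only the requested bytes.
    fn request_range(&self, id: usize, range: Range<usize>) -> Option<Vec<u8>> {
        self.segments.get(id)?.get(range).map(|bytes| bytes.to_vec())
    }
}
```

With this shape, callers always go through `request_range`, and sources that cannot do better still return correct bytes at the cost of a full read.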
Force-pushed: 458273c to 3d68adc
@connortsui20 Done, please retrigger CI. Thanks.
@joseph-isaacs @gatesn @connortsui20 Hi, wanted to follow up on this. The discussion in #6991 has the full design and S3 benchmark data. The key question is whether the vtable-based plan_range_read approach addresses the extensibility concern from the original PR review. Let me know if you'd prefer a different direction; happy to iterate.
See #6991 (comment) for more discussion |
Summary
When a FlatLayout has its array_tree metadata inlined in the footer, we can figure out exactly which bytes of the segment are needed for a given row range without any IO. This lets us issue a single small read instead of fetching the entire segment, which is a big win for point lookups and narrow scans on wide tables.
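To make "which bytes are needed for a given row range" concrete, here is a rough sketch of the arithmetic for two simple buffer shapes. The function names are hypothetical, and the sketch assumes buffers start at row 0 with no bit or element offset and no validity bitmap.

```rust
use std::ops::Range;

/// Byte range holding rows `rows` of a fixed-width buffer (`width` bytes per row).
/// Assumption: the buffer starts at row 0 with no padding between values.
pub fn fixed_width_bytes(rows: Range<u64>, width: u64) -> Range<u64> {
    rows.start * width..rows.end * width
}

/// Byte range holding rows `rows` of a boolean buffer packed one bit per row.
/// The start is rounded down and the end rounded up to whole bytes.
pub fn bit_packed_bytes(rows: Range<u64>) -> Range<u64> {
    rows.start / 8..(rows.end + 7) / 8
}
```

For example, rows 100..104 of a 4-byte primitive column map to bytes 400..416, so a point lookup touches 16 bytes regardless of how large the segment is.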
The range read planner walks the encoding tree (Primitive, Bool, BitPacked, Delta, FoR, ZigZag, ALP, ALPRD, Dict, FixedSizeList, Constant) and computes the minimal contiguous byte range covering the needed buffers. If that range is less than 50% of the full segment, we issue the targeted read; otherwise we fall back to reading the whole segment.
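The covering-range and 50% threshold decision described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: `ReadPlan`, `covering_range`, and `plan_read` are hypothetical names, and the input is assumed to be the list of buffer sub-ranges already collected from the encoding tree.

```rust
use std::ops::Range;

/// What the reader should fetch for one segment (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ReadPlan {
    /// Fetch only this byte range of the segment.
    Range(Range<u64>),
    /// Fetch the whole segment.
    Full,
}

/// Smallest contiguous byte range covering every needed buffer sub-range.
/// Returns None when no sub-ranges were collected.
pub fn covering_range(needed: &[Range<u64>]) -> Option<Range<u64>> {
    let start = needed.iter().map(|r| r.start).min()?;
    let end = needed.iter().map(|r| r.end).max()?;
    Some(start..end)
}

/// Issue a targeted read only if it covers strictly less than 50% of the segment.
pub fn plan_read(needed: &[Range<u64>], segment_len: u64) -> ReadPlan {
    match covering_range(needed) {
        Some(r) if r.end.saturating_sub(r.start).saturating_mul(2) < segment_len => {
            ReadPlan::Range(r)
        }
        // Nothing collected, or the covering range is too large to be worth it.
        _ => ReadPlan::Full,
    }
}
```

Note that the covering range includes any gap between the needed buffers, so two small buffers at opposite ends of a segment can still trigger the full-read fallback.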
To make Delta work with sub-ranged buffers, Delta::build() now derives child array lengths from `len + offset` instead of metadata.deltas_len. On disk, offset is always 0 so this is a no-op for the normal decode path, but it lets the range read pass a smaller decode_len without the decoder panicking on buffer size mismatch.

Also adds `request_range()` to the SegmentSource trait with a default fallback implementation, efficient overrides in FileSegmentSource and BufferSegmentSource, a `RangeReadEnabled` session flag, and `pub const NAME` on all encoding structs for pattern matching in the planner.

The current implementation requires the array encoding tree (`ArrayNode`) to be inlined in the footer via `FLAT_LAYOUT_INLINE_ARRAY_NODE=1`. Without this flag, the `ArrayNode` is stored inside the segment data and is not available to the range read planner until the entire segment is fetched (it would be possible to add an extra IO per column to fetch just the `ArrayNode` from the segment, but the overhead would negate much of the benefit). Since the planner needs the encoding tree to determine which byte ranges to read, range read is effectively disabled without inlining, and every take falls back to reading the full segment. A follow-up change will make inlining the default behavior.

Testing