fix: gate scan-size on access-plan estimate, not full file size by shefeek-jinnah · Pull Request #6 · hotdata-dev/liquid-cache

shefeek-jinnah · 2026-06-15T05:19:30Z

The max_scan_bytes gate in try_optimize_parquet_source summed each file's full object_meta.size, so point-fetches that read only a few row groups (via a pre-attached ParquetAccessPlan) were judged by the whole file's size and passed through as uncached vanilla DataFusion reads.

Add estimated_scan_bytes(&PartitionedFile): when a precise ParquetAccessPlan is attached to file.extensions, scale the file size by the selected (non-Skip) row-group fraction; otherwise fall back to the full object_meta.size so normal/full scans are unaffected. Use a u128 intermediate to avoid overflow on large files. Wire it into the gate's size computation.

Add a focused unit test covering the targeted vs. full-scan estimate and the cap-crossing regression.

The max_scan_bytes gate in try_optimize_parquet_source summed each file's full object_meta.size, so point-fetches that read only a few row groups (via a pre-attached ParquetAccessPlan) were judged by the whole file's size and passed through as uncached vanilla DataFusion reads. Add estimated_scan_bytes(&PartitionedFile): when a precise ParquetAccessPlan is attached to file.extensions, scale the file size by the selected (non-Skip) row-group fraction; otherwise fall back to the full object_meta.size so normal/full scans are unaffected. Use a u128 intermediate to avoid overflow on large files. Wire it into the gate's size computation. Add a focused unit test covering the targeted vs. full-scan estimate and the cap-crossing regression.

claude

Correct, focused fix. The estimate scales file size by the selected (non-Skip) row-group fraction using plan.len() (validated as total row groups in create_initial_plan) and row_group_indexes(), consistent with existing usage in opener.rs/row_group_filter.rs. Overflow is handled via the u128 intermediate, and edge cases (no plan, total == 0, all-skip) are sound. Test meaningfully covers the cap-crossing regression.

claude

Focused, well-tested fix. The access-plan-aware estimate is correct (safe fallback to full size, zero-guard, u128 overflow guard, conservative row-group-count ratio) and the unit test covers the targeted estimate, full-scan fallback, and cap-crossing regression.

shefeek-jinnah added 2 commits June 15, 2026 10:45

style: rustfmt PartitionedFile builder line in scan-gate test

ec2d70b

claude Bot previously approved these changes Jun 15, 2026

View reviewed changes

shefeek-jinnah dismissed claude[bot]’s stale review via ec2d70b June 15, 2026 05:22

claude Bot approved these changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: gate scan-size on access-plan estimate, not full file size#6

fix: gate scan-size on access-plan estimate, not full file size#6
shefeek-jinnah wants to merge 2 commits into
mainfrom
shefeek/scan-gate-access-plan-estimate

shefeek-jinnah commented Jun 15, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shefeek-jinnah commented Jun 15, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant