Commit ea101ed
catalog: query-aware statistics requests via ScanArgs / ScanResult
Adds an opt-in handshake that lets callers ask a `TableProvider` for
specific stats by name and receive only what the provider can answer
cheaply, instead of the all-or-nothing dense `Statistics` we have today.
## What's new
* `datafusion-common::stats::StatisticsRequest` — enum of stat kinds
that mirror `Statistics` / `ColumnStatistics` (Min, Max, NullCount,
DistinctCount, Sum, ByteSize, RowCount, TotalByteSize). `Hash + Eq`
so it can key a `HashMap`.
* `datafusion-common::stats::StatisticsValue` — `Scalar(Precision<...>)
| Distribution(Arc<dyn Any>) | Sketch(Arc<dyn Any>) | Absent`. Whether
a value is exact or estimated travels in the `Precision` wrapper, not
the variant.
* `ScanArgs::with_statistics_requests` / `statistics_requests()` — the
caller's question.
* `ScanResult::with_statistics` / `statistics()` / `into_parts()` — the
provider's answer, paired 1:1 with the requests slice.
* `PartitionedFile::satisfied_stats` — sparse,
`Arc<HashMap<StatisticsRequest, StatisticsValue>>` for per-file
answers. Memory scales with what was asked, not with table width.
Providers that store stats out-of-band (Delta/Iceberg/Hudi manifests,
Hive Metastore, custom catalogs) can populate this directly without
rebuilding a full dense `Statistics`.
* `FilePruner` learns to consume the sparse map. Internally,
`file_stats_pruning` is now `Box<dyn PruningStatistics + Send + Sync>`
so we can dispatch between the existing `PrunableStatistics` (dense)
and a new `SparseFilePruningStats` adapter (sparse). The sparse
adapter looks up each `StatisticsRequest` directly in the map and
materializes single-row arrays only for the columns the pruning
predicate touches — no densify-then-throw-away.
* When `args.statistics_requests()` is set and `collect_statistics=true`,
`ListingTable::scan_with_args` populates `ScanResult.statistics` from
the merged dense `Statistics` it has already computed.
When `collect_statistics=false` it returns `Absent` for everything
(the contract is "answer what's free"). `DistinctCount`/`Sum`/
`ByteSize` are likewise `Absent` for Parquet because those values are
not stored in the Thrift-encoded footer; layered helpers (or richer
providers) can fill the gaps.
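
The request/answer pairing described above can be sketched with simplified stand-in types. This is not the code from the commit: the column-index payload on the request variants, the `i64`-only `Precision`, and the helper name `answer_requests` are all assumptions made for illustration, and the `Distribution` / `Sketch` variants are left out.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `StatisticsRequest`.
/// The `usize` column-index payload is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum StatisticsRequest {
    Min(usize),
    Max(usize),
    NullCount(usize),
    DistinctCount(usize),
    RowCount,
}

/// Simplified stand-in for `Precision<...>`, restricted to `i64` values.
#[derive(Debug, Clone, PartialEq)]
enum Precision {
    Exact(i64),
    Inexact(i64),
}

/// Simplified stand-in for `StatisticsValue`.
/// Exact vs. estimated lives in `Precision`, not in the variant.
#[derive(Debug, Clone, PartialEq)]
enum StatisticsValue {
    Scalar(Precision),
    Absent,
}

/// Hypothetical helper: answer each request from what is already known,
/// in request order, so the result pairs 1:1 with `requests`.
/// Anything the provider cannot answer cheaply comes back as `Absent`.
fn answer_requests(
    requests: &[StatisticsRequest],
    known: &HashMap<StatisticsRequest, StatisticsValue>,
) -> Vec<StatisticsValue> {
    requests
        .iter()
        .map(|req| known.get(req).cloned().unwrap_or(StatisticsValue::Absent))
        .collect()
}
```

Because the answer vector always has the same length and order as the request slice, a caller can zip the two together without a second lookup; `Absent` entries simply mean "not free to compute here".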
## Backwards compat
All additions are opt-in:
* `ScanArgs` / `ScanResult` gain new fields with `Default`-friendly
initializers; existing callers that don't use the new builders see
no change.
* `FilePruner`'s field-type change is internal (private field).
* The only minor source-level break is a new pub field on
`PartitionedFile` (`satisfied_stats`). Callers using
`PartitionedFile::new` / `From<ObjectMeta>` / the existing builders
are unaffected. Direct struct literals — uncommon, none in-tree —
need to add `satisfied_stats: None` (or use the new
`with_satisfied_stats` builder).
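
To make the source-level break concrete, here is a minimal sketch using a two-field stand-in for `PartitionedFile`. The real struct has more fields, and the `String`/`i64` map types and the exact `with_satisfied_stats` signature are assumptions for this example only.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for `PartitionedFile`, reduced to two fields for illustration.
#[derive(Debug, Clone)]
pub struct PartitionedFile {
    pub path: String,
    /// The newly added public field; `None` means no sparse stats attached.
    /// Key and value types are simplified placeholders.
    pub satisfied_stats: Option<Arc<HashMap<String, i64>>>,
}

impl PartitionedFile {
    /// Callers going through a constructor are unaffected:
    /// the new field is filled in with `None` here.
    pub fn new(path: impl Into<String>) -> Self {
        Self { path: path.into(), satisfied_stats: None }
    }

    /// Builder-style setter; the signature is assumed, not taken from the commit.
    pub fn with_satisfied_stats(mut self, stats: Arc<HashMap<String, i64>>) -> Self {
        self.satisfied_stats = Some(stats);
        self
    }
}

/// A direct struct literal is the one place that stops compiling
/// until the new field is named explicitly.
pub fn from_literal(path: &str) -> PartitionedFile {
    PartitionedFile {
        path: path.to_string(),
        satisfied_stats: None,
    }
}
```

Code that only calls `PartitionedFile::new(...)` compiles before and after the change; only literals like the one in `from_literal` need the extra line.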
## Tests
* `datafusion-common::stats::tests::statistics_request_is_hashable_keyable`
— round-trip a `StatisticsRequest` through a `HashMap`.
* `datafusion-pruning::file_pruner::tests` — three tests demonstrating
end-to-end pruning against a sparse-only `PartitionedFile` (`x > 100`
prunes a `[10, 20]` file, `x > 15` doesn't, no stats at all → no
pruner).
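
The three pruning scenarios above reduce to a small decision rule, sketched here without any DataFusion types. `StatKey` and `can_prune_gt` are hypothetical names, values are plain `i64`, and the exact/inexact distinction carried by `Precision` is ignored, so treat this as an illustration of the logic rather than the `SparseFilePruningStats` implementation.

```rust
use std::collections::HashMap;

/// Hypothetical sparse-map key: which stat, for which column.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum StatKey {
    Min(String),
    Max(String),
}

/// Decide whether a file can be skipped for the predicate `column > threshold`.
///
/// Returns `None` when there is no usable stat (no pruner can be built),
/// `Some(true)` when the file's max rules out every row,
/// and `Some(false)` when some row might match.
fn can_prune_gt(
    stats: Option<&HashMap<StatKey, i64>>,
    column: &str,
    threshold: i64,
) -> Option<bool> {
    // Only the max is needed for a `>` predicate; look it up directly
    // in the sparse map instead of building a dense statistics struct.
    let max = stats?.get(&StatKey::Max(column.to_string()))?;
    Some(*max <= threshold)
}
```

With a file whose `x` range is `[10, 20]`, `x > 100` yields `Some(true)` (prune), `x > 15` yields `Some(false)` (keep), and a file with no stats yields `None`.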
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d648982 commit ea101ed
5 files changed
Lines changed: 548 additions & 13 deletions
File tree
- datafusion
- catalog-listing/src
- catalog/src
- common/src
- datasource/src
- pruning/src