[CORE] Avoid driver oom caused by unnecessary metadata columns in huge number of splitinfos #11899

Open
taiyang-li wants to merge 1 commit into apache:main from taiyang-li:avoid_driver_oom

taiyang-li commented Apr 9, 2026

What changes are proposed in this pull request?

When scanning tens of millions of files, the driver may run out of heap memory during split construction.

Currently we always generate per-file metadata maps (InputFileName / InputFileBlockStart / InputFileBlockLength) for each split, even if the query does not reference these metadata columns. This causes a large number of HashMap instances to be kept in memory and may trigger OOM in the planner.
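
A minimal sketch of the idea described above, not the code in this PR: build the per-file metadata map only for the metadata columns the query actually references, and return a shared empty map otherwise. The class name, method name, and the use of a `Set<String>` of referenced columns are assumptions made for illustration; the map keys simply reuse the three names mentioned in the description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: class, method, and key names are hypothetical, not Gluten APIs.
final class SplitMetadataSketch {
    static final String FILE_NAME = "InputFileName";
    static final String BLOCK_START = "InputFileBlockStart";
    static final String BLOCK_LENGTH = "InputFileBlockLength";

    // Returns metadata for one file, containing only the columns the query references.
    static Map<String, String> metadataFor(
            String path, long start, long length, Set<String> referenced) {
        if (referenced.isEmpty()) {
            // No metadata column is used: avoid allocating a HashMap per file.
            return Collections.emptyMap();
        }
        Map<String, String> metadata = new HashMap<>();
        if (referenced.contains(FILE_NAME)) {
            metadata.put(FILE_NAME, path);
        }
        if (referenced.contains(BLOCK_START)) {
            metadata.put(BLOCK_START, Long.toString(start));
        }
        if (referenced.contains(BLOCK_LENGTH)) {
            metadata.put(BLOCK_LENGTH, Long.toString(length));
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Query that references no metadata column: nothing is allocated per file.
        Map<String, String> unused =
            metadataFor("/data/part-0.parquet", 0L, 128L, Set.of());
        // Query that references only the file name.
        Map<String, String> nameOnly =
            metadataFor("/data/part-0.parquet", 0L, 128L, Set.of(FILE_NAME));
        System.out.println(unused.isEmpty());
        System.out.println(nameOnly);
    }
}
```

With tens of millions of files, the empty-map path means the driver holds no per-file `HashMap` at all when the query never reads these columns.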

How was this patch tested?

Manually tested in a production environment.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: trae

taiyang-li changed the title from "[Core] Avoid driver oom caused by unnecessary metadata columns in huge number of splitinfos" to "[CORE] Avoid driver oom caused by unnecessary metadata columns in huge number of splitinfos" Apr 9, 2026
github-actions bot added the CORE (works for Gluten Core), VELOX, and CLICKHOUSE labels Apr 9, 2026
github-actions bot commented Apr 9, 2026

Run Gluten Clickhouse CI on x86
