feat: get_ranges 2x performance degradation #7380

comphead wants to merge 13 commits into apache:main

Conversation
Putting this to draft, as adding more parallelism significantly increases memory usage.
```rust
        .map_err(|err| format_object_store_error(err, &location_ref))
    }
})
.buffered(10)
```
A parallelism of 10 is also used in `coalesce_ranges`, which was called by the original object store `get_ranges`.
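For illustration, here is a minimal, self-contained sketch of bounded-parallelism range reads using OS threads in fixed batches. It only stands in for the async `futures::stream::buffered(n)` pattern used in the PR (a real buffered stream keeps a sliding window of up to `n` in-flight futures rather than strict batches), and `fetch_range`/`get_ranges` are hypothetical stand-ins, not the actual `object_store` code:

```rust
use std::ops::Range;
use std::thread;

// Hypothetical fetch: in the real code this would be an async read
// against HDFS; here it just materializes the byte range from a slice.
fn fetch_range(data: &[u8], range: Range<usize>) -> Vec<u8> {
    data[range].to_vec()
}

// Read all ranges with at most `parallelism` reads in flight at once,
// preserving the input order of results (analogous to `.buffered(n)`,
// but batched rather than windowed).
fn get_ranges(data: &[u8], ranges: &[Range<usize>], parallelism: usize) -> Vec<Vec<u8>> {
    let mut results = Vec::with_capacity(ranges.len());
    for chunk in ranges.chunks(parallelism) {
        // Scoped threads let us borrow `data` without cloning it per read.
        let chunk_results: Vec<Vec<u8>> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|r| s.spawn(|| fetch_range(data, r.clone())))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        results.extend(chunk_results);
    }
    results
}
```

The key property mirrored here is that results come back in request order even though the reads within a batch run concurrently.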
I think the current change is an improvement.
That seems different from the PR description, which suggests this should turn something like 50 range reads into ~5 merged reads. The earlier regression looks more like we lost range coalescing. So this patch may still help wall-clock time, but I don't think it fixes the root cause if the goal is to actually reduce HDFS opens. That likely needs a merged-range path again, e.g. something based on OpenDAL's range-fetch/merge logic instead of buffering individual reads.
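The merged-read path described above boils down to a coalescing pass: sort the requested ranges and merge neighbors whose gap is within a threshold, so many small reads (e.g. Parquet column chunks) become a few larger ones. This is a simplified sketch, not the actual `coalesce_ranges` implementation, which also splits oversized ranges and issues the merged fetches concurrently:

```rust
use std::ops::Range;

// Merge byte ranges whose gap is at most `max_gap` bytes.
// Sketch only: the ranges are sorted by start, then each range is
// either folded into the previous merged range or starts a new one.
fn coalesce(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut sorted: Vec<Range<u64>> = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in sorted {
        match merged.last_mut() {
            // Close enough to the previous merged range: extend it.
            Some(last) if r.start <= last.end + max_gap => {
                last.end = last.end.max(r.end);
            }
            // Otherwise start a new merged range.
            _ => merged.push(r),
        }
    }
    merged
}
```

With a scheme like this, each merged range maps to one storage open instead of one open per requested range, which is the HDFS-open reduction the comment is after.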
```rust
    results.push(data);
}
Ok(results)
```

```rust
let location_ref: Arc<str> = Arc::from(location.as_ref());
```
Maybe we could just call `fetch` + `concurrent` here?
Thanks @morristai and @Xuanwo, let me try to get back the …
Rolled back to the version without using `fetch` + `concurrent`.
On our cluster with 1200 files × 140 MB across 30 machines, that approach increased total memory usage from ~0.5 TB to ~1 TB. The simple per-range concurrent approach fetches exactly what's needed with no over-allocation. This PR can act as a patch that fixes the performance at roughly the same memory utilization, and I'm happy to participate in a longer-term solution in a follow-up PR if needed.
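The memory trade-off above can be made concrete: a merged read also fetches the gap bytes between the coalesced ranges, so its over-allocation is the difference between bytes fetched and bytes actually requested. A small hypothetical helper (the names are illustrative, not from the PR):

```rust
use std::ops::Range;

// Extra bytes a merged-read plan fetches beyond what was requested:
// total bytes in the merged ranges minus total bytes in the
// originally requested ranges. Assumes `merged` covers `requested`.
fn overfetch(requested: &[Range<u64>], merged: &[Range<u64>]) -> u64 {
    let asked: u64 = requested.iter().map(|r| r.end - r.start).sum();
    let fetched: u64 = merged.iter().map(|r| r.end - r.start).sum();
    fetched - asked
}
```

A per-range plan (merged == requested) over-fetches zero bytes, which is why the simple concurrent approach kept memory flat, while aggressive merging trades extra resident bytes for fewer storage opens.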
Which issue does this PR close?
Closes #7383
In Apache DataFusion Comet we found that HDFS access slowed down dramatically, degrading HDFS task performance by ~2x, likely after #7192.
Original issue apache/datafusion-comet#3926
Rationale for this change
Summary
- `get_ranges`: Replace the sequential per-range read loop with concurrent reads using `futures::stream::buffered(8)`, improving wall-clock latency for multi-range reads (e.g., Parquet row groups from HDFS)
- `stat()` call: The existing `get_ranges` already avoids `stat()`; this preserves that by using a single `reader` created once per call