
branch-4.1: [fix](outfile) handle delete_existing_files before parallel export #61223 #61726

Open
suxiaogang223 wants to merge 1 commit into apache:branch-4.1 from suxiaogang223:codex/pick-61223-branch-4.1


@suxiaogang223
Contributor

Cherry-pick #61223 to branch-4.1

What problem does this PR solve?

Handle `delete_existing_files=true` for remote outfile once in FE, before parallel export starts; clear the delete flag before sink options are sent to BE; and reject `file:///` combined with `delete_existing_files=true`, so that outfile behaves the same way as export.

Cherry-pick commit

…pache#61223)

Issue Number: N/A

Related PR: apache#38400

Problem Summary:
When `select ... into outfile` uses `delete_existing_files=true`,
parallel outfile writers can race on directory cleanup and delete files
uploaded by other writers. This PR follows the same FE-side cleanup
pattern used by export in apache#38400: remote outfile cleanup is executed
once in FE before query execution, and the delete flag is cleared before
sink options are sent to BE.
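
The ordering described above can be illustrated with a small sketch. It is not the code from this PR: `RemoteFs`, `OutfileCleanupSketch`, and `prepareSinkOptions` are hypothetical names chosen for illustration, and only the `delete_existing_files` key comes from the description. The sketch assumes cleanup is a single directory delete that runs in the coordinator, after which the flag is cleared in the options handed to every parallel writer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote file system client; not a Doris API. */
interface RemoteFs {
    void deleteDirectory(String path);
}

/** Sketch of "clean up once in FE, then clear the flag before fan-out". */
final class OutfileCleanupSketch {
    static final String DELETE_EXISTING_FILES = "delete_existing_files";

    /**
     * Runs the cleanup at most once and returns a copy of the options that
     * would be sent to each parallel writer. If cleanup was requested, the
     * copy carries the delete flag set to "false"; the input map is untouched.
     */
    static Map<String, String> prepareSinkOptions(
            Map<String, String> userOptions, String targetDir, RemoteFs fs) {
        Map<String, String> sinkOptions = new HashMap<>(userOptions);
        // Boolean.parseBoolean returns false for null, so a missing key means "no cleanup".
        if (Boolean.parseBoolean(userOptions.get(DELETE_EXISTING_FILES))) {
            fs.deleteDirectory(targetDir);                    // single cleanup, before any writer starts
            sinkOptions.put(DELETE_EXISTING_FILES, "false");  // writers must not delete again
        }
        return sinkOptions;
    }
}
```

Because every writer receives the same `sinkOptions`, none of them attempts a directory cleanup, so a late-starting writer cannot remove files that another writer has already uploaded.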

This PR also aligns local outfile behavior with export: `file:///` does
not support `delete_existing_files=true`, so FE rejects that combination
during analysis instead of relying on BE-side cleanup.
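
A minimal sketch of such an analysis-time check follows. The class and method names are hypothetical, and `IllegalArgumentException` stands in for whatever exception type the FE analyzer actually raises; only the `file:///` prefix and the `delete_existing_files` key come from the description.

```java
import java.util.Map;

/** Sketch of an analysis-time check; names are illustrative, not Doris APIs. */
final class OutfileAnalysisSketch {
    static final String DELETE_EXISTING_FILES = "delete_existing_files";
    static final String LOCAL_PREFIX = "file:///";

    /** Throws if a local outfile path is combined with delete_existing_files=true. */
    static void checkDeleteExistingFiles(String filePath, Map<String, String> properties) {
        boolean deleteExisting = Boolean.parseBoolean(properties.get(DELETE_EXISTING_FILES));
        boolean isLocal = filePath.startsWith(LOCAL_PREFIX);
        if (deleteExisting && isLocal) {
            // Fail early, before any plan is sent to BE.
            throw new IllegalArgumentException(
                    "delete_existing_files=true is not supported for local path: " + filePath);
        }
    }
}
```

Rejecting the combination up front means no BE-side cleanup path is needed for local files.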

To reduce duplicated logic, the FE-side parent-directory cleanup used by
export/outfile/TVF is refactored into shared `BrokerUtil` helpers.
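
For illustration only, a shared helper of this kind might derive the directory to clean from the outfile path prefix. The sketch below does not show the actual `BrokerUtil` methods, whose names and signatures are not given here; `ParentDirSketch.parentDirectory` is a hypothetical name, and the rule "everything up to and including the last `/`" is an assumption.

```java
/** Sketch of deriving a parent directory from a path prefix; not the BrokerUtil API. */
final class ParentDirSketch {
    /**
     * Returns the path up to and including the last '/', for example
     * "s3://bucket/dir/result_" gives "s3://bucket/dir/".
     * Rejects inputs with no directory part after the scheme.
     */
    static String parentDirectory(String pathPrefix) {
        int schemeIdx = pathPrefix.indexOf("://");
        // First index after "scheme://", or 0 when there is no scheme.
        int authorityStart = schemeIdx < 0 ? 0 : schemeIdx + 3;
        int lastSlash = pathPrefix.lastIndexOf('/');
        if (lastSlash < authorityStart) {
            throw new IllegalArgumentException("no parent directory in: " + pathPrefix);
        }
        return pathPrefix.substring(0, lastSlash + 1);
    }
}
```

Keeping this derivation in one place lets export, outfile, and TVF agree on which directory is cleaned.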

(cherry picked from commit 1576653)
@suxiaogang223 requested a review from yiguolei as a code owner on March 25, 2026, 09:43
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 78.47% (1786/2276) |
| Line Coverage | 64.26% (32026/49838) |
| Region Coverage | 65.10% (16023/24612) |
| Branch Coverage | 55.61% (8530/15340) |

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.85% (19591/37071) |
| Line Coverage | 36.19% (182651/504741) |
| Region Coverage | 32.50% (141105/434201) |
| Branch Coverage | 33.71% (61944/183770) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 69.48% (25207/36279) |
| Line Coverage | 52.09% (262031/503028) |
| Region Coverage | 49.50% (216879/438172) |
| Branch Coverage | 50.85% (93712/184296) |
