
[hive] Bug fix for stuck thread on hive prepareCommit failure#8142

Open
ArnavBalyan wants to merge 1 commit into apache:master from ArnavBalyan:arnavb/hive-abort-tolerant-precommit


ArnavBalyan (Member) commented Jun 6, 2026

Purpose

  • Today, any Hive job writing to a Paimon table that fails before pre-commit gets permanently stuck and does not exit cleanly.
  • This happens because Paimon throws a second exception during cleanup when there are no pre-commit files to clean up. In that case, Paimon throws an unchecked runtime exception that the caller does not catch.
  • As a result, the parent thread waits forever for a job failure that is never reported.
  • This change avoids unprotected file access and handles the case where pre-commit files were never generated.
  • It also fixes CI hanging and timing out in some cases when a unit test fails.
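The core idea in the bullets above is that the abort path must tolerate a missing pre-commit directory and must never throw from cleanup, so the original job failure is what reaches the waiting caller. A minimal sketch of that technique, assuming a local `java.nio.file` directory per job; the class and method names are hypothetical and this is not the actual diff in this PR (the real committer uses its own file-system APIs):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch: abort-path cleanup that reports problems instead of throwing. */
class AbortCleanupSketch {

    /** Deletes the job's temp directory if present; returns the paths that could not be deleted. */
    static List<Path> abortJob(Path jobTempDir) {
        List<Path> failed = new ArrayList<>();
        // A task may have failed before writing any .preCommit file,
        // so a missing directory is a normal state on the abort path.
        if (!Files.exists(jobTempDir)) {
            return failed;
        }
        try (Stream<Path> walk = Files.walk(jobTempDir)) {
            // Reverse lexicographic order deletes children before their parents.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    failed.add(p); // record and continue rather than throwing
                }
            });
        } catch (IOException | UncheckedIOException e) {
            // Listing the directory failed; report it without propagating.
            failed.add(jobTempDir);
        }
        return failed;
    }
}
```

Returning the failed paths (rather than throwing) lets the caller log them while still surfacing the job's original exception.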

Tests

  • Unit tests (UT)

ArnavBalyan (Member, Author)

cc @JingsongLi @leaves12138, could you PTAL? Thanks!

ArnavBalyan force-pushed the arnavb/hive-abort-tolerant-precommit branch from 8bfdafa to 122e74d on June 6, 2026 12:51
ArnavBalyan force-pushed the arnavb/hive-abort-tolerant-precommit branch from 122e74d to b03c03e on June 6, 2026 14:33
JingsongLi (Contributor)

Thanks for fixing the abort cleanup path. I think this needs one more guard before merge.

readPreCommitFile now returns Collections.emptyList() when a .preCommit file is missing, but the same getAllPreCommitMessage(...) path is used by both abortJob and successful commitJob. That tolerance is safe for abort cleanup, but unsafe for commit: if one task's .preCommit file is missing during commitJob, the job will commit only the remaining task messages and then delete the temp directory, which can produce a partial table commit instead of failing atomically.
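The partial-commit hazard described in this review comment can be reproduced with a small in-memory model. This is a hypothetical sketch, not Paimon code: each task's `.preCommit` file is modeled as a map entry and commit messages as plain strings. With one task's file missing, the tolerant read silently returns the other tasks' messages and raises no error.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model: map key = task id, value = that task's pre-commit messages. */
class PartialCommitHazard {

    // Tolerant read: a missing "file" yields an empty list instead of an error.
    static List<String> readPreCommitFileTolerant(Map<String, List<String>> files, String taskId) {
        List<String> messages = files.get(taskId);
        return messages == null ? Collections.emptyList() : messages;
    }

    // Collects messages for every expected task; a missing task is silently skipped.
    static List<String> getAllPreCommitMessage(Map<String, List<String>> files, List<String> expectedTasks) {
        List<String> all = new ArrayList<>();
        for (String taskId : expectedTasks) {
            all.addAll(readPreCommitFileTolerant(files, taskId));
        }
        return all;
    }
}
```

For expected tasks `t0`, `t1`, `t2` where only `t0` and `t2` wrote files, the result contains just their two messages, so a commit built from it would include two of three tasks' data.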

Could we keep the missing-file tolerance only for the abort path? For example, pass an ignoreMissing flag into getAllPreCommitMessage / readPreCommitFile; abortJob can use tolerant mode, while commitJob should still fail if any expected pre-commit file is absent.
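The suggested `ignoreMissing` flag could look roughly like the sketch below. It assumes one `<taskId>.preCommit` text file per task in a local directory, with lines standing in for serialized commit messages; the helper names `messagesForCommit` and `messagesForAbort` are hypothetical, and the real signatures in the Paimon Hive committer may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of threading an ignoreMissing flag through the pre-commit reader. */
class PreCommitReader {

    static List<String> readPreCommitFile(Path file, boolean ignoreMissing) throws IOException {
        try {
            return Files.readAllLines(file);
        } catch (NoSuchFileException e) {
            if (ignoreMissing) {
                return Collections.emptyList(); // abort path: task never wrote its file
            }
            throw e; // commit path: a missing file must fail the job
        }
    }

    static List<String> getAllPreCommitMessage(Path jobDir, List<String> taskIds, boolean ignoreMissing)
            throws IOException {
        List<String> all = new ArrayList<>();
        for (String taskId : taskIds) {
            all.addAll(readPreCommitFile(jobDir.resolve(taskId + ".preCommit"), ignoreMissing));
        }
        return all;
    }

    // commitJob: strict, so nothing is committed if any expected file is absent.
    static List<String> messagesForCommit(Path jobDir, List<String> taskIds) throws IOException {
        return getAllPreCommitMessage(jobDir, taskIds, false);
    }

    // abortJob: tolerant, since failed tasks may not have written a file.
    static List<String> messagesForAbort(Path jobDir, List<String> taskIds) throws IOException {
        return getAllPreCommitMessage(jobDir, taskIds, true);
    }
}
```

Because the strict read throws before any message is returned, the commit path fails atomically instead of committing a subset.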


Labels

None yet

Projects

None yet


2 participants