Skip to content

RUM-16501: Make file write atomic#3476

Open
0xnm wants to merge 1 commit into
developfrom
nogorodnikov/rum-16501/make-file-write-atomic
Open

RUM-16501: Make file write atomic#3476
0xnm wants to merge 1 commit into
developfrom
nogorodnikov/rum-16501/make-file-write-atomic

Conversation

@0xnm
Copy link
Copy Markdown
Member

@0xnm 0xnm commented May 27, 2026

What does this PR do?

We may have a write failure, leading to the corrupted TLV block. This makes further read fail.

This PR makes file write atomic: we will try to rollback to the previous state if write failed.

The approach used is basically to record the original file size and truncate to that mark if write fails. It should work fine even if there is not enough space on the disk.

This change doesn't cover the situation when there is a process death, such case is harder to deal with. Maybe this change will be enough to get rid of majority read errors.

Performance impact

  ┌──────────────────────────┬────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────┐
  │        Operation         │         Before         │                                           After                                           │                      Δ                       │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ open()                   │ FileOutputStream       │ RandomAccessFile("rw")                                                                    │ same — both single syscalls, comparable cost │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ flock                    │ yes                    │ yes                                                                                       │ unchanged                                    │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ fstat for channel.size() │ —                      │ +1 syscall                                                                                │ ~1 µs                                        │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ ftruncate                │ —                      │ append=true: no-op (JDK short-circuits when new size ≥ current); append=false: +1 syscall │ 0 to ~2 µs                                   │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ lseek/position update    │ —                      │ in-memory field update, no syscall on Linux JDK                                           │ ~0                                           │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ write()                  │ one big buffered write │ same write() call, just looped for short-write safety                                     │ ~same                                        │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ buffer.flip()            │ —                      │ field update                                                                              │ ~0                                           │
  ├──────────────────────────┼────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
  │ close()                  │ yes                    │ yes                                                                                       │ unchanged                                    │
  └──────────────────────────┴────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────┘

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@0xnm 0xnm requested review from a team as code owners May 27, 2026 13:10
@aleksandr-gringauz
Copy link
Copy Markdown
Contributor

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

kikoveiga
kikoveiga previously approved these changes May 27, 2026
hamorillo
hamorillo previously approved these changes May 27, 2026
Copy link
Copy Markdown
Contributor

@hamorillo hamorillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@0xnm
Copy link
Copy Markdown
Member Author

0xnm commented May 27, 2026

/ddci trigger

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 27, 2026

View all feedbacks in Devflow UI.

2026-05-27 14:58:51 UTC ℹ️ Start processing command /ddci trigger


2026-05-27 14:59:08 UTC ℹ️ Devflow: /ddci trigger

Tasks sourcing running asynchronously in workflow devflow:c928ba52-e583-413b-99fa-e7f74ee950df_46:019e69f2-115c-7c12-a979-e65a20700554 under run id 019e69f2-4e13-702f-b7c2-b16156ad8d0b

@datadog-prod-us1-3

This comment has been minimized.

@0xnm 0xnm dismissed stale reviews from hamorillo, kikoveiga, and aleksandr-gringauz via 0e8aa24 May 27, 2026 15:18
@0xnm 0xnm force-pushed the nogorodnikov/rum-16501/make-file-write-atomic branch from 9493555 to 0e8aa24 Compare May 27, 2026 15:18
hamorillo
hamorillo previously approved these changes May 27, 2026
@0xnm 0xnm dismissed stale reviews from hamorillo and aleksandr-gringauz via a35f4be May 27, 2026 15:32
@0xnm 0xnm force-pushed the nogorodnikov/rum-16501/make-file-write-atomic branch 2 times, most recently from 0e8aa24 to a35f4be Compare May 27, 2026 15:32
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.39%. Comparing base (a8fb3c6) to head (a35f4be).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
...rsistence/file/batch/PlainBatchFileReaderWriter.kt 71.43% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3476      +/-   ##
===========================================
+ Coverage    72.28%   72.39%   +0.11%     
===========================================
  Files          964      964              
  Lines        35561    35576      +15     
  Branches      5922     5926       +4     
===========================================
+ Hits         25703    25753      +50     
+ Misses        8250     8234      -16     
+ Partials      1608     1589      -19     
Files with missing lines Coverage Δ
...rsistence/file/batch/PlainBatchFileReaderWriter.kt 75.78% <71.43%> (+0.11%) ⬆️

... and 32 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants