fix(vfs): resolve umount_lock self-deadlock and writeback page leak in sync path#1936
fix(vfs): resolve umount_lock self-deadlock and writeback page leak in sync path#1936fslongjin wants to merge 3 commits into
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abdb16692a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Metadata sync (flush_dirty_inodes) is the responsibility of sync()/fsync()/umount(), | ||
| // not the page reclaim thread — calling it here would frequently contend on io_guard | ||
| // and hold umount_lock.read(), blocking umount. |
There was a problem hiding this comment.
这里移除了页面回收线程中唯一的周期性 sync_fs_with_umount_read(true) 调用,但没有新增替代的后台元数据回写路径;我在仓库内检查到剩余 sync_fs 调用只来自显式 sync()/syncfs()/umount() 路径,而 ext4 的 sync_fs 正是用来刷 dirty_inodes。因此在只发生 chmod/utimens/truncate 等元数据变更、应用不显式 sync/fsync 且系统长期运行或崩溃前,这些 dirty inode 会无限期滞留内存,较本提交前每轮回收线程都会刷元数据的行为退化。建议在移出 reclaimer 前先接入独立的周期性 inode writeback。
Useful? React with 👍 / 👎.
| // Mark root as shared so subsequent mounts inherit shared propagation. | ||
| if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) != 0) { |
There was a problem hiding this comment.
这个测试只把 / 标记为 shared,但没有创建第二个 peer mount;在当前传播实现里 propagate_umount() 会把触发卸载的 parent 预先加入 processed,如果 peer 组里只有它自己,循环会直接跳过,umount_at_peer() 根本不会被调用。因此该用例即使回退到会在 umount_at_peer() 中重入 sync_filesystem() 的实现也不会复现死锁,容易让传播卸载回归漏检;需要先构造两个同 peer group 的挂载点再在其中一个下面挂载/卸载子挂载。
Useful? React with 👍 / 👎.
3952707 to
45d9c0c
Compare
…n sync path The sync/syncfs refactoring introduced in DragonOS-Community#1934 caused the test-x86 CI to hang indefinitely. This patch fixes three critical issues: 1. Self-deadlock on umount_lock RwSem: MountFS::umount() acquired umount_write, then through propagate_umount → umount_at_peer called child.sync_filesystem() which attempted umount_read on the SAME RwSem (shared via Arc::clone in deepcopy). Since RwSem is non-reentrant, writer + reader on same thread = permanent sleep. Fix: Move sync_filesystem() BEFORE acquiring umount_write (aligning with Linux's generic_shutdown_super pattern), and remove redundant sync from umount_at_peer since all propagated peers share the same superblock. 2. Writeback page leak: writeback_entry() could return Err after prepare_writeback_entry set page state to Writeback, without calling finish_writeback_entry. The page would be stuck in Writeback state forever, causing any future wait_writeback_entry to sleep indefinitely. Fix: Add RAII WritebackGuard that ensures finish_writeback_entry is always called on early exit paths. 3. Over-aggressive page reclaim: The page reclaim thread was calling sync_fs_with_umount_read(true) for all mounts every 500ms, competing with user I/O on io_guard and holding umount_read which delays umount. Metadata sync is not the reclaimer's responsibility. Fix: Remove sync_fs from page reclaim thread; increase writeback interval from 500ms to 5s (matching Linux dirty_writeback_centisecs). Additionally: - Add has_dirty_pages() short-circuit to sync_inodes_of_mount() to skip clean page caches without expensive inode upgrade + Arc::ptr_eq. - Add SharedMountPropagationUmountNoDeadlock dunitest that exercises the exact deadlock scenario (MS_SHARED + concurrent sync/umount). Signed-off-by: longjin <longjin@DragonOS.org>
Signed-off-by: longjin <longjin@dragonos.org>
9111699 to
9dfc8e6
Compare
Drop the temporary PR 1936 review analysis document from the branch before merge. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Fixes a CI-blocking deadlock introduced in #1934 where
umount()hangsindefinitely when mount propagation triggers
sync_filesystem()on apeer that shares the same non-reentrant
umount_lockRwSem.sync_filesystem()beforeumount_write()(Linux
generic_shutdown_superpattern); remove sync from propagation pathWritebackGuardto prevent pagesfrom getting permanently stuck in Writeback state
sync_fsfrom page reclaim thread;metadata sync belongs to
sync()/fsync()/umount(), not the reclaimerhas_dirty_pages()checkRoot Cause
deepcopy()sharessuper_block_state(containingumount_lock: RwSem)between source and propagated mounts via
Arc::clone(). The call chain:Design
Aligns with Linux 6.6.21
generic_shutdown_super()which syncs thefilesystem before acquiring
s_umountwrite-lock. The propagationpath (
umount_tree→propagate_umount) does not re-sync each peersince all peers share the same superblock — one top-level sync suffices.