Skip to content

fix(vfs): resolve umount_lock self-deadlock and writeback page leak in sync path#1936

Open
fslongjin wants to merge 3 commits into
DragonOS-Community:masterfrom
fslongjin:fix-umount-deadlock
Open

fix(vfs): resolve umount_lock self-deadlock and writeback page leak in sync path#1936
fslongjin wants to merge 3 commits into
DragonOS-Community:masterfrom
fslongjin:fix-umount-deadlock

Conversation

@fslongjin
Copy link
Copy Markdown
Member

Summary

Fixes a CI-blocking deadlock introduced in #1934 where umount() hangs
indefinitely when mount propagation triggers sync_filesystem() on a
peer that shares the same non-reentrant umount_lock RwSem.

  • Fix self-deadlock: Move sync_filesystem() before umount_write()
    (Linux generic_shutdown_super pattern); remove sync from propagation path
  • Fix writeback page leak: Add RAII WritebackGuard to prevent pages
    from getting permanently stuck in Writeback state
  • Fix page reclaim contention: Remove sync_fs from page reclaim thread;
    metadata sync belongs to sync()/fsync()/umount(), not the reclaimer
  • Optimize sync scan: Skip clean page caches via has_dirty_pages() check

Root Cause

deepcopy() shares super_block_state (containing umount_lock: RwSem)
between source and propagated mounts via Arc::clone(). The call chain:

umount()
  → sb_state.umount_write()           // WRITE acquired
  → do_umount()
    → propagate_umount()
      → umount_at_peer()
        → child.sync_filesystem()
          → sb_state.umount_read()    // same RwSem → DEADLOCK

Design

Aligns with Linux 6.6.21 generic_shutdown_super() which syncs the
filesystem before acquiring s_umount write-lock. The propagation
path (umount_treepropagate_umount) does not re-sync each peer
since all peers share the same superblock — one top-level sync suffices.

@github-actions github-actions Bot added the Bug fix A bug is fixed in this pull request label May 28, 2026
@fslongjin
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abdb16692a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread kernel/src/mm/page.rs Outdated
Comment on lines +272 to +274
// Metadata sync (flush_dirty_inodes) is the responsibility of sync()/fsync()/umount(),
// not the page reclaim thread — calling it here would frequently contend on io_guard
// and hold umount_lock.read(), blocking umount.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge 恢复周期性元数据刷盘

这里移除了页面回收线程中唯一的周期性 sync_fs_with_umount_read(true) 调用,但没有新增替代的后台元数据回写路径;我在仓库内检查到剩余 sync_fs 调用只来自显式 sync()/syncfs()/umount() 路径,而 ext4 的 sync_fs 正是用来刷 dirty_inodes。因此在只发生 chmod/utimens/truncate 等元数据变更、应用不显式 sync/fsync 且系统长期运行或崩溃前,这些 dirty inode 会无限期滞留内存,较本提交前每轮回收线程都会刷元数据的行为退化。建议在移出 reclaimer 前先接入独立的周期性 inode writeback。

Useful? React with 👍 / 👎.

Comment on lines +266 to +267
// Mark root as shared so subsequent mounts inherit shared propagation.
if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) != 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge 创建实际 peer 挂载以覆盖传播卸载路径

这个测试只把 / 标记为 shared,但没有创建第二个 peer mount;在当前传播实现里 propagate_umount() 会把触发卸载的 parent 预先加入 processed,如果 peer 组里只有它自己,循环会直接跳过,umount_at_peer() 根本不会被调用。因此该用例即使回退到会在 umount_at_peer() 中重入 sync_filesystem() 的实现也不会复现死锁,容易让传播卸载回归漏检;需要先构造两个同 peer group 的挂载点再在其中一个下面挂载/卸载子挂载。

Useful? React with 👍 / 👎.

@fslongjin fslongjin force-pushed the fix-umount-deadlock branch 2 times, most recently from 3952707 to 45d9c0c Compare May 29, 2026 00:49
fslongjin and others added 2 commits May 29, 2026 15:23
…n sync path

The sync/syncfs refactoring introduced in DragonOS-Community#1934 caused the test-x86 CI
to hang indefinitely. This patch fixes three critical issues:

1. Self-deadlock on umount_lock RwSem: MountFS::umount() acquired
   umount_write, then through propagate_umount → umount_at_peer called
   child.sync_filesystem() which attempted umount_read on the SAME
   RwSem (shared via Arc::clone in deepcopy). Since RwSem is
   non-reentrant, writer + reader on same thread = permanent sleep.

   Fix: Move sync_filesystem() BEFORE acquiring umount_write (aligning
   with Linux's generic_shutdown_super pattern), and remove redundant
   sync from umount_at_peer since all propagated peers share the same
   superblock.

2. Writeback page leak: writeback_entry() could return Err after
   prepare_writeback_entry set page state to Writeback, without calling
   finish_writeback_entry. The page would be stuck in Writeback state
   forever, causing any future wait_writeback_entry to sleep
   indefinitely.

   Fix: Add RAII WritebackGuard that ensures finish_writeback_entry is
   always called on early exit paths.

3. Over-aggressive page reclaim: The page reclaim thread was calling
   sync_fs_with_umount_read(true) for all mounts every 500ms, competing
   with user I/O on io_guard and holding umount_read which delays
   umount. Metadata sync is not the reclaimer's responsibility.

   Fix: Remove sync_fs from page reclaim thread; increase writeback
   interval from 500ms to 5s (matching Linux dirty_writeback_centisecs).

Additionally:
- Add has_dirty_pages() short-circuit to sync_inodes_of_mount() to skip
  clean page caches without expensive inode upgrade + Arc::ptr_eq.
- Add SharedMountPropagationUmountNoDeadlock dunitest that exercises the
  exact deadlock scenario (MS_SHARED + concurrent sync/umount).

Signed-off-by: longjin <longjin@DragonOS.org>
Signed-off-by: longjin <longjin@dragonos.org>
@fslongjin fslongjin force-pushed the fix-umount-deadlock branch from 9111699 to 9dfc8e6 Compare May 29, 2026 07:24
Drop the temporary PR 1936 review analysis document from the branch before merge.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug fix A bug is fixed in this pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant