Skip to content

fix(fetcher): wait for IDLE goroutines#1342

Open
mvanhorn wants to merge 1 commit into
floatpane:masterfrom
mvanhorn:osc/731-idle-goroutine-leak
Open

fix(fetcher): wait for IDLE goroutines#1342
mvanhorn wants to merge 1 commit into
floatpane:masterfrom
mvanhorn:osc/731-idle-goroutine-leak

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

What?

Track every IDLE watcher goroutine on the IdleWatcher via sync.WaitGroup so StopAllAndWait catches lingering goroutines whose map entry was already replaced. Add StopAllAndWaitTimeout for shutdown paths that need a bounded wait, and switch the daemon shutdown from the fire-and-forget StopAll() to the bounded variant so one stuck IMAP connection cannot block the daemon.

Why?

Closes #731 (filed by @andrinoff). The existing pattern in fetcher/idle.go:48-60 and :72-75 does close(existing.stop); delete(w.watchers, account.ID) and leaves the goroutine to tear down in the background. If the IMAP connection is stuck (server stops responding mid-IDLE, network hangs without RST), the goroutine never exits, the map no longer holds the done channel, and StopAllAndWait() has no way to see it. The original report's suggested fix was "use WaitGroup or ensure goroutines terminate within timeout" - this PR does both.

The daemon side surfaces the problem: daemon/daemon.go:129 previously called idleWatcher.StopAll() and immediately continued to closeAllClients(). If any underlying connection hung, the lingering IDLE goroutines outlived the daemon's shutdown path with no log entry. The new bounded-wait shutdown logs idle watcher: stop timed out when goroutines don't exit within 5s, so operators see leaks instead of silently losing them.

Two new tests cover both behaviors: replaced-goroutine tracking via WaitGroup, and timeout-on-slow-exit returning ErrStopTimeout.

Testing

  • go test ./fetcher/... -run TestIdleWatcher -count=1 - pass
  • go test -race ./fetcher - pass (added tests plus existing tests)
  • go build ./... - clean
  • gofmt -l fetcher/idle.go daemon/daemon.go fetcher/idle_test.go - clean

Fixes #731

AI-assisted.

@mvanhorn mvanhorn requested a review from a team as a code owner May 22, 2026 07:16
@floatpanebot floatpanebot added area/fetcher IMAP fetch / IDLE / search area/daemon Daemon / RPC bug Something isn't working labels May 22, 2026
Copy link
Copy Markdown
Member

@floatpanebot floatpanebot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mvanhorn! Please fix the following issues with your PR:

  • Title: Is too long (65 characters). The PR title must be strictly under 40 characters.

@floatpanebot floatpanebot added ci CI / build pipeline size/M Diff: 51–200 lines labels May 22, 2026
@mvanhorn mvanhorn changed the title fix(fetcher): track IDLE goroutines so shutdown can wait for them fix(fetcher): wait for IDLE goroutines May 22, 2026
@floatpanebot floatpanebot dismissed their stale review May 22, 2026 08:21

Formatting issues have been resolved. Thank you!

Watch() and Stop() removed accountIdle entries from the map before the
goroutine had a chance to exit, leaving lingering goroutines unobservable
to StopAllAndWait. A hung IMAP connection could turn that into a slow leak.

Track every spawned goroutine on the IdleWatcher via sync.WaitGroup so
the wait-for-completion path catches lingering goroutines whose map
entries were already replaced. Add StopAllAndWaitTimeout for shutdown
paths that need a bounded wait and switch the daemon shutdown to it so
one stuck connection does not block the daemon.

Fixes floatpane#731
@andrinoff andrinoff force-pushed the osc/731-idle-goroutine-leak branch from 4f9122b to b2c95f3 Compare May 23, 2026 09:59
@floatpanebot
Copy link
Copy Markdown
Member

Benchmark report — no significant change

Metrics worse: 0 · better: 0 (threshold: ±3%).

benchstat output
goos: linux
goarch: amd64
pkg: github.com/floatpane/matcha/backend
cpu: AMD EPYC 7763 64-Core Processor                
                           │   old.txt    │               new.txt               │
                           │    sec/op    │    sec/op      vs base              │
ParseSearchQuery_Simple-4    2.527µ ± 48%   3.213µ ± 350%       ~ (p=0.394 n=6)
ParseSearchQuery_Complex-4   8.204µ ± 35%   6.651µ ±  22%       ~ (p=0.132 n=6)
TokenizeSearchQuery-4        3.647µ ± 37%   4.338µ ±  21%       ~ (p=0.368 n=6)
geomean                      4.228µ         4.525µ         +7.03%

                           │   old.txt   │              new.txt               │
                           │    B/op     │    B/op     vs base                │
ParseSearchQuery_Simple-4    26.00 ± 23%   26.00 ± 0%       ~ (p=1.000 n=6)
ParseSearchQuery_Complex-4   762.0 ±  0%   762.0 ± 0%       ~ (p=1.000 n=6) ¹
TokenizeSearchQuery-4        176.0 ±  0%   176.0 ± 0%       ~ (p=1.000 n=6) ¹
geomean                      151.6         151.6       +0.00%
¹ all samples are equal

                           │  old.txt   │              new.txt               │
                           │ allocs/op  │ allocs/op   vs base                │
ParseSearchQuery_Simple-4    2.000 ± 0%   2.000 ± 0%       ~ (p=1.000 n=6) ¹
ParseSearchQuery_Complex-4   23.00 ± 0%   23.00 ± 0%       ~ (p=1.000 n=6) ¹
TokenizeSearchQuery-4        9.000 ± 0%   9.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                      7.453        7.453       +0.00%
¹ all samples are equal

pkg: github.com/floatpane/matcha/tui
                    │   old.txt    │               new.txt               │
                    │    sec/op    │    sec/op     vs base               │
LogPanelView-4        159.2µ ± 16%   179.6µ ± 18%        ~ (p=0.065 n=6)
SearchOverlayView-4   181.0µ ±  7%   186.9µ ± 18%        ~ (p=0.394 n=6)
InboxConstruction-4   585.6µ ± 46%   860.3µ ± 20%  +46.90% (p=0.026 n=6)
geomean               256.5µ         306.8µ        +19.62%

                    │    old.txt    │               new.txt                │
                    │     B/op      │     B/op       vs base               │
LogPanelView-4        33.23Ki ± 34%   44.67Ki ± 51%        ~ (p=0.545 n=6)
SearchOverlayView-4   56.14Ki ± 41%   56.14Ki ± 41%        ~ (p=0.781 n=6)
InboxConstruction-4   873.9Ki ±  0%   873.9Ki ±  0%        ~ (p=1.000 n=6)
geomean               117.7Ki         129.9Ki        +10.37%

                    │   old.txt   │              new.txt              │
                    │  allocs/op  │  allocs/op   vs base              │
LogPanelView-4         713.0 ± 0%    714.0 ± 0%       ~ (p=0.545 n=6)
SearchOverlayView-4    926.0 ± 0%    926.0 ± 0%       ~ (p=1.000 n=6)
InboxConstruction-4   3.475k ± 0%   3.475k ± 0%       ~ (p=0.545 n=6)
geomean               1.319k        1.320k       +0.05%

auto-generated by benchmarks.yml

Copy link
Copy Markdown

@mx-gp mx-gp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve reviewed the changes for potential breaking impacts. This is a solid, non-breaking fix for the goroutine leak. The addition of StopAllAndWaitTimeout and the switch to sync.WaitGroup for tracking IDLE goroutines correctly improves shutdown reliability without affecting existing public API signatures or configurations.

@mvanhorn
Copy link
Copy Markdown
Contributor Author

Thanks @mx-gp.

@andrinoff
Copy link
Copy Markdown
Member

andrinoff commented May 23, 2026

@mvanhorn please, fix the conflicts with the new linter fix, and then merge the branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/daemon Daemon / RPC area/fetcher IMAP fetch / IDLE / search bug Something isn't working ci CI / build pipeline size/M Diff: 51–200 lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Potential goroutine leak in IDLE watcher cleanup

4 participants