Skip to content

Fix macOS crash in GetAvailableServers (CGo write barrier)#8541

Open
myleshorton wants to merge 11 commits intomainfrom
fix/macos-getavailableservers-crash
Open

Fix macOS crash in GetAvailableServers (CGo write barrier)#8541
myleshorton wants to merge 11 commits intomainfrom
fix/macos-getavailableservers-crash

Conversation

@myleshorton
Copy link
Contributor

@myleshorton myleshorton commented Mar 18, 2026

Problem

Lantern's macOS app crashes intermittently with a Go runtime panic:

runtime.throw -> runtime.bulkBarrierPreWrite -> runtime.wbMove
_cgoexp_410822c1f227_proxymobile__GetAvailableServers

The crash is in the gomobile-generated CGo wrapper, not in application code. Gomobile-exported functions run on a CGo callback stack. The generated _cgoexp_ wrapper copies Go pointer-containing return values to the C thread stack, whose memory the GC heap bitmap doesn't cover. When the write barrier fires, bulkBarrierPreWrite panics.

This doesn't affect the FFI path (ffi.go) because //export functions return *C.char (C-allocated memory, no GC pointers).

Fix

1. Goroutine hop for gomobile exports (mobile/mobile.go): Wrap GetAvailableServers and GetAutoLocation with common.RunOffCgoStack (from radiance), which runs the entire function body on a real Go goroutine via channel. The pointer-rich intermediate types (sing-box option.Outbound with Options any interfaces) never exist on the CGo callback stack -- the gomobile wrapper only sees the final []byte/string result.

2. CGo-safe JSON methods in radiance (companion PR getlantern/radiance#361): ServersJSON() and GetServerByTagJSON() marshal pointer-rich sing-box types inside radiance under the read lock, returning plain []byte. This is a cleaner API regardless -- callers don't need to handle Server structs with pointer-rich fields.

Also:

  • Removed GetServerByTag from the App interface (all callers are internal)
  • Removed wasteful slog.Debug that allocated a string copy of the servers JSON on every call
  • Ran go mod tidy

Test plan

  • Build and run on macOS
  • Verify server list loads without crash
  • Leave running for extended period -- the crash was intermittent

GetAvailableServers was crashing with a runtime.throw in
bulkBarrierPreWrite when called from Swift via gomobile. The Servers()
return value contains pointer-rich types (Options with any interfaces)
whose copy triggers GC write barriers on the CGo callback stack, which
the GC heap bitmap doesn't cover.

Move the Servers() + json.Marshal work to a dedicated goroutine so it
runs on a proper Go stack where write barriers function correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 18, 2026 17:35
Apply the same goroutine-based fix to getAutoLocation() in both
ffi/ffi.go and mobile/mobile.go. These functions marshal Server types
containing pointer-rich Options interfaces, which can trigger GC write
barrier panics on the CGo callback stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a macOS crash when GetAvailableServers is invoked from Swift via gomobile by ensuring pointer-rich server data and JSON marshaling run on a proper Go stack (instead of a cgo callback stack that can trip GC write barriers).

Changes:

  • Move ServerManager().Servers() and json.Marshal into a dedicated goroutine and return the result via a channel.
  • Add inline documentation explaining the cgo/write-barrier crash and why the goroutine workaround is needed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an intermittent macOS crash when server data is marshalled from a gomobile/CGo callback context by moving pointer-rich server retrieval + json.Marshal work onto a dedicated goroutine (i.e., a normal Go stack).

Changes:

  • Update LanternCore.GetAvailableServers() to fetch Servers() and marshal JSON on a goroutine, returning the result via a channel.
  • Apply the same goroutine-hop approach to auto-location server marshalling in the mobile binding and the FFI entrypoint.
  • Add explanatory comments documenting the GC write-barrier/CGo-stack failure mode being avoided.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
lantern-core/core.go Moves Servers() + json.Marshal off the CGo callback stack in GetAvailableServers() to avoid write barrier panics.
lantern-core/mobile/mobile.go Runs GetServerByTag + json.Marshal for auto-location on a goroutine to avoid the same CGo stack issue in gomobile calls.
lantern-core/ffi/ffi.go Runs GetServerByTag + json.Marshal for auto-location on a goroutine to avoid CGo stack write-barrier panics in the FFI path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

myleshorton and others added 2 commits March 18, 2026 11:47
Address PR review feedback:
- Extract a generic RunOnGoStack[T] helper in utils/gostack.go that runs
  a closure on a dedicated goroutine and returns the result via channel.
  Includes defer/recover so a panic in the closure returns an error
  instead of blocking the caller forever.
- Refactor all three call sites (GetAvailableServers, getAutoLocation in
  FFI, GetAutoLocation in mobile) to use the shared helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the CGo safety concern into radiance where the pointer-rich types
originate. Callers now use ServersJSON() and GetServerByTagJSON() which
marshal internally and return plain []byte.

- Remove utils/gostack.go (moved to radiance common/gostack.go)
- GetAvailableServers uses Manager.ServersJSON()
- getAutoLocation (FFI) and GetAutoLocation (mobile) use
  Core.GetServerByTagJSON() which delegates to Manager.GetServerByTagJSON()
- Upgrade radiance to 477c01d which adds the new methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a macOS crash when GetAvailableServers (and related server lookups) are invoked from Swift via gomobile/CGo by avoiding returning/copying pointer-rich server structs on the CGo callback stack.

Changes:

  • Switch GetAvailableServers() to use ServersJSON() instead of Servers() + json.Marshal.
  • Add GetServerByTagJSON() to the core interface and route mobile/FFI auto-location lookup through it.
  • Bump github.com/getlantern/radiance to a newer pseudo-version.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
lantern-core/mobile/mobile.go Uses GetServerByTagJSON() for auto-location JSON to avoid copying server structs.
lantern-core/ffi/ffi.go Same shift to GetServerByTagJSON() for the C FFI path.
lantern-core/core.go Adds GetServerByTagJSON() API and switches available-servers retrieval to ServersJSON().
go.mod Updates the radiance dependency version.
go.sum Adds new radiance checksums for the bumped versions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@myleshorton myleshorton requested review from atavism and jigar-f March 18, 2026 18:30
myleshorton and others added 3 commits March 18, 2026 12:45
- Remove slog.Debug that converted potentially large []byte to string on
  every GetAvailableServers call regardless of log level
- Update radiance to 405169d which extracts shared lookup logic and uses
  MarshalContext consistently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All callers are internal to lantern-core and access serverManager
directly. Removing the unnecessary interface method and delegation
wrapper simplifies the API surface — external callers should use
GetServerByTagJSON which is CGo-safe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
myleshorton and others added 4 commits March 18, 2026 13:29
The previous fix (using ServersJSON/GetServerByTagJSON) moved the
marshal inside radiance but didn't address the root cause: the crash
is in the gomobile-generated CGo wrapper (_cgoexp_proxymobile__*),
which copies Go pointer-containing return values to the C thread
stack. The GC heap bitmap doesn't cover that memory, so
bulkBarrierPreWrite panics.

The only reliable fix is to run the entire function body on a real Go
goroutine stack via runOffCgoStack. The goroutine result is sent back
through a channel (which only involves copying the final []byte/string,
not pointer-rich intermediate types).

The FFI path (ffi.go) doesn't need this because its //export functions
return *C.char (C-allocated memory, no GC pointers).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the goroutine hop helper to radiance so any project consuming
radiance through gomobile can use it. Replace the local runOffCgoStack
in mobile.go with common.RunOffCgoStack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ServersJSON and GetServerByTagJSON in radiance now handle the goroutine
hop internally, so callers don't need to wrap anything. mobile.go goes
back to straightforward delegation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants