Skip to content

fix: don't block the calling thread when queueing events#297

Merged
luxscious merged 1 commit into
DevCycleHQ:mainfrom
nnoel-grubhub:nnoel/issue-296-runblocking-in-EventQueue
Jun 4, 2026
Merged

fix: don't block the calling thread when queueing events#297
luxscious merged 1 commit into
DevCycleHQ:mainfrom
nnoel-grubhub:nnoel/issue-296-runblocking-in-EventQueue

Conversation

@nnoel-grubhub

Copy link
Copy Markdown
Contributor

PR: fix: don't block the calling thread when queueing events

Fixes #296.

What

EventQueue.queueAggregateEvent() and queueEvent() wrapped their in-memory critical sections in
runBlocking { Mutex.withLock { … } }. This PR guards those sections with a ReentrantLock and
removes runBlocking, so queueing an event no longer blocks the calling thread.

Why

variable() / variableValue() / track() are synchronous APIs, and DevCycleClient._variable()
is @Synchronized. Because queueAggregateEvent() does runBlocking { aggregateMutex.withLock { … } },
calling variable() on the Android main thread parks the main thread inside runBlocking while
holding the DevCycleClient monitor. Under contention with the periodic event flush this wedges and
freezes the main thread → main-thread ANRs (see the issue for production stacks; we were seeing this
as a high-volume production ANR).

The change (one file, EventQueue.kt)

The critical sections guarded by aggregateMutex and queueMutex are pure in-memory work on
aggregateEventMap / eventQueue with no suspension points, so they don't need a coroutine
Mutex:

  • aggregateMutex, queueMutexjava.util.concurrent.locks.ReentrantLock (aggregateLock,
    queueLock), using kotlin.concurrent.withLock.
  • queueAggregateEvent() / queueEvent(): runBlocking { } removed; the body runs synchronously
    under the lock.
  • flushEvents() keeps flushMutex as a coroutine Mutex — it is held across the suspending
    request.publishEvents(...). Its two inner drain blocks switch to the new ReentrantLocks; those
    blocks never suspend, so the lock is acquired and released on one thread before the publish (noted
    in a comment).

@Synchronized on _variable() is intentionally left as-is. It was the amplifier, not the cause;
with queueing now non-blocking the monitor is held only for microseconds. Keeping the change scoped
to EventQueue keeps it easy to review and low-risk.

  • Behavior unchanged. Same map/list mutations, same BigDecimal.ONE aggregation increments,
    same IllegalArgumentException validation (still thrown synchronously), same
    scheduler.scheduleWithDelay { run() } scheduling and flush path.
  • No thread lock held across a suspension. Every aggregateLock/queueLock block contains only
    non-suspending work (collection ops, the non-suspending eventsFromAggregateEventMap(), and
    launch, which returns a Job without suspending), and in flushEvents() they release before the
    publishEvents(...) suspension — so there is no resume-on-another-thread / unlock-from-wrong-thread
    hazard.
  • Deadlock-free. No thread holds two inner locks at once — flushEvents() takes queueLock and
    aggregateLock in separate sequential blocks, producers take exactly one, and flushMutex is only
    ever outermost and is never acquired by the producers.

Tests

Unit tests in EventQueueTests (deterministic, no timing assertions):

  • events are aggregated correctly (existing) — aggregation counts unchanged.
  • queueAggregateEvent throws synchronously when target is missing — validation equivalence.
  • concurrent queueAggregateEvent produces correct aggregate counts — 8 threads × 5k calls produce
    exact aggregate counts (no lost updates), with a bounded latch await.

The existing DevCycleClientTests (real MockWebServer flush/event path) pass unchanged, covering
end-to-end flush equivalence. ./gradlew build testDebugUnitTest is green (JDK 18).

Verification of the ANR fix (on-device)

The wedge requires the real Android main Looper, so it does not reproduce in JVM/Robolectric
unit tests — it only manifests on a device/emulator (ART). We reproduced and verified it on-device
with a deterministic before/after: driving variable() on the real main Looper while background
threads + identifyUser churn keep the flush contending the event-queue mutex, the unpatched SDK
freezes the main thread within ~8s, and the patched SDK soaks for 90s with no stall. The runnable
reproduction is in the linked issue. (We intentionally do not commit that instrumented test here —
it's a ~90s connected test, not suited to CI.)

@nnoel-grubhub nnoel-grubhub requested a review from a team as a code owner June 3, 2026 18:50
@nnoel-grubhub nnoel-grubhub changed the title Issue 296: variable() / track() can freeze the main thread → ANRs on Android (runBlocking in EventQueue) fix: don't block the calling thread when queueing events Jun 3, 2026
@kaushalkapasi

kaushalkapasi commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Hey @nnoel-grubhub , thanks for raising the issue #296 and this PR with a fix.

We're reviewing the changes and will let you know if we have any feedback or next steps shortly.

@jonathannorris jonathannorris left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for the fix!

@luxscious luxscious merged commit 9d1b0f2 into DevCycleHQ:main Jun 4, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

variable() / track() can freeze the main thread → ANRs on Android (runBlocking in EventQueue)

5 participants