fix: don't block the calling thread when queueing events#297
Merged
luxscious merged 1 commit intoJun 4, 2026
Merged
Conversation
…Android (runBlocking in EventQueue) DevCycleHQ#296
upusapati
approved these changes
Jun 3, 2026
Contributor
|
Hey @nnoel-grubhub , thanks for raising the issue #296 and this PR with a fix. We're reviewing the changes and will let you know if we have any feedback or next steps shortly. |
jonathannorris
approved these changes
Jun 3, 2026
jonathannorris
left a comment
Member
There was a problem hiding this comment.
Looks good to me, thanks for the fix!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR:
fix: don't block the calling thread when queueing eventsFixes #296.
What
EventQueue.queueAggregateEvent()andqueueEvent()wrapped their in-memory critical sections inrunBlocking { Mutex.withLock { … } }. This PR guards those sections with aReentrantLockandremoves
runBlocking, so queueing an event no longer blocks the calling thread.Why
variable()/variableValue()/track()are synchronous APIs, andDevCycleClient._variable()is
@Synchronized. BecausequeueAggregateEvent()doesrunBlocking { aggregateMutex.withLock { … } },calling
variable()on the Android main thread parks the main thread insiderunBlockingwhileholding the
DevCycleClientmonitor. Under contention with the periodic event flush this wedges andfreezes the main thread → main-thread ANRs (see the issue for production stacks; we were seeing this
as a high-volume production ANR).
The change (one file,
EventQueue.kt)The critical sections guarded by
aggregateMutexandqueueMutexare pure in-memory work onaggregateEventMap/eventQueuewith no suspension points, so they don't need a coroutineMutex:aggregateMutex,queueMutex→java.util.concurrent.locks.ReentrantLock(aggregateLock,queueLock), usingkotlin.concurrent.withLock.queueAggregateEvent()/queueEvent():runBlocking { }removed; the body runs synchronouslyunder the lock.
flushEvents()keepsflushMutexas a coroutineMutex— it is held across the suspendingrequest.publishEvents(...). Its two inner drain blocks switch to the newReentrantLocks; thoseblocks never suspend, so the lock is acquired and released on one thread before the publish (noted
in a comment).
@Synchronizedon_variable()is intentionally left as-is. It was the amplifier, not the cause;with queueing now non-blocking the monitor is held only for microseconds. Keeping the change scoped
to
EventQueuekeeps it easy to review and low-risk.BigDecimal.ONEaggregation increments,same
IllegalArgumentExceptionvalidation (still thrown synchronously), samescheduler.scheduleWithDelay { run() }scheduling and flush path.aggregateLock/queueLockblock contains onlynon-suspending work (collection ops, the non-suspending
eventsFromAggregateEventMap(), andlaunch, which returns aJobwithout suspending), and influshEvents()they release before thepublishEvents(...)suspension — so there is no resume-on-another-thread / unlock-from-wrong-threadhazard.
flushEvents()takesqueueLockandaggregateLockin separate sequential blocks, producers take exactly one, andflushMutexis onlyever outermost and is never acquired by the producers.
Tests
Unit tests in
EventQueueTests(deterministic, no timing assertions):events are aggregated correctly(existing) — aggregation counts unchanged.queueAggregateEvent throws synchronously when target is missing— validation equivalence.concurrent queueAggregateEvent produces correct aggregate counts— 8 threads × 5k calls produceexact aggregate counts (no lost updates), with a bounded latch await.
The existing
DevCycleClientTests(realMockWebServerflush/event path) pass unchanged, coveringend-to-end flush equivalence.
./gradlew build testDebugUnitTestis green (JDK 18).Verification of the ANR fix (on-device)
The wedge requires the real Android main
Looper, so it does not reproduce in JVM/Robolectricunit tests — it only manifests on a device/emulator (ART). We reproduced and verified it on-device
with a deterministic before/after: driving
variable()on the real mainLooperwhile backgroundthreads +
identifyUserchurn keep the flush contending the event-queue mutex, the unpatched SDKfreezes the main thread within ~8s, and the patched SDK soaks for 90s with no stall. The runnable
reproduction is in the linked issue. (We intentionally do not commit that instrumented test here —
it's a ~90s connected test, not suited to CI.)