Skip to content

ADFA-3863 Fix LayoutInflaterTest hang that wedged Jacoco/SonarQube CI#1371

Open
hal-eisen-adfa wants to merge 1 commit into
stagefrom
ADFA-3863-restore-jacoco-sonarqube
Open

ADFA-3863 Fix LayoutInflaterTest hang that wedged Jacoco/SonarQube CI#1371
hal-eisen-adfa wants to merge 1 commit into
stagefrom
ADFA-3863-restore-jacoco-sonarqube

Conversation

@hal-eisen-adfa
Copy link
Copy Markdown
Collaborator

The analyze.yml workflow was timing out at the 180-minute job limit because :xml-inflater:testV8DebugUnitTest hung waiting on a Tooling API child JVM that never exited. Add defense in depth so a future hang fails in minutes, not hours, and fix the underlying race.

Safety nets:

  • ToolingApiTestLauncher: bound cancelCurrentBuild()/shutdown() RPCs with 30s timeouts and fall back to proc.destroyForcibly() if the child JVM does not exit. The child can no longer leak regardless of RPC behavior.
  • LayoutInflaterTest: @rule Timeout caps each test at 2 minutes.
  • build.gradle.kts: project-wide Test task timeout of 10 minutes as a final backstop.
  • analyze.yml: per-step timeout of 60 minutes on "Build and analyze" so a wedge cannot burn the full 180-minute job budget.

Root cause:

  • XmlInflaterTest.initIfNeeded used a non-atomic get/set pair on AtomicBoolean. Replace with compareAndSet, and reset the flag on failure so a failed init does not silently poison subsequent tests.

The analyze.yml workflow was timing out at the 180-minute job limit
because :xml-inflater:testV8DebugUnitTest hung waiting on a Tooling
API child JVM that never exited. Add defense in depth so a future
hang fails in minutes, not hours, and fix the underlying race.

Safety nets:
- ToolingApiTestLauncher: bound cancelCurrentBuild()/shutdown() RPCs
  with 30s timeouts and fall back to proc.destroyForcibly() if the
  child JVM does not exit. The child can no longer leak regardless
  of RPC behavior.
- LayoutInflaterTest: @rule Timeout caps each test at 2 minutes.
- build.gradle.kts: project-wide Test task timeout of 10 minutes as
  a final backstop.
- analyze.yml: per-step timeout of 60 minutes on "Build and analyze"
  so a wedge cannot burn the full 180-minute job budget.

Root cause:
- XmlInflaterTest.initIfNeeded used a non-atomic get/set pair on
  AtomicBoolean. Replace with compareAndSet, and reset the flag on
  failure so a failed init does not silently poison subsequent tests.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

📝 Walkthrough

Release Notes

CI Reliability Improvements

  • CI workflow timeout safeguard: Added 60-minute step-level timeout to "Build and analyze" workflow step in GitHub Actions to prevent unbounded execution
  • Gradle test task timeout: Added 10-minute timeout to all Test tasks across subprojects to prevent hung test JVMs from consuming entire CI budget
  • Per-test timeout: Added JUnit @Rule Timeout of 2 minutes to LayoutInflaterTest to catch hanging tests early
  • Tooling API shutdown hardening: Wrapped cancelCurrentBuild() and shutdown() RPC calls in ToolingApiTestLauncher with 30-second timeouts and added fallback to forcibly destroy the child JVM process if it fails to exit gracefully

Bug Fix

  • XmlInflaterTest initialization race condition: Fixed non-atomic get/set logic in initIfNeeded() by replacing with atomic compareAndSet() operation. The initialization flag now resets to false on failure, preventing subsequent tests from being permanently poisoned by failed initialization

Risk Notes

  • Hard timeouts may mask performance issues: Test timeouts (2 minutes per test, 10 minutes per task, 60 minutes per step) will fail slow tests without addressing underlying performance problems. Recommend monitoring for timeout failures and investigating root causes.
  • Forced process destruction: proc.destroyForcibly() may leave system resources (files, locks, temp directories) uncleaned if the JVM does not exit via normal shutdown. Ensure CI environment cleanup handles orphaned resources.
  • Multiple timeout layers: Three levels of timeout configuration (step, task, test) could lead to confusion about which timeout is actually active. Document timeout strategy for maintainability.

Walkthrough

The PR hardens the CI pipeline against indefinite test hangs by introducing timeout protection at four independent levels: workflow step timeout, Gradle task timeout, test launcher cleanup timeout, and individual test method timeout. Additionally, XmlInflaterTest initialization is refactored to use atomic compare-and-set semantics, allowing retries when setup fails instead of permanently blocking.

Changes

Layered Test Timeout Protection

Layer / File(s) Summary
CI and Gradle test task timeouts
.github/workflows/analyze.yml, build.gradle.kts
Workflow step "Build and analyze" times out at 60 minutes; all Gradle Test tasks are capped at 10 minutes to prevent hung JVMs from consuming the full job timeout budget.
ToolingApiTestLauncher bounded cleanup
testing/tooling/src/main/java/com/itsaky/androidide/testing/tooling/ToolingApiTestLauncher.kt
Launcher shutdown now uses timeout-bounded RPC calls and child process termination with explicit timeout constants; failures are logged and the process is force-destroyed if cleanup does not complete.
JUnit test method timeout rule
xml-inflater/src/test/java/com/itsaky/androidide/inflater/LayoutInflaterTest.kt
Individual test methods are protected with a 2-minute JUnit @Rule Timeout, failing any test that exceeds the duration.
XmlInflaterTest initialization atomic synchronization
xml-inflater/src/test/java/com/itsaky/androidide/inflater/XmlInflaterTest.kt
Initialization uses atomic compare-and-set to handle concurrent access and resets the flag on exception, allowing subsequent test runs to retry initialization instead of being permanently blocked.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • itsaky-adfa
  • jomen-adfa
  • jatezzz

Poem

🐰 A rabbit hops through timeouts true,
No hanging tests will pass on through,
With Gradle bounds and JUnit rules,
The CI pipeline stays sharp as tools,
Atomic flags retry with grace,
No more hangs in this hallowed place!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main issue: fixing a LayoutInflaterTest hang that caused CI timeout, matching the changeset's core objective.
Description check ✅ Passed The description comprehensively explains the problem, all safety nets implemented, and the root cause fix, directly corresponding to the changeset modifications.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ADFA-3863-restore-jacoco-sonarqube

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/analyze.yml (1)

95-100: ⚡ Quick win

Apply the same backstop to the earlier Gradle step.

Build and analyze is now bounded, but Assemble V8 Debug is still an unbounded ./gradlew invocation. A hang there would still sit until the 180-minute job timeout, so the workflow only has partial protection right now.

Suggested change
       - name: Assemble V8 Debug
+        timeout-minutes: 60
         run: |
           echo "gradle_time_start=$(date +%s)" >> $GITHUB_ENV
           flox activate -d flox/base -- ./gradlew :app:assembleV8Debug --no-daemon
           echo "gradle_time_end=$(date +%s)" >> $GITHUB_ENV
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/analyze.yml around lines 95 - 100, The earlier unbounded
Gradle step "Assemble V8 Debug" needs the same timeout and env backstop as the
"Build and analyze" step; update the "Assemble V8 Debug" job/step to include
timeout-minutes: 60 and the GRADLE_OPTS and SONAR_TOKEN env entries (or at least
GRADLE_OPTS="-Xmx10g -XX:MaxMetaspaceSize=512m") and keep the existing run
command that invokes ./gradlew (the Assemble V8 Debug step) so the Gradle
invocation is bounded and uses the same memory/metaspace settings as the Build
and analyze step.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@xml-inflater/src/test/java/com/itsaky/androidide/inflater/XmlInflaterTest.kt`:
- Around line 40-44: The current init.compareAndSet(false, true) lets
non-winning callers return immediately and use the inflater before setup
finishes; change the init logic so losers block until initialization completes
(success or failure) instead of returning. Replace the AtomicBoolean init with a
state machine or synchronization primitive (e.g., an enum/AtomicInteger states
like UNINITIALIZED/INITIALIZING/READY/FAILED plus wait/notify or a
CountDownLatch) and update the initialization block and all callers to: the
winning thread performs setup and sets the terminal state (READY or FAILED),
then notifies/writes the latch, while non-winning threads wait for that terminal
state and then proceed or throw based on success/failure; ensure you update the
same init usage shown in XmlInflaterTest.kt (the init variable and the init
compareAndSet block spanning the original lines and the related 46-55 section).

---

Nitpick comments:
In @.github/workflows/analyze.yml:
- Around line 95-100: The earlier unbounded Gradle step "Assemble V8 Debug"
needs the same timeout and env backstop as the "Build and analyze" step; update
the "Assemble V8 Debug" job/step to include timeout-minutes: 60 and the
GRADLE_OPTS and SONAR_TOKEN env entries (or at least GRADLE_OPTS="-Xmx10g
-XX:MaxMetaspaceSize=512m") and keep the existing run command that invokes
./gradlew (the Assemble V8 Debug step) so the Gradle invocation is bounded and
uses the same memory/metaspace settings as the Build and analyze step.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c366db36-381c-4eff-b6c9-e2a814a48bc9

📥 Commits

Reviewing files that changed from the base of the PR and between d71e643 and 5e276d0.

📒 Files selected for processing (5)
  • .github/workflows/analyze.yml
  • build.gradle.kts
  • testing/tooling/src/main/java/com/itsaky/androidide/testing/tooling/ToolingApiTestLauncher.kt
  • xml-inflater/src/test/java/com/itsaky/androidide/inflater/LayoutInflaterTest.kt
  • xml-inflater/src/test/java/com/itsaky/androidide/inflater/XmlInflaterTest.kt

Comment on lines +40 to 44
// Atomic claim of the init slot. If another caller already won the race,
// they are responsible for completing setup; we just return.
if (!init.compareAndSet(false, true)) {
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Non-winning callers can continue before initialization finishes.

compareAndSet prevents duplicate init, but losers return immediately and may use the inflater before setup completes. This creates a race/flaky path when tests run in parallel. Use a wait/notify (or latch/state-machine) so non-winning callers block until init reaches success/failure.

Suggested fix (state machine + wait/notify)
 object XmlInflaterTest {
-	private val init = AtomicBoolean(false)
+	private enum class InitState { UNINITIALIZED, INITIALIZING, INITIALIZED }
+	private val initLock = Object()
+	`@Volatile` private var initState = InitState.UNINITIALIZED
 	internal val activity by lazy { Robolectric.buildActivity(AppCompatActivity::class.java).get() }

 	fun initIfNeeded() {
-		// Atomic claim of the init slot. If another caller already won the race,
-		// they are responsible for completing setup; we just return.
-		if (!init.compareAndSet(false, true)) {
-			return
-		}
+		synchronized(initLock) {
+			while (initState == InitState.INITIALIZING) initLock.wait()
+			if (initState == InitState.INITIALIZED) return
+			initState = InitState.INITIALIZING
+		}

 		try {
 			ToolingApiTestLauncher.launchServer {
 				assertThat(result is InitializeResult.Success).isTrue()
 				runBlocking { IProjectManager.getInstance().setup(gradleBuild.get()) }
 			}
+			synchronized(initLock) {
+				initState = InitState.INITIALIZED
+				initLock.notifyAll()
+			}
 		} catch (error: Throwable) {
-			// Roll back so subsequent tests can retry rather than silently skipping setup.
-			init.set(false)
+			synchronized(initLock) {
+				initState = InitState.UNINITIALIZED
+				initLock.notifyAll()
+			}
 			throw error
 		}
 	}
 }

Also applies to: 46-55

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@xml-inflater/src/test/java/com/itsaky/androidide/inflater/XmlInflaterTest.kt`
around lines 40 - 44, The current init.compareAndSet(false, true) lets
non-winning callers return immediately and use the inflater before setup
finishes; change the init logic so losers block until initialization completes
(success or failure) instead of returning. Replace the AtomicBoolean init with a
state machine or synchronization primitive (e.g., an enum/AtomicInteger states
like UNINITIALIZED/INITIALIZING/READY/FAILED plus wait/notify or a
CountDownLatch) and update the initialization block and all callers to: the
winning thread performs setup and sets the terminal state (READY or FAILED),
then notifies/writes the latch, while non-winning threads wait for that terminal
state and then proceed or throw based on success/failure; ensure you update the
same init usage shown in XmlInflaterTest.kt (the init variable and the init
compareAndSet block spanning the original lines and the related 46-55 section).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant