Add debug-github-ci and debug-jenkins-ci skills by neubig · Pull Request #76 · OpenHands/extensions

neubig · 2026-02-26T20:14:41Z

Summary

This PR adds two new skills for debugging CI/CD pipeline failures:

1. `debug-github-ci`

Debug GitHub Actions workflow failures with systematic guidance for:

Finding failed runs using gh run list
Fetching logs with gh run view --log-failed
Common error patterns and their fixes
Rerunning failed jobs

Triggers:

/debug-github-ci
github ci failed
github actions failed
workflow failed

2. `debug-jenkins-ci`

Debug Jenkins pipeline failures with guidance for:

Fetching build status and console output via Jenkins REST API
Pipeline stage debugging with wfapi
Common Jenkins error patterns
Jenkinsfile validation and debugging

Triggers:

/debug-jenkins-ci
jenkins ci failed
jenkins build failed
jenkins pipeline failed

Format

Both skills follow the same format as github-pr-review:

YAML frontmatter with name, description, and triggers
Step-by-step debugging workflows
Common error pattern tables
API examples with curl fallbacks
Debugging checklists

Files Changed

skills/debug-github-ci/SKILL.md - Main skill definition
skills/debug-github-ci/README.md - Human-readable documentation
skills/debug-jenkins-ci/SKILL.md - Main skill definition
skills/debug-jenkins-ci/README.md - Human-readable documentation
marketplaces/default.json - Register both skills in marketplace

all-hands-bot

Taste Rating: 🔴 Needs Improvement

Core Issue: These are plugins with executable code being registered as skills in the marketplace, with broken source paths. The marketplace registration fundamentally conflicts with the actual directory structure.

Verdict: ❌ Needs rework - Path resolution is broken, architecture needs clarification.

Key Insight: You cannot register plugins as skills. Pick one: either register the SKILL.md files as skills (correct paths), or register these as plugins with their automation capabilities. The current approach breaks both.

marketplaces/default.json

plugins/debug-github-ci/scripts/agent_script.py

plugins/debug-jenkins-ci/scripts/agent_script.py

plugins/debug-github-ci/action.yml

plugins/debug-github-ci/skills/debug-github-ci

skills/debug-github-ci/SKILL.md

plugins/debug-github-ci/scripts/agent_script.py

all-hands-bot

Taste Rating: 🟡 Acceptable - Works but has unresolved design issues

Core Pattern: Previous review feedback was "resolved" with explanations rather than fixes. The code works, but fundamental concerns remain unaddressed.

Verdict: ✅ Worth merging with caveats - The automation is functional and useful, but carries technical debt from day one.

Key Insight: You built plugins with executable automation but registered them as skills in the marketplace. This identity crisis creates confusion about what these artifacts actually are and how they should be used.

plugins/debug-github-ci/scripts/agent_script.py

marketplaces/default.json

plugins/debug-github-ci/scripts/prompt.py

plugins/debug-jenkins-ci/scripts/agent_script.py

plugins/debug-github-ci/README.md

plugins/debug-github-ci/action.yml

all-hands-bot

Taste Rating: 🟡 Acceptable

Core Pattern: Solves a real problem (CI debugging) with straightforward implementation. Previous feedback addressed most technical issues. Remaining concerns are operational - cost controls, rate limiting, and failure modes.

Verdict: ✅ Worth merging with operational caveats - Code quality is solid after refactoring, but production usage needs monitoring.

Key Insight: Every CI failure will trigger an LLM agent that costs money and makes API calls. No safeguards for rate limiting, cost caps, or recursive failures.

plugins/debug-github-ci/workflows/debug-ci-failure.yml

plugins/debug-github-ci/scripts/agent_script.py

marketplaces/default.json

plugins/debug-github-ci/scripts/agent_script.py

all-hands-bot

Taste Rating: 🟡 Acceptable - Works but has unresolved issues

Core Issue: The code is reasonably well-structured after previous refactorings, but marketplace paths are still broken, there are no tests, and some failure modes remain unhandled.

Verdict: ❌ Needs fixes - Fix the marketplace paths and add basic test coverage before merging.

marketplaces/default.json

plugins/debug-github-ci/scripts/agent_script.py

plugins/debug-jenkins-ci/scripts/agent_script.py

plugins/debug-github-ci/action.yml

plugins/debug-github-ci/scripts/agent_script.py

plugins/debug-jenkins-ci/scripts/agent_script.py

all-hands-bot

Taste Rating: 🟡 Acceptable - Works but misses fundamentals

Core Pattern: Solves a real problem (CI debugging) with mostly clean code, but ships 2000+ lines with zero tests for functions that have edge cases and can break silently.

Verdict: ❌ Needs fixes - Add tests for pure functions and fix regex compilation issue before merging.

Key Insight: Previous reviews spent energy on architecture bikeshedding (paths, dual-mode, recursion) but missed the fundamental issue: you cannot verify correctness of log truncation, pattern matching, and formatting without tests. "We'll rely on integration testing" is a lazy excuse when you have pure functions with clear edge cases.

all-hands-bot · 2026-03-02T13:19:06Z

plugins/debug-github-ci/scripts/agent_script.py

@@ -0,0 +1,520 @@
+#!/usr/bin/env python3


🔴 Critical - Zero Tests for 2000+ Lines: This entire PR adds ~2000 lines of code with NO tests. Previous reviews dismissed this with "ROI is low for integration scripts" but that excuse doesn't hold for pure functions:

You CAN test without mocks:

_truncate_logs(logs: str, max_size: int) - has edge cases around boundaries

_find_error_context(logs: str, max_size: int) - pattern matching with various error locations

format_failed_jobs(jobs: list[dict]) - formatting logic

format_duration() / format_timestamp() in Jenkins script - time formatting edge cases

Input validation in prompt.py - regex patterns that could break

These functions can silently break in production (wrong truncation, missed errors, incorrect timestamps). Add unit tests for at minimum the pure functions before shipping this.

all-hands-bot · 2026-03-02T13:19:06Z

plugins/debug-github-ci/scripts/agent_script.py

+        total_lines = len(lines)
+        # Estimate avg line length for context calculation
+        avg_line_len = len(logs) // max(total_lines, 1)
+        context_lines = max_size // max(avg_line_len, 50)


🟠 Important - Regex Compilation in Hot Loop: You're calling re.search(pattern, line) for every line and every pattern, which recompiles the regex each time. For large logs (which you're specifically handling), this is wasteful.

Better approach:

def _find_error_context(logs: str, max_size: int) -> str | None: lines = logs.split("\n") error_patterns = _load_error_patterns() # Compile patterns once compiled_patterns = [re.compile(p) for p in error_patterns] error_indices = [] for i, line in enumerate(lines): for pattern in compiled_patterns: if pattern.search(line): error_indices.append(i) break

Small fix, but matters when processing logs with thousands of lines.

all-hands-bot · 2026-03-02T13:19:06Z

plugins/debug-jenkins-ci/scripts/agent_script.py

+    if error_patterns is None:
+        error_patterns = _load_error_patterns()
+
+    lines = logs.split("\n")


🟠 Important - Same Regex Issue: Identical problem here - patterns recompiled in hot loop. Apply the same fix as the GitHub CI script.

all-hands-bot · 2026-03-02T13:19:07Z

plugins/debug-github-ci/skills/debug-github-ci

@@ -0,0 +1 @@
+../../skills/debug-github-ci


🟡 Suggestion - Architectural Confusion: This symlink creates a circular reference: plugins point to skills which... why does the plugin need to reference the skill at runtime?

Previous reviews hand-waved this as "pattern used by other plugins" but that doesn't explain the WHY. If the plugin and skill are separate modes (as documented), why the circular dependency?

Either document the actual reason this is needed, or remove it if it's cargo-culting.

Squashed commit from add-ci-debug-skills branch with conflict resolution. Includes all changes from the original PR with main branch merged. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

Taste Rating: 🔴 Needs Improvement

Core Issue: Three unresolved critical issues remain from previous reviews: regex compilation in hot loops (performance), zero tests for 2000+ lines (correctness), and symlink architecture confusion (maintainability).

Verdict: ❌ Needs rework before merge - Fix regex compilation and add tests for pure functions.

Key Insight: You solved the hard problems (recursion guards, configurable patterns, input validation) but left trivial performance bugs and skipped testing entirely. The regex issue has a 3-line fix; the testing objection ("ROI is low") doesn't hold for pure functions with no API dependencies.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-github-ci/scripts/agent_script.py

+        total_lines = len(lines)
+        # Estimate avg line length for context calculation
+        avg_line_len = len(logs) // max(total_lines, 1)
+        context_lines = max_size // max(avg_line_len, 50)


🔴 Critical - Regex Recompilation in Hot Loop: re.search(pattern, line) recompiles the regex for every line in potentially massive logs. For a 50k line log with 9 patterns, that's 450k regex compilations.

Fix (3 lines):

# At module level or in _load_error_patterns(): _COMPILED_PATTERNS = None def _get_compiled_patterns(): global _COMPILED_PATTERNS if _COMPILED_PATTERNS is None: _COMPILED_PATTERNS = [re.compile(p) for p in _load_error_patterns()] return _COMPILED_PATTERNS # Then in _find_error_context: for pattern in _get_compiled_patterns(): if pattern.search(line): # No recompilation

Previous review flagged this as unresolved. It's still unfixed.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-jenkins-ci/scripts/agent_script.py

+    if error_patterns is None:
+        error_patterns = _load_error_patterns()
+
+    lines = logs.split("\n")


🔴 Critical - Same Regex Issue: Identical problem here. Apply the same fix as GitHub CI script.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-github-ci/scripts/agent_script.py

+#!/usr/bin/env python3
+"""
+GitHub CI Debug Agent
+
+This script runs an OpenHands agent to debug GitHub Actions CI failures.
+The agent fetches failed workflow logs, analyzes errors, and provides
+actionable fixes.
+
+Environment Variables:
+    LLM_API_KEY: API key for the LLM (required)


🔴 Critical - Zero Tests for Testable Code: This PR adds ~2000 lines with NO tests, despite having many pure functions that are trivial to test:

Easily testable without mocks:

_truncate_logs(logs, max_size) - deterministic string manipulation

_find_error_context(logs, max_size) - deterministic pattern matching

format_prompt(**kwargs) - you added validation in prompt.py but never tested it

format_failed_jobs(jobs) - pure function formatting a list

Example test (5 lines):

def test_truncate_logs_preserves_size(): logs = "x" * 100000 result = _truncate_logs(logs, 1000) assert len(result) <= 1200 # Account for truncation markers

The "ROI is low" excuse doesn't apply to pure functions with zero I/O dependencies.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-jenkins-ci/scripts/agent_script.py

+#!/usr/bin/env python3
+"""
+Jenkins CI Debug Agent
+
+This script runs an OpenHands agent to debug Jenkins CI/CD pipeline failures.
+The agent fetches failed build logs, analyzes errors, and provides actionable fixes.
+
+Environment Variables:
+    LLM_API_KEY: API key for the LLM (required)
+    LLM_MODEL: Language model to use (default: anthropic/claude-sonnet-4-5-20250929)


🔴 Critical - Zero Tests: Same issue. Add tests for _truncate_logs, _find_error_context, format_prompt, format_duration, format_timestamp - all pure functions.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-github-ci/skills/debug-github-ci

@@ -0,0 +1 @@
+../../skills/debug-github-ci


🟡 Suggestion - Symlink Confusion: This symlink creates plugins/debug-github-ci/skills/debug-github-ci -> ../../skills/debug-github-ci, pointing from plugin back to skill.

Previous review called this "architectural confusion" and it remains unresolved. If the agent loads skills via the marketplace (which points to skills/), why does the plugin need this reference? Document the use case or eliminate the circular dependency.

all-hands-bot · 2026-03-03T13:55:42Z

marketplaces/default.json

        },
+        {
+            "name": "debug-github-ci",
+            "source": "./debug-github-ci",


🟡 Suggestion - Path Clarity: A resolved thread claimed this was changed to "./skills/debug-github-ci" but the diff shows "./debug-github-ci".

If pluginRoot: "./skills" makes this resolve correctly, that context is invisible in the diff and fragile (relies on global config). Consider explicit paths for clarity.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-github-ci/action.yml

+        - name: Install GitHub CLI
+          shell: bash
+          run: |
+              sudo apt-get update


🟢 Nit - Platform Assumption: sudo apt-get install -y gh assumes Debian/Ubuntu. The action already specifies runs-on: ubuntu-24.04 in the workflow template, so this is fine, but could break if someone uses the action on a different runner.

all-hands-bot · 2026-03-03T13:55:42Z

plugins/debug-jenkins-ci/scripts/agent_script.py

+def get_console_output(
+    jenkins_url: str, user: str, token: str, job_name: str, build_number: str
+) -> str | None:


🟡 Suggestion - Context-Aware Truncation: Nice improvement over naive truncation! The 20%/80% split around first error is smart for cascading failures.

Minor: Consider making the split configurable via env var if different CI systems have different error patterns (e.g., Java stack traces at end vs build errors at start).

neubig marked this pull request as ready for review March 1, 2026 13:17

all-hands-bot reviewed Mar 1, 2026

View reviewed changes

neubig marked this pull request as draft March 1, 2026 13:28

neubig marked this pull request as ready for review March 2, 2026 03:40

all-hands-bot reviewed Mar 2, 2026

View reviewed changes

neubig marked this pull request as draft March 2, 2026 12:21

neubig marked this pull request as ready for review March 2, 2026 12:36

all-hands-bot reviewed Mar 2, 2026

View reviewed changes

neubig marked this pull request as draft March 2, 2026 12:46

neubig marked this pull request as ready for review March 2, 2026 13:00

all-hands-bot reviewed Mar 2, 2026

View reviewed changes

neubig marked this pull request as draft March 2, 2026 13:09

neubig marked this pull request as ready for review March 2, 2026 13:16

all-hands-bot reviewed Mar 2, 2026

View reviewed changes

neubig marked this pull request as draft March 3, 2026 13:50

Add debug-github-ci and debug-jenkins-ci skills

b617dbd

Squashed commit from add-ci-debug-skills branch with conflict resolution. Includes all changes from the original PR with main branch merged. Co-authored-by: openhands <openhands@all-hands.dev>

neubig force-pushed the add-ci-debug-skills branch from 256241f to b617dbd Compare March 3, 2026 13:52

neubig marked this pull request as ready for review March 3, 2026 13:53

all-hands-bot reviewed Mar 3, 2026

View reviewed changes

		@@ -0,0 +1 @@
		../../skills/debug-github-ci No newline at end of file

Conversation

neubig commented Feb 26, 2026

Summary

1. debug-github-ci

2. debug-jenkins-ci

Format

Files Changed

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🔴 Needs Improvement

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🟡 Acceptable - Works but has unresolved design issues

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🟡 Acceptable

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🟡 Acceptable - Works but has unresolved issues

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🟡 Acceptable - Works but misses fundamentals

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Taste Rating: 🔴 Needs Improvement

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

1. `debug-github-ci`

2. `debug-jenkins-ci`