Skip to content

Add debug-github-ci and debug-jenkins-ci skills#76

Open
neubig wants to merge 1 commit intomainfrom
add-ci-debug-skills
Open

Add debug-github-ci and debug-jenkins-ci skills#76
neubig wants to merge 1 commit intomainfrom
add-ci-debug-skills

Conversation

@neubig
Copy link
Contributor

@neubig neubig commented Feb 26, 2026

Summary

This PR adds two new skills for debugging CI/CD pipeline failures:

1. debug-github-ci

Debug GitHub Actions workflow failures with systematic guidance for:

  • Finding failed runs using gh run list
  • Fetching logs with gh run view --log-failed
  • Common error patterns and their fixes
  • Rerunning failed jobs

Triggers:

  • /debug-github-ci
  • github ci failed
  • github actions failed
  • workflow failed

2. debug-jenkins-ci

Debug Jenkins pipeline failures with guidance for:

  • Fetching build status and console output via Jenkins REST API
  • Pipeline stage debugging with wfapi
  • Common Jenkins error patterns
  • Jenkinsfile validation and debugging

Triggers:

  • /debug-jenkins-ci
  • jenkins ci failed
  • jenkins build failed
  • jenkins pipeline failed

Format

Both skills follow the same format as github-pr-review:

  • YAML frontmatter with name, description, and triggers
  • Step-by-step debugging workflows
  • Common error pattern tables
  • API examples with curl fallbacks
  • Debugging checklists

Files Changed

  • skills/debug-github-ci/SKILL.md - Main skill definition
  • skills/debug-github-ci/README.md - Human-readable documentation
  • skills/debug-jenkins-ci/SKILL.md - Main skill definition
  • skills/debug-jenkins-ci/README.md - Human-readable documentation
  • marketplaces/default.json - Register both skills in marketplace

@neubig neubig marked this pull request as ready for review March 1, 2026 13:17
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🔴 Needs Improvement

Core Issue: These are plugins with executable code being registered as skills in the marketplace, with broken source paths. The marketplace registration fundamentally conflicts with the actual directory structure.

Verdict: ❌ Needs rework - Path resolution is broken, architecture needs clarification.

Key Insight: You cannot register plugins as skills. Pick one: either register the SKILL.md files as skills (correct paths), or register these as plugins with their automation capabilities. The current approach breaks both.

@neubig neubig marked this pull request as draft March 1, 2026 13:28
@neubig neubig marked this pull request as ready for review March 2, 2026 03:40
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Works but has unresolved design issues

Core Pattern: Previous review feedback was "resolved" with explanations rather than fixes. The code works, but fundamental concerns remain unaddressed.

Verdict: ✅ Worth merging with caveats - The automation is functional and useful, but carries technical debt from day one.

Key Insight: You built plugins with executable automation but registered them as skills in the marketplace. This identity crisis creates confusion about what these artifacts actually are and how they should be used.

@neubig neubig marked this pull request as draft March 2, 2026 12:21
@neubig neubig marked this pull request as ready for review March 2, 2026 12:36
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable

Core Pattern: Solves a real problem (CI debugging) with straightforward implementation. Previous feedback addressed most technical issues. Remaining concerns are operational - cost controls, rate limiting, and failure modes.

Verdict: ✅ Worth merging with operational caveats - Code quality is solid after refactoring, but production usage needs monitoring.

Key Insight: Every CI failure will trigger an LLM agent that costs money and makes API calls. No safeguards for rate limiting, cost caps, or recursive failures.

@neubig neubig marked this pull request as draft March 2, 2026 12:46
@neubig neubig marked this pull request as ready for review March 2, 2026 13:00
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Works but has unresolved issues

Core Issue: The code is reasonably well-structured after previous refactorings, but marketplace paths are still broken, there are no tests, and some failure modes remain unhandled.

Verdict: ❌ Needs fixes - Fix the marketplace paths and add basic test coverage before merging.

@neubig neubig marked this pull request as draft March 2, 2026 13:09
@neubig neubig marked this pull request as ready for review March 2, 2026 13:16
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Works but misses fundamentals

Core Pattern: Solves a real problem (CI debugging) with mostly clean code, but ships 2000+ lines with zero tests for functions that have edge cases and can break silently.

Verdict: ❌ Needs fixes - Add tests for pure functions and fix regex compilation issue before merging.

Key Insight: Previous reviews spent energy on architecture bikeshedding (paths, dual-mode, recursion) but missed the fundamental issue: you cannot verify correctness of log truncation, pattern matching, and formatting without tests. "We'll rely on integration testing" is a lazy excuse when you have pure functions with clear edge cases.

@@ -0,0 +1,520 @@
#!/usr/bin/env python3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Zero Tests for 2000+ Lines: This entire PR adds ~2000 lines of code with NO tests. Previous reviews dismissed this with "ROI is low for integration scripts" but that excuse doesn't hold for pure functions:

You CAN test without mocks:

  • _truncate_logs(logs: str, max_size: int) - has edge cases around boundaries
  • _find_error_context(logs: str, max_size: int) - pattern matching with various error locations
  • format_failed_jobs(jobs: list[dict]) - formatting logic
  • format_duration() / format_timestamp() in Jenkins script - time formatting edge cases
  • Input validation in prompt.py - regex patterns that could break

These functions can silently break in production (wrong truncation, missed errors, incorrect timestamps). Add unit tests for at minimum the pure functions before shipping this.

total_lines = len(lines)
# Estimate avg line length for context calculation
avg_line_len = len(logs) // max(total_lines, 1)
context_lines = max_size // max(avg_line_len, 50)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important - Regex Compilation in Hot Loop: You're calling re.search(pattern, line) for every line and every pattern, which recompiles the regex each time. For large logs (which you're specifically handling), this is wasteful.

Better approach:

def _find_error_context(logs: str, max_size: int) -> str | None:
    lines = logs.split("\n")
    error_patterns = _load_error_patterns()
    
    # Compile patterns once
    compiled_patterns = [re.compile(p) for p in error_patterns]
    
    error_indices = []
    for i, line in enumerate(lines):
        for pattern in compiled_patterns:
            if pattern.search(line):
                error_indices.append(i)
                break

Small fix, but matters when processing logs with thousands of lines.

if error_patterns is None:
error_patterns = _load_error_patterns()

lines = logs.split("\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important - Same Regex Issue: Identical problem here - patterns recompiled in hot loop. Apply the same fix as the GitHub CI script.

@@ -0,0 +1 @@
../../skills/debug-github-ci No newline at end of file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Architectural Confusion: This symlink creates a circular reference: plugins point to skills which... why does the plugin need to reference the skill at runtime?

Previous reviews hand-waved this as "pattern used by other plugins" but that doesn't explain the WHY. If the plugin and skill are separate modes (as documented), why the circular dependency?

Either document the actual reason this is needed, or remove it if it's cargo-culting.

@neubig neubig marked this pull request as draft March 3, 2026 13:50
Squashed commit from add-ci-debug-skills branch with conflict resolution.
Includes all changes from the original PR with main branch merged.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig force-pushed the add-ci-debug-skills branch from 256241f to b617dbd Compare March 3, 2026 13:52
@neubig neubig marked this pull request as ready for review March 3, 2026 13:53
Copy link

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🔴 Needs Improvement

Core Issue: Three unresolved critical issues remain from previous reviews: regex compilation in hot loops (performance), zero tests for 2000+ lines (correctness), and symlink architecture confusion (maintainability).

Verdict: ❌ Needs rework before merge - Fix regex compilation and add tests for pure functions.

Key Insight: You solved the hard problems (recursion guards, configurable patterns, input validation) but left trivial performance bugs and skipped testing entirely. The regex issue has a 3-line fix; the testing objection ("ROI is low") doesn't hold for pure functions with no API dependencies.

total_lines = len(lines)
# Estimate avg line length for context calculation
avg_line_len = len(logs) // max(total_lines, 1)
context_lines = max_size // max(avg_line_len, 50)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Regex Recompilation in Hot Loop: re.search(pattern, line) recompiles the regex for every line in potentially massive logs. For a 50k line log with 9 patterns, that's 450k regex compilations.

Fix (3 lines):

# At module level or in _load_error_patterns():
_COMPILED_PATTERNS = None

def _get_compiled_patterns():
    global _COMPILED_PATTERNS
    if _COMPILED_PATTERNS is None:
        _COMPILED_PATTERNS = [re.compile(p) for p in _load_error_patterns()]
    return _COMPILED_PATTERNS

# Then in _find_error_context:
for pattern in _get_compiled_patterns():
    if pattern.search(line):  # No recompilation

Previous review flagged this as unresolved. It's still unfixed.

if error_patterns is None:
error_patterns = _load_error_patterns()

lines = logs.split("\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Same Regex Issue: Identical problem here. Apply the same fix as GitHub CI script.

Comment on lines +1 to +10
#!/usr/bin/env python3
"""
GitHub CI Debug Agent

This script runs an OpenHands agent to debug GitHub Actions CI failures.
The agent fetches failed workflow logs, analyzes errors, and provides
actionable fixes.

Environment Variables:
LLM_API_KEY: API key for the LLM (required)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Zero Tests for Testable Code: This PR adds ~2000 lines with NO tests, despite having many pure functions that are trivial to test:

Easily testable without mocks:

  • _truncate_logs(logs, max_size) - deterministic string manipulation
  • _find_error_context(logs, max_size) - deterministic pattern matching
  • format_prompt(**kwargs) - you added validation in prompt.py but never tested it
  • format_failed_jobs(jobs) - pure function formatting a list

Example test (5 lines):

def test_truncate_logs_preserves_size():
    logs = "x" * 100000
    result = _truncate_logs(logs, 1000)
    assert len(result) <= 1200  # Account for truncation markers

The "ROI is low" excuse doesn't apply to pure functions with zero I/O dependencies.

Comment on lines +1 to +10
#!/usr/bin/env python3
"""
Jenkins CI Debug Agent

This script runs an OpenHands agent to debug Jenkins CI/CD pipeline failures.
The agent fetches failed build logs, analyzes errors, and provides actionable fixes.

Environment Variables:
LLM_API_KEY: API key for the LLM (required)
LLM_MODEL: Language model to use (default: anthropic/claude-sonnet-4-5-20250929)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Zero Tests: Same issue. Add tests for _truncate_logs, _find_error_context, format_prompt, format_duration, format_timestamp - all pure functions.

@@ -0,0 +1 @@
../../skills/debug-github-ci No newline at end of file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Symlink Confusion: This symlink creates plugins/debug-github-ci/skills/debug-github-ci -> ../../skills/debug-github-ci, pointing from plugin back to skill.

Previous review called this "architectural confusion" and it remains unresolved. If the agent loads skills via the marketplace (which points to skills/), why does the plugin need this reference? Document the use case or eliminate the circular dependency.

},
{
"name": "debug-github-ci",
"source": "./debug-github-ci",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Path Clarity: A resolved thread claimed this was changed to "./skills/debug-github-ci" but the diff shows "./debug-github-ci".

If pluginRoot: "./skills" makes this resolve correctly, that context is invisible in the diff and fragile (relies on global config). Consider explicit paths for clarity.

- name: Install GitHub CLI
shell: bash
run: |
sudo apt-get update

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit - Platform Assumption: sudo apt-get install -y gh assumes Debian/Ubuntu. The action already specifies runs-on: ubuntu-24.04 in the workflow template, so this is fine, but could break if someone uses the action on a different runner.

Comment on lines +256 to +258
def get_console_output(
jenkins_url: str, user: str, token: str, job_name: str, build_number: str
) -> str | None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Context-Aware Truncation: Nice improvement over naive truncation! The 20%/80% split around first error is smart for cascading failures.

Minor: Consider making the split configurable via env var if different CI systems have different error patterns (e.g., Java stack traces at end vs build errors at start).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants