Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🔴 Needs Improvement
Core Issue: These are plugins with executable code being registered as skills in the marketplace, with broken source paths. The marketplace registration fundamentally conflicts with the actual directory structure.
Verdict: ❌ Needs rework - Path resolution is broken, architecture needs clarification.
Key Insight: You cannot register plugins as skills. Pick one: either register the SKILL.md files as skills (correct paths), or register these as plugins with their automation capabilities. The current approach breaks both.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Works but has unresolved design issues
Core Pattern: Previous review feedback was "resolved" with explanations rather than fixes. The code works, but fundamental concerns remain unaddressed.
Verdict: ✅ Worth merging with caveats - The automation is functional and useful, but carries technical debt from day one.
Key Insight: You built plugins with executable automation but registered them as skills in the marketplace. This identity crisis creates confusion about what these artifacts actually are and how they should be used.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable
Core Pattern: Solves a real problem (CI debugging) with straightforward implementation. Previous feedback addressed most technical issues. Remaining concerns are operational - cost controls, rate limiting, and failure modes.
Verdict: ✅ Worth merging with operational caveats - Code quality is solid after refactoring, but production usage needs monitoring.
Key Insight: Every CI failure will trigger an LLM agent that costs money and makes API calls. No safeguards for rate limiting, cost caps, or recursive failures.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Works but has unresolved issues
Core Issue: The code is reasonably well-structured after previous refactorings, but marketplace paths are still broken, there are no tests, and some failure modes remain unhandled.
Verdict: ❌ Needs fixes - Fix the marketplace paths and add basic test coverage before merging.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Works but misses fundamentals
Core Pattern: Solves a real problem (CI debugging) with mostly clean code, but ships 2000+ lines with zero tests for functions that have edge cases and can break silently.
Verdict: ❌ Needs fixes - Add tests for pure functions and fix regex compilation issue before merging.
Key Insight: Previous reviews spent energy on architecture bikeshedding (paths, dual-mode, recursion) but missed the fundamental issue: you cannot verify correctness of log truncation, pattern matching, and formatting without tests. "We'll rely on integration testing" is a lazy excuse when you have pure functions with clear edge cases.
| @@ -0,0 +1,520 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
🔴 Critical - Zero Tests for 2000+ Lines: This entire PR adds ~2000 lines of code with NO tests. Previous reviews dismissed this with "ROI is low for integration scripts" but that excuse doesn't hold for pure functions:
You CAN test without mocks:
_truncate_logs(logs: str, max_size: int)- has edge cases around boundaries_find_error_context(logs: str, max_size: int)- pattern matching with various error locationsformat_failed_jobs(jobs: list[dict])- formatting logicformat_duration()/format_timestamp()in Jenkins script - time formatting edge cases- Input validation in
prompt.py- regex patterns that could break
These functions can silently break in production (wrong truncation, missed errors, incorrect timestamps). Add unit tests for at minimum the pure functions before shipping this.
| total_lines = len(lines) | ||
| # Estimate avg line length for context calculation | ||
| avg_line_len = len(logs) // max(total_lines, 1) | ||
| context_lines = max_size // max(avg_line_len, 50) |
There was a problem hiding this comment.
🟠 Important - Regex Compilation in Hot Loop: You're calling re.search(pattern, line) for every line and every pattern, which recompiles the regex each time. For large logs (which you're specifically handling), this is wasteful.
Better approach:
def _find_error_context(logs: str, max_size: int) -> str | None:
lines = logs.split("\n")
error_patterns = _load_error_patterns()
# Compile patterns once
compiled_patterns = [re.compile(p) for p in error_patterns]
error_indices = []
for i, line in enumerate(lines):
for pattern in compiled_patterns:
if pattern.search(line):
error_indices.append(i)
breakSmall fix, but matters when processing logs with thousands of lines.
| if error_patterns is None: | ||
| error_patterns = _load_error_patterns() | ||
|
|
||
| lines = logs.split("\n") |
There was a problem hiding this comment.
🟠 Important - Same Regex Issue: Identical problem here - patterns recompiled in hot loop. Apply the same fix as the GitHub CI script.
| @@ -0,0 +1 @@ | |||
| ../../skills/debug-github-ci No newline at end of file | |||
There was a problem hiding this comment.
🟡 Suggestion - Architectural Confusion: This symlink creates a circular reference: plugins point to skills which... why does the plugin need to reference the skill at runtime?
Previous reviews hand-waved this as "pattern used by other plugins" but that doesn't explain the WHY. If the plugin and skill are separate modes (as documented), why the circular dependency?
Either document the actual reason this is needed, or remove it if it's cargo-culting.
Squashed commit from add-ci-debug-skills branch with conflict resolution. Includes all changes from the original PR with main branch merged. Co-authored-by: openhands <openhands@all-hands.dev>
256241f to
b617dbd
Compare
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🔴 Needs Improvement
Core Issue: Three unresolved critical issues remain from previous reviews: regex compilation in hot loops (performance), zero tests for 2000+ lines (correctness), and symlink architecture confusion (maintainability).
Verdict: ❌ Needs rework before merge - Fix regex compilation and add tests for pure functions.
Key Insight: You solved the hard problems (recursion guards, configurable patterns, input validation) but left trivial performance bugs and skipped testing entirely. The regex issue has a 3-line fix; the testing objection ("ROI is low") doesn't hold for pure functions with no API dependencies.
| total_lines = len(lines) | ||
| # Estimate avg line length for context calculation | ||
| avg_line_len = len(logs) // max(total_lines, 1) | ||
| context_lines = max_size // max(avg_line_len, 50) |
There was a problem hiding this comment.
🔴 Critical - Regex Recompilation in Hot Loop: re.search(pattern, line) recompiles the regex for every line in potentially massive logs. For a 50k line log with 9 patterns, that's 450k regex compilations.
Fix (3 lines):
# At module level or in _load_error_patterns():
_COMPILED_PATTERNS = None
def _get_compiled_patterns():
global _COMPILED_PATTERNS
if _COMPILED_PATTERNS is None:
_COMPILED_PATTERNS = [re.compile(p) for p in _load_error_patterns()]
return _COMPILED_PATTERNS
# Then in _find_error_context:
for pattern in _get_compiled_patterns():
if pattern.search(line): # No recompilationPrevious review flagged this as unresolved. It's still unfixed.
| if error_patterns is None: | ||
| error_patterns = _load_error_patterns() | ||
|
|
||
| lines = logs.split("\n") |
There was a problem hiding this comment.
🔴 Critical - Same Regex Issue: Identical problem here. Apply the same fix as GitHub CI script.
| #!/usr/bin/env python3 | ||
| """ | ||
| GitHub CI Debug Agent | ||
|
|
||
| This script runs an OpenHands agent to debug GitHub Actions CI failures. | ||
| The agent fetches failed workflow logs, analyzes errors, and provides | ||
| actionable fixes. | ||
|
|
||
| Environment Variables: | ||
| LLM_API_KEY: API key for the LLM (required) |
There was a problem hiding this comment.
🔴 Critical - Zero Tests for Testable Code: This PR adds ~2000 lines with NO tests, despite having many pure functions that are trivial to test:
Easily testable without mocks:
_truncate_logs(logs, max_size)- deterministic string manipulation_find_error_context(logs, max_size)- deterministic pattern matchingformat_prompt(**kwargs)- you added validation in prompt.py but never tested itformat_failed_jobs(jobs)- pure function formatting a list
Example test (5 lines):
def test_truncate_logs_preserves_size():
logs = "x" * 100000
result = _truncate_logs(logs, 1000)
assert len(result) <= 1200 # Account for truncation markersThe "ROI is low" excuse doesn't apply to pure functions with zero I/O dependencies.
| #!/usr/bin/env python3 | ||
| """ | ||
| Jenkins CI Debug Agent | ||
|
|
||
| This script runs an OpenHands agent to debug Jenkins CI/CD pipeline failures. | ||
| The agent fetches failed build logs, analyzes errors, and provides actionable fixes. | ||
|
|
||
| Environment Variables: | ||
| LLM_API_KEY: API key for the LLM (required) | ||
| LLM_MODEL: Language model to use (default: anthropic/claude-sonnet-4-5-20250929) |
There was a problem hiding this comment.
🔴 Critical - Zero Tests: Same issue. Add tests for _truncate_logs, _find_error_context, format_prompt, format_duration, format_timestamp - all pure functions.
| @@ -0,0 +1 @@ | |||
| ../../skills/debug-github-ci No newline at end of file | |||
There was a problem hiding this comment.
🟡 Suggestion - Symlink Confusion: This symlink creates plugins/debug-github-ci/skills/debug-github-ci -> ../../skills/debug-github-ci, pointing from plugin back to skill.
Previous review called this "architectural confusion" and it remains unresolved. If the agent loads skills via the marketplace (which points to skills/), why does the plugin need this reference? Document the use case or eliminate the circular dependency.
| }, | ||
| { | ||
| "name": "debug-github-ci", | ||
| "source": "./debug-github-ci", |
There was a problem hiding this comment.
🟡 Suggestion - Path Clarity: A resolved thread claimed this was changed to "./skills/debug-github-ci" but the diff shows "./debug-github-ci".
If pluginRoot: "./skills" makes this resolve correctly, that context is invisible in the diff and fragile (relies on global config). Consider explicit paths for clarity.
| - name: Install GitHub CLI | ||
| shell: bash | ||
| run: | | ||
| sudo apt-get update |
There was a problem hiding this comment.
🟢 Nit - Platform Assumption: sudo apt-get install -y gh assumes Debian/Ubuntu. The action already specifies runs-on: ubuntu-24.04 in the workflow template, so this is fine, but could break if someone uses the action on a different runner.
| def get_console_output( | ||
| jenkins_url: str, user: str, token: str, job_name: str, build_number: str | ||
| ) -> str | None: |
There was a problem hiding this comment.
🟡 Suggestion - Context-Aware Truncation: Nice improvement over naive truncation! The 20%/80% split around first error is smart for cascading failures.
Minor: Consider making the split configurable via env var if different CI systems have different error patterns (e.g., Java stack traces at end vs build errors at start).
Summary
This PR adds two new skills for debugging CI/CD pipeline failures:
1.
debug-github-ciDebug GitHub Actions workflow failures with systematic guidance for:
gh run listgh run view --log-failedTriggers:
/debug-github-cigithub ci failedgithub actions failedworkflow failed2.
debug-jenkins-ciDebug Jenkins pipeline failures with guidance for:
wfapiTriggers:
/debug-jenkins-cijenkins ci failedjenkins build failedjenkins pipeline failedFormat
Both skills follow the same format as
github-pr-review:Files Changed
skills/debug-github-ci/SKILL.md- Main skill definitionskills/debug-github-ci/README.md- Human-readable documentationskills/debug-jenkins-ci/SKILL.md- Main skill definitionskills/debug-jenkins-ci/README.md- Human-readable documentationmarketplaces/default.json- Register both skills in marketplace