Skip to content

fix: make LLM review resilient to individual question failures#550

Merged
academo merged 1 commit intomainfrom
academo/fix-llm-error-return
Mar 26, 2026
Merged

fix: make LLM review resilient to individual question failures#550
academo merged 1 commit intomainfrom
academo/fix-llm-error-return

Conversation

@academo
Copy link
Copy Markdown
Collaborator

@academo academo commented Mar 26, 2026

The LLM review was silently dropping all results when any single question failed. Two issues:

  • AskLLMAboutCode had a retry loop wrapping CallLLM, which itself already retries internally via callLLMWithRetry so a failing question would fire up to 9 API calls before giving up, then discard all previously successful answers
  • The 20s per-call timeout was too short for large plugin codebases, causing frequent context deadline exceeded errors on Anthropic

Now failed questions are skipped and processing continues, the redundant outer retry is removed, and the timeout is bumped to 90s. Also adds an OK report when LLM review completes clean.

- Increase LLM call timeout from 20s to 90s to handle large code contexts
- Remove redundant outer retry loop in AskLLMAboutCode (inner callLLMWithRetry already handles retries)
- Skip failed questions instead of aborting the entire review
- Report OK status when LLM review completes without concerns
@academo academo self-assigned this Mar 26, 2026
@academo academo moved this from 📬 Triage to 🔬 In review in Grafana Catalog Team Mar 26, 2026
@academo academo marked this pull request as ready for review March 26, 2026 10:46
@academo academo requested review from a team as code owners March 26, 2026 10:46
@academo academo requested review from Ukochka, oshirohugo, s4kh and xnyo March 26, 2026 10:46
}
lastErr = nil
break
agenticAnswers, err := agenticClient.CallLLM(c.ctx, []string{userPrompt}, absCodePath)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main change here is removing the for retries := 3; retries > 0; retries-- loop

this was retrying 3 times but CallLLM already has retrying handling internally (3 times)

Copy link
Copy Markdown
Member

@xnyo xnyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@academo academo merged commit c653140 into main Mar 26, 2026
11 checks passed
@academo academo deleted the academo/fix-llm-error-return branch March 26, 2026 11:14
@github-project-automation github-project-automation bot moved this from 🔬 In review to 🚀 Shipped in Grafana Catalog Team Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🚀 Shipped

Development

Successfully merging this pull request may close these issues.

3 participants