fix: robust analysis parsing and same-provider model resolution#61
Conversation
- Skip export prompt entirely when no analysis is available (API quota/rate limit); guide the user to run `oasis analyze` then `oasis report` instead
- Reorder the export menu so "Done" is the default after the first action
- Replace dead oasis.kryptsec.com URL with the GitHub repo in reports
- Add robust JSON extraction (brace matching) for LLM responses wrapped in explanation text or trailing content
- Increase MAX_COMPLETION_TOKENS from 4096 to 8192 to prevent truncation
- Use the benchmark model for analysis instead of hardcoding claude-sonnet, with a blocklist for non-text models (image/embed/tts)
- Display the actual analyzer model in results instead of the hardcoded default
- Prevent wrong API key passthrough when the analyzer provider differs
Treelovah
left a comment
The core fixes here — `extractJson()`, threading `analyzerModel`, the API key fallthrough guard, the image-model filter — are solid work solving real problems. Three things need addressing before merge.
1. Export extractJson and test it directly.
This is a hand-rolled brace-matching JSON scanner and the most important addition in the PR. Right now it has zero direct tests — it's only exercised indirectly through `parseAnalysisResponse`, which also runs Zod parsing, default filling, and score computation. Too much noise to isolate a scanner regression.
Export it. Write 6-8 targeted cases: clean JSON passthrough, leading/trailing explanation text, markdown fences, nested objects, braces inside string values, unclosed JSON, no JSON at all. It's a pure function with deterministic I/O — exactly the kind of thing that benefits from unit tests. Don't add more `parseAnalysisResponse` cases, though — the existing ones cover the integration path fine.
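For reference, a brace-matching scanner of the kind described might look like the following minimal sketch. This is illustrative, not the PR's actual implementation; only the name `extractJson` and the test cases come from the PR.

```typescript
// Minimal sketch of a brace-matching JSON scanner (illustrative, not the PR's code).
// Returns the first balanced {...} region, honoring strings and escape sequences,
// so preamble text, markdown fences, and trailing commentary are all ignored.
function extractJson(raw: string): string | null {
  const start = raw.indexOf("{");
  if (start === -1) return null; // no JSON at all

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      if (escaped) escaped = false;         // previous char was a backslash
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;                             // braces inside strings don't count
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return raw.slice(start, i + 1);
  }
  return null; // unclosed/truncated JSON
}
```

Each of the cases listed above maps to a single call against a function like this, which is what makes direct unit tests so cheap to write.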
2. Drop the token bump or demonstrate the need.
I checked every saved analysis result in the repo. The largest JSON payload across all runs is ~2,959 tokens. Most are under 2,000. 4096 already gives over 1,100 tokens of headroom above our worst case.
The truncation failures were caused by models wrapping JSON in explanation text, inflating the total response past 4096. `extractJson()` already solves that — the scanner doesn't care how much preamble the model generates, it just finds the braces. Doubling `MAX_COMPLETION_TOKENS` to 8192 is solving a problem the parser fix already addressed, and it doubles worst-case analysis cost on a tool where users pay their own API bills.
Drop the change. If you disagree, post a result where the extracted JSON alone exceeds 4096 tokens — as of now this looks like an edge case that doesn't exist in practice.
3. Split unrelated changes into a separate PR.
The `export.ts` UX rework (skip export when no analysis, reorder the Done choice) and the `report.ts` link updates are good changes, but they have nothing to do with analysis parsing or model resolution. Let's land them in a separate PR so we can review the main change on its own without any cherry-picking.
Treelovah
left a comment
Review — Request Changes
The fixes here are real and well-understood — the `extractJson()` scanner, the model resolution threading, and the API key fallthrough are all clear improvements. But this needs a few things before it's merge-ready.
1. extractJson() needs direct unit tests
This is the most important function in the PR and it's only tested indirectly through `parseAnalysisResponse`. For a parser, that's not sufficient. We need edge-case coverage:
- Trailing explanation text after valid JSON
- Nested braces inside string values (`{"key": "value with { braces }"}`)
- Unclosed/truncated JSON
- Bare JSON with no markdown fences
- Multiple JSON objects in one response (should extract the first)
- Fences not at the start of the string (e.g. ````"Here's the result:\n```json\n{...}\n```"````)
2. Non-text model regex is too loose
`/imagine|image|embed|tts|whisper|dall-e/i` matching on substring will false-positive the moment someone benchmarks a vision model with "image" in the name (e.g. `gpt-5-image-understanding`). Two options:
- Preferred: an explicit blocklist array with anchored patterns (e.g. `/^grok-imagine-/`, `/^dall-e-/`, `/^tts-/`, `/^whisper-/`, `/[-.]embed(ding)?[-.]/`)
- Alternative: check model capabilities rather than pattern-matching names
At minimum, anchor the patterns more tightly so legitimate text models aren't caught.
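An anchored blocklist along those lines might look like the sketch below. The patterns mirror the examples above; the helper name `isNonTextModel` is hypothetical.

```typescript
// Hypothetical anchored blocklist — patterns match known prefixes or delimited
// segments instead of bare substrings, so "image" inside a text model's name is safe.
const NON_TEXT_MODEL_PATTERNS: RegExp[] = [
  /^grok-imagine-/i,       // image generation
  /^dall-e-/i,             // image generation
  /^tts-/i,                // speech synthesis
  /^whisper-/i,            // speech recognition
  /[-.]embed(ding)?[-.]/i, // embedding models, e.g. text-embedding-3-large
];

function isNonTextModel(model: string): boolean {
  return NON_TEXT_MODEL_PATTERNS.some((pattern) => pattern.test(model));
}
```

The substring regex would flag `gpt-5-image-understanding`; this version does not, because no pattern matches at a prefix or delimiter boundary in that name.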
3. Split unrelated changes into a separate PR
The `export.ts` UX improvements (early return + choice reordering) and the `report.ts` link updates (oasis.kryptsec.com → github.com/KryptSec/oasis) are unrelated to analysis parsing. One concern per PR — we enforced this on ourselves during the hardening audit, and the same standard applies here.
4. Acknowledge the token cost tradeoff
`MAX_COMPLETION_TOKENS` 4096 → 8192 doubles worst-case spend per analysis. The change itself is justified, but add a one-liner to the PR description noting the cost tradeoff so users aren't surprised.
To be clear: the engineering here is solid. These are "make it merge-ready for a public repo" asks, not fundamental objections.
…t token bump
- Export `extractJson` and add 10 direct unit tests covering: clean passthrough, leading/trailing text, markdown fences, nested objects, braces in strings, unclosed JSON, multiple objects, no-JSON input
- Replace loose `/imagine|image|embed|.../` regex with an anchored `NON_TEXT_MODEL_PATTERNS` blocklist to prevent false positives on vision models like `gpt-5-image-understanding`
- Revert `MAX_COMPLETION_TOKENS` to 4096 — `extractJson` already strips the preamble text that caused truncation, so the token bump is redundant
The original PR swapped oasis.kryptsec.com for the GitHub repo URL in share cards and HTML reports. That domain is live. Put it back.
Problem
Analysis consistently fails with `could not parse LLM response (truncated or malformed JSON)` across multiple providers. Three related issues:
1. JSON extraction is fragile — the parser only strips ```` ```json ```` fences. When models (especially Gemini) wrap JSON in explanation text or append trailing content, `JSON.parse` fails on the raw string.
2. `MAX_COMPLETION_TOKENS` too low — set to 4096, which truncates complex analyses (many steps, detailed rubric evaluations with per-milestone reasoning). Truncated JSON → parse failure → zero scores.
3. Analyzer model resolution ignores the user's provider — `parseAnalysisResponse` hardcodes `analyzerModel: DEFAULT_ANALYZER_MODEL` (Claude Sonnet), so results always display `claude-sonnet-4-5-20250929` even when analysis actually ran on Gemini or Grok. Additionally, when the benchmark and analysis use the same provider, image/embedding models (e.g. `grok-imagine-image`, `text-embedding-3-large`) are returned as the analyzer model, causing 400 errors.
Changes
src/lib/analyzer.ts
- `extractJson()` function (exported) — brace-matching scanner that handles markdown fences, explanation-text wrapping, escaped quotes, nested braces, and trailing content. Always runs the scanner (no early-return bypass).
- Thread `analyzerModel` through `parseAnalysisResponse` — display the actual model used instead of the hardcoded default.
- `NON_TEXT_MODEL_PATTERNS` array with anchored regexes (`/^grok-imagine-/i`, `/^dall-e-/i`, etc.) to avoid false positives on legitimate text models like `gpt-5-image-understanding`.
src/commands/run.ts
- Prevent wrong API key passthrough when `--analyzer-provider` differs.
src/lib/export.ts
- Skip the export prompt when no analysis is available; guide the user to run `oasis analyze` then `oasis report`.
src/lib/report.ts
- Replace oasis.kryptsec.com with github.com/KryptSec/oasis in share cards and HTML reports.
tests/unit/analyzer.test.ts
- `extractJson` tests: clean passthrough, leading/trailing text, markdown fences (including mid-string), nested objects, braces inside strings, unclosed JSON, multiple objects, no-JSON input
- Blocklist test (`gpt-5-image-understanding` is NOT filtered)
Token cost note
`MAX_COMPLETION_TOKENS` stays at 4096 (the original value). The truncation failures were caused by models wrapping JSON in explanation text — `extractJson()` solves this by scanning for braces regardless of preamble length. No token increase needed.
Reproduction
Test plan
- `npm run build` — clean compilation
- `npx vitest run` — 411/411 tests pass (11 new)
- `oasis analyze <id> -p google --reanalyze` — consistent results across re-runs
- `oasis analyze <id>` with a run that used `grok-imagine-image` falls back to `grok-4-0709`
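The last check in the plan (a `grok-imagine-image` run falling back to `grok-4-0709`) can be sketched as a small resolution helper. The function name and shape are assumptions; only the model names and blocklist patterns come from this PR.

```typescript
// Illustrative sketch: prefer the benchmark model for analysis, but fall back
// to a known text model when the benchmark ran on a non-text model.
const NON_TEXT_MODEL_PATTERNS: RegExp[] = [
  /^grok-imagine-/i,
  /^dall-e-/i,
  /^tts-/i,
  /^whisper-/i,
  /[-.]embed(ding)?[-.]/i,
];

// Hypothetical helper name; the PR threads this resolution through run.ts.
function resolveAnalyzerModel(benchmarkModel: string, fallback: string): string {
  const nonText = NON_TEXT_MODEL_PATTERNS.some((p) => p.test(benchmarkModel));
  return nonText ? fallback : benchmarkModel;
}
```

Reusing the benchmark model keeps analysis on the user's chosen provider; the fallback only kicks in when that model cannot produce text output.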