fix: robust analysis parsing and same-provider model resolution#61

Merged
Treelovah merged 4 commits into main from fix/analyzer-provider-resolution on Mar 2, 2026
Conversation


@pi3-code pi3-code commented Feb 27, 2026

Problem

Analysis consistently fails with `could not parse LLM response (truncated or malformed JSON)` across multiple providers. Three related issues:

  1. JSON extraction is fragile — the parser only strips ```json fences. When models (especially Gemini) wrap JSON in explanation text or append trailing content, JSON.parse fails on the raw string.

  2. MAX_COMPLETION_TOKENS too low — set to 4096, which truncates complex analyses (many steps, detailed rubric evaluations with per-milestone reasoning). Truncated JSON → parse failure → zero scores.

  3. Analyzer model resolution ignores the user's provider — parseAnalysisResponse hardcodes analyzerModel: DEFAULT_ANALYZER_MODEL (Claude Sonnet), so results always display claude-sonnet-4-5-20250929 even when analysis actually ran on Gemini or Grok. Additionally, when the benchmark and analysis use the same provider, image/embedding models (e.g. grok-imagine-image, text-embedding-3-large) are returned as the analyzer model, causing 400 errors.

Changes

src/lib/analyzer.ts

  • Add extractJson() function (exported) — brace-matching scanner that handles markdown fences, explanation text wrapping, escaped quotes, nested braces, and trailing content. Always runs the scanner (no early-return bypass).
  • Thread analyzerModel through parseAnalysisResponse — display the actual model used instead of hardcoded default.
  • Anchored blocklist for non-text models — explicit NON_TEXT_MODEL_PATTERNS array with anchored regexes (/^grok-imagine-/i, /^dall-e-/i, etc.) to avoid false positives on legitimate text models like gpt-5-image-understanding.
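The extractor described above can be sketched roughly as follows. The function name and return shape match the PR description, but the body is an illustration of the brace-matching approach, not the merged code in src/lib/analyzer.ts:

```typescript
// A minimal sketch of a brace-matching JSON extractor. The merged
// implementation in src/lib/analyzer.ts may differ in details.
export function extractJson(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null; // no JSON object at all

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string: track escapes so \" does not end the string,
      // and braces inside string values are ignored.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      // First complete top-level object wins; trailing content is dropped.
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // unclosed / truncated JSON
}
```

Because the scanner keys only on braces, markdown fences and explanation text before or after the object need no special handling, which is why no early-return bypass is needed.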

src/commands/run.ts

  • Fix API key fallthrough — only use the benchmark provider's key as fallback for the analyzer when the providers match. Prevents passing e.g. a Google key to the Anthropic SDK when --analyzer-provider differs.
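The fallback rule can be illustrated with a small helper. The function and parameter names here are hypothetical; the actual run.ts logic is structured differently:

```typescript
// Hypothetical helper illustrating the key-fallback guard; the real
// run.ts code and its identifiers differ.
function resolveAnalyzerKey(
  benchmarkProvider: string,
  analyzerProvider: string,
  benchmarkKey?: string,
  analyzerKey?: string,
): string | undefined {
  // An explicit analyzer key always wins.
  if (analyzerKey) return analyzerKey;
  // Fall back to the benchmark key only when the providers match;
  // otherwise a Google key could be handed to the Anthropic SDK.
  return benchmarkProvider === analyzerProvider ? benchmarkKey : undefined;
}
```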

src/lib/export.ts

  • Skip export when no analysis — instead of showing export options without scores, guide the user to run `oasis analyze` then `oasis report`.
  • Reorder export menu — "Done" becomes default choice after the first export action.

src/lib/report.ts

  • Fix dead URL — replace oasis.kryptsec.com with github.com/KryptSec/oasis in share cards and HTML reports.

tests/unit/analyzer.test.ts

  • 10 direct extractJson tests: clean passthrough, leading/trailing text, markdown fences (including mid-string), nested objects, braces inside strings, unclosed JSON, multiple objects, no-JSON input
  • Non-text model filter tests (image, embedding, tts models) + false-positive guard (gpt-5-image-understanding is NOT filtered)
  • Same-provider tests verifying benchmark model is preserved

Token cost note

MAX_COMPLETION_TOKENS stays at 4096 (original value). The truncation failures were caused by models wrapping JSON in explanation text — extractJson() solves this by scanning for braces regardless of preamble length. No token increase needed.

Reproduction

```shell
# Before fix — fails consistently
oasis run -c jwt-forgery -m models/gemini-3-pro-preview -p google --analyze
# → "could not parse LLM response (truncated or malformed JSON)"

# After fix — works
oasis run -c jwt-forgery -m models/gemini-3-pro-preview -p google --analyze
# → Analysis complete, KSM 76.6, Model: models/gemini-3-pro-preview
```

Test plan

  • npm run build — clean compilation
  • npx vitest run — 411/411 tests pass (11 new)
  • Manual: benchmark + analysis with Google Gemini — analysis completes, correct model displayed
  • Manual: oasis analyze <id> -p google --reanalyze — consistent results across re-runs
  • Verify image model blocklist: oasis analyze <id> with a run that used grok-imagine-image falls back to grok-4-0709

- Skip export prompt entirely when no analysis available (API quota/rate limit),
  guide user to run `oasis analyze` then `oasis report` instead
- Reorder export menu so "Done" is default after first action
- Replace dead oasis.kryptsec.com URL with GitHub repo in reports
- Add robust JSON extraction (brace-matching) for LLM responses wrapped
  in explanation text or trailing content
- Increase MAX_COMPLETION_TOKENS from 4096 to 8192 to prevent truncation
- Use benchmark model for analysis instead of hardcoding claude-sonnet,
  with blocklist for non-text models (image/embed/tts)
- Display actual analyzer model in results instead of hardcoded default
- Prevent wrong API key passthrough when analyzer provider differs
@pi3-code pi3-code requested a review from Treelovah February 27, 2026 22:57

@Treelovah Treelovah left a comment


The core fixes here — extractJson(), threading analyzerModel, the API key fallthrough guard, image model filter — are solid work solving real problems. Three things need addressing before merge.

1. Export extractJson and test it directly.

This is a hand-rolled brace-matching JSON scanner and the most important addition in the PR. Right now it has zero direct tests — it's only exercised indirectly through parseAnalysisResponse, which also runs Zod parsing, default filling, and score computation. Too much noise to isolate a scanner regression.

Export it. Write 6-8 targeted cases: clean JSON passthrough, leading/trailing explanation text, markdown fences, nested objects, braces inside string values, unclosed JSON, no JSON at all. It's a pure function with deterministic I/O — exactly the kind of thing that benefits from unit tests. Don't add more parseAnalysisResponse cases though — the existing ones cover the integration path fine.

2. Drop the token bump or demonstrate the need.

I checked every saved analysis result in the repo. The largest JSON payload across all runs is ~2,959 tokens. Most are under 2,000. 4096 already gives over 1,100 tokens of headroom above our worst case.

The truncation failures were caused by models wrapping JSON in explanation text, inflating the total response past 4096. extractJson() already solves that — the scanner doesn't care how much preamble the model generates, it just finds the braces. Doubling MAX_COMPLETION_TOKENS to 8192 is solving a problem the parser fix already addressed, and it doubles worst-case analysis cost on a tool where users pay their own API bills.

Drop the change. If you disagree, post a result where the extracted JSON alone exceeds 4096 tokens — as of now this looks like an edge case that doesn't exist in practice.

3. Split unrelated changes into a separate PR.

The export.ts UX rework (skip export when no analysis, reorder Done choice) and the report.ts link updates are good changes but have nothing to do with analysis parsing or model resolution. Let's land these as separate PRs so we can focus review on the core fix without having to cherry-pick later.


@Treelovah Treelovah left a comment


Review — Request Changes

The fixes here are real and well-understood — the extractJson() scanner, the model resolution threading, and the API key fallthrough are all clear improvements. But this needs a few things before it's merge-ready.

1. extractJson() needs direct unit tests

This is the most important function in the PR and it's only tested indirectly through parseAnalysisResponse. For a parser, that's not sufficient. We need edge case coverage:

  • Trailing explanation text after valid JSON
  • Nested braces inside string values ({"key": "value with { braces }"})
  • Unclosed/truncated JSON
  • Bare JSON with no markdown fences
  • Multiple JSON objects in one response (should extract the first)
  • Fences not at start of string (e.g. "Here's the result:\n```json\n{...}\n```")

2. Non-text model regex is too loose

/imagine|image|embed|tts|whisper|dall-e/i matching on substring will false-positive the moment someone benchmarks a vision model with "image" in the name (e.g. gpt-5-image-understanding). Two options:

  • Preferred: Explicit blocklist array with anchored patterns (e.g. /^grok-imagine-/, /^dall-e-/, /^tts-/, /^whisper-/, /[-.]embed(ding)?[-.]/)
  • Alternative: Check model capabilities rather than pattern-matching names

At minimum, anchor the patterns more tightly so legitimate text models aren't caught.
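The preferred option could look like the sketch below. The exact pattern set is an assumption drawn from the examples above, not the list that was eventually merged:

```typescript
// Illustrative anchored blocklist — the merged NON_TEXT_MODEL_PATTERNS
// may contain a different set of patterns.
const NON_TEXT_MODEL_PATTERNS: RegExp[] = [
  /^grok-imagine-/i,
  /^dall-e-/i,
  /^tts-/i,
  /^whisper-/i,
  /[-.]embed(ding)?[-.]/i,
];

function isNonTextModel(model: string): boolean {
  return NON_TEXT_MODEL_PATTERNS.some((pattern) => pattern.test(model));
}
```

Anchoring at the start of the name (or on delimiters, for the embedding case) means gpt-5-image-understanding no longer matches, while grok-imagine-image and text-embedding-3-large still do.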

3. Split unrelated changes into a separate PR

The export.ts UX improvements (early return + choice reordering) and the report.ts link updates (oasis.kryptsec.com → github.com/KryptSec/oasis) are unrelated to analysis parsing. One concern per PR — we enforced this on ourselves during the hardening audit, same standard applies here.

4. Acknowledge the token cost tradeoff

MAX_COMPLETION_TOKENS 4096 → 8192 doubles worst-case spend per analysis. The change itself is justified, but add a one-liner to the PR description noting the cost tradeoff so users aren't surprised.


To be clear: the engineering here is solid. These are "make it merge-ready for a public repo" asks, not fundamental objections.

…t token bump

- Export extractJson and add 10 direct unit tests covering: clean
  passthrough, leading/trailing text, markdown fences, nested objects,
  braces in strings, unclosed JSON, multiple objects, no-JSON input
- Replace loose /imagine|image|embed|.../ regex with anchored
  NON_TEXT_MODEL_PATTERNS blocklist to prevent false positives on
  vision models like gpt-5-image-understanding
- Revert MAX_COMPLETION_TOKENS to 4096 — extractJson already strips
  preamble text that caused truncation, token bump is redundant
@pi3-code pi3-code requested a review from Treelovah March 2, 2026 16:55
The original PR swapped oasis.kryptsec.com for the GitHub repo URL in
share cards and HTML reports. That domain is live. Put it back.
@Treelovah Treelovah merged commit 3241c6c into main Mar 2, 2026
3 checks passed
@Treelovah Treelovah deleted the fix/analyzer-provider-resolution branch March 2, 2026 18:56