claude-code-chat-browser: schema drift detection for upstream JSONL c… by clean6378-max-it · Pull Request #108 · cppalliance/claude-code-chat-browser

clean6378-max-it · 2026-07-02T15:58:24Z

Closes #103

Summary

Detect upstream Claude Code JSONL schema drift by fingerprinting field paths during parse_session() and diffing against a committed schema_baseline.json. New or missing required paths emit warnings on the claude_code_chat_browser.schema_drift logger; the web UI shows a dismissible amber banner on the session list page; GET /api/schema-report returns {known_fields, new_fields, missing_fields, has_drift}.
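
The fingerprint-and-diff step described above can be sketched roughly as follows. This is an illustration only, not the code in `utils/schema_drift.py`: the names `collect_field_paths` and `diff_against_baseline` echo names used in the review, but their signatures, the `[]` suffix for list items, and the in-memory baseline format (path mapped to a required flag) are assumptions.

```python
from typing import Any


def collect_field_paths(value: Any, prefix: str = "") -> set[str]:
    """Recursively collect dotted field paths from one decoded JSONL entry."""
    paths: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= collect_field_paths(child, path)
    elif isinstance(value, list):
        # Assumption: list items share the parent's path plus a "[]" suffix.
        for item in value:
            paths |= collect_field_paths(item, f"{prefix}[]")
    return paths


def diff_against_baseline(observed: set[str], baseline: dict[str, bool]) -> dict[str, Any]:
    """Compare observed paths with a baseline mapping path -> required flag."""
    baseline_paths = set(baseline)
    new = observed - baseline_paths
    # Only paths flagged as required count as missing when absent.
    missing = {p for p, required in baseline.items() if required and p not in observed}
    return {
        "known_fields": sorted(observed & baseline_paths),
        "new_fields": sorted(new),
        "missing_fields": sorted(missing),
        "has_drift": bool(new or missing),
    }


# Hypothetical entry and baseline, invented for this example.
entry = {"type": "assistant", "message": {"content": [{"type": "tool_use", "futureKey": 1}]}}
baseline = {
    "type": True,
    "message": False,
    "message.content": False,
    "message.content[].type": False,
    "uuid": True,
}
report = diff_against_baseline(collect_field_paths(entry), baseline)
```

Here `report["new_fields"]` contains `message.content[].futureKey` and `report["missing_fields"]` contains `uuid`, so `has_drift` is true. The report keys mirror the shape the PR gives for GET /api/schema-report.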

Closes sprint item 5 (July Week 1 Thursday, 5 pt).

Changes

utils/schema_drift.py — baseline loader, recursive path collection, drift diff, process-wide report
utils/jsonl_parser.py — _collect_field_paths() wrapper; drift check at parse completion
schema_baseline.json — 98 known (json_path, expected_type) pairs; only type is required
api/schema_report.py — GET /api/schema-report
static/js/sessions.js + static/css/style.css — dismissible amber warning banner (sessionStorage fingerprint)
tests/fixtures/jsonl/unknown_field.jsonl — synthetic FutureToolXYZ / unknown tool key fixture
tests/test_schema_drift.py — 10 tests covering warnings, API, false-positive guard, merge behavior

Out of scope

Auto-updating baseline on every parse (updates are deliberate)
Blocking parse on drift
CLI command (API endpoint chosen instead)

Test plan

pytest tests/ -k schema -q
pytest -q (443 passed)
mypy -p api -p utils -p models
ruff check .
Manual: parse tests/fixtures/jsonl/unknown_field.jsonl — confirm warning logged and amber banner in UI

Summary by CodeRabbit

New Features
- Added a schema drift warning banner in the workspace to surface newly detected or missing session fields (up to 5 new and 5 missing required items).
- Added a schema drift report endpoint so the UI can display current drift status.
Bug Fixes
- Banner dismissals are remembered per drift state, preventing repeated pop-ups.
- If the schema baseline is invalid, drift tracking is skipped without breaking session handling.
Style
- Added warning-themed banner styling in both light and dark modes, including a dismiss button.

…hanges (#5) Fingerprint known Claude Code JSONL field paths against a committed schema_baseline.json, warn on drift during parsing, expose GET /api/schema-report, and surface a dismissible amber banner on the session list page. Warnings only - parsing is never blocked.

coderabbitai · 2026-07-02T15:58:47Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee9dded3-eb93-44e7-8e9f-4f0bca3b9425

📥 Commits

Reviewing files that changed from the base of the PR and between 8a7f170 and 252429c.

📒 Files selected for processing (5)

benchmarks/baselines.json
static/js/sessions.js
static/js/sessions.test.js
tests/test_schema_drift.py
utils/schema_drift.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_schema_drift.py
static/js/sessions.js

📝 Walkthrough

Adds JSONL schema drift tracking from parser to API and UI, backed by a committed baseline, updated tests, and refreshed benchmark baselines.

Changes

Schema Drift Detection

| Layer / File(s) | Summary |
| --- | --- |
| Baseline field catalog: `schema_baseline.json` | Defines the committed field-path catalog, including required flags and typed entries for message, tool, and metadata fields. |
| Schema drift detection module: `utils/schema_drift.py` | Collects dotted field paths, loads the baseline, computes known/new/missing sets, accumulates reports, and supports reset/cache clearing. |
| Parser and API wiring: `utils/jsonl_parser.py`, `api/schema_report.py`, `app.py` | Records observed field paths during session parsing, exposes `/api/schema-report`, and registers the new blueprint in the app. |
| UI drift banner: `static/js/sessions.js`, `static/css/style.css` | Fetches the schema report, renders a dismissible banner when drift exists, persists dismissal state, and adds warning styles. |
| Tests and benchmarks: `tests/test_schema_drift.py`, `static/js/sessions.test.js`, `tests/fixtures/jsonl/unknown_field.jsonl`, `benchmarks/baselines.json` | Adds fixture coverage, schema drift tests, UI call-order/banner tests, and updated benchmark baselines. |

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

cppalliance/claude-code-chat-browser#6: Both PRs modify session parsing in utils/jsonl_parser.py and the same parse_session flow.
cppalliance/claude-code-chat-browser#38: Both PRs touch utils/jsonl_parser.py, including parsing-related logic around parse_session.

Suggested reviewers: timon0305, wpak-ai

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is specific and matches the main change: schema drift detection for Claude Code JSONL parsing. |
| Linked Issues check | ✅ Passed | The PR meets the linked issue goals: field-path fingerprinting, baseline comparison, warnings, banner, API report, fixture, and non-blocking parsing. |
| Out of Scope Changes check | ✅ Passed | The changes appear scoped to schema-drift detection and related tests, UI, baseline, and benchmark updates; no unrelated code changes stand out. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

utils/schema_drift.py (2)
90-121: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Baseline JSON is re-read and re-parsed from disk on every parse_session() call.

load_baseline_fields() does file I/O + json.loads and is invoked unconditionally by diff_against_baseline(), which per the parser integration snippet runs once per session file. On a session-list page rendering many sessions, this repeats disk I/O + parsing for a file that never changes at runtime. Consider loading/parsing the baseline once (module import or lazily-cached, e.g. via functools.lru_cache) instead of on every parse.
♻️ Example: cache the parsed baseline
+from functools import lru_cache
+
+
+@lru_cache(maxsize=1)
+def load_baseline_fields() -> dict[str, SchemaFieldSpec]:
     """Load ``schema_baseline.json`` field specs keyed by dotted path."""
     raw = json.loads(BASELINE_PATH.read_text(encoding="utf-8"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/schema_drift.py` around lines 90 - 121, The baseline JSON is being
re-read and re-parsed on every call through diff_against_baseline() and
load_baseline_fields(), which adds repeated disk I/O for data that does not
change at runtime. Cache the parsed baseline once instead of loading it per
session, either by memoizing load_baseline_fields() with a lazy cache such as
functools.lru_cache or by initializing the baseline at module import, and keep
diff_against_baseline() using the cached result.
71-87: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

collect_field_paths_with_types and baseline expected_type are unused in the drift diff.

diff_against_baseline only compares field paths, and nothing in this module reads observed types. If type-drift checks aren’t coming next, remove the helper and expected_type plumbing for now to keep this focused.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/schema_drift.py` around lines 71 - 87, The observed type-tracking code
in collect_field_paths_with_types is not used by diff_against_baseline, so the
drift logic should be simplified to only compare field paths. Remove the unused
type-collection helper and any expected_type plumbing from the baseline handling
unless you are adding type-drift checks in this same change, and keep the
remaining symbols centered around collect_field_paths, baseline, and
diff_against_baseline.
static/js/sessions.js (1)
85-88: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Parallelize independent fetches to reduce latency.

fetchSchemaDriftBannerHtml() and the sessions fetch are independent; awaiting them sequentially adds the schema-report round trip to the workspace load time.
⚡ Suggested refactor
-        const schemaBannerHtml = await fetchSchemaDriftBannerHtml();
-
-        const res = await fetch(`/api/projects/${encodeURIComponent(projectName)}/sessions`);
-        state.cachedSessions = await res.json();
+        const [schemaBannerHtml, res] = await Promise.all([
+            fetchSchemaDriftBannerHtml(),
+            fetch(`/api/projects/${encodeURIComponent(projectName)}/sessions`),
+        ]);
+        state.cachedSessions = await res.json();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@static/js/sessions.js` around lines 85 - 88, The workspace load is waiting on
two independent requests in sequence, which adds unnecessary latency. Update the
sessions-loading flow in sessions.js so fetchSchemaDriftBannerHtml() and the
/api/projects/${encodeURIComponent(projectName)}/sessions fetch start in
parallel, then await both results together before using schemaBannerHtml and
state.cachedSessions. Keep the change localized to the sessions-loading logic
and preserve the existing assignment behavior once both promises resolve.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utils/jsonl_parser.py`:
- Around line 236-239: `parse_session` currently calls
`record_parse_drift(observed_field_paths)` unguarded, so drift tracking can
abort parsing if `diff_against_baseline()` or `load_baseline_fields()` fails on
a missing or malformed `schema_baseline.json`. Update `record_parse_drift` (or
its call site in `parse_session`) to catch and suppress non-fatal drift-tracking
errors, while still allowing `validate_session_dict(...)` and the rest of
parsing to proceed normally; keep the fix localized around `record_parse_drift`,
`diff_against_baseline`, and `load_baseline_fields`.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 118cf0f8-8beb-4e8a-ab11-f5c572bfcdac

📥 Commits

Reviewing files that changed from the base of the PR and between 4345d69 and 8a7f170.

📒 Files selected for processing (9)

api/schema_report.py
app.py
schema_baseline.json
static/css/style.css
static/js/sessions.js
tests/fixtures/jsonl/unknown_field.jsonl
tests/test_schema_drift.py
utils/jsonl_parser.py
utils/schema_drift.py

…108) Cache schema_baseline.json with lru_cache and make record_parse_drift non-fatal on baseline I/O or parse errors so parsing never aborts. Fetch /api/schema-report after sessions load so the banner reflects drift from the current parse run. Add vitest coverage for banner rendering and fetch ordering; extend pytest for malformed baseline. Raise benchmark baselines for per-entry field-path fingerprinting.
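
A minimal sketch of the two fixes this commit message describes: caching the baseline and keeping drift tracking non-fatal. Only the names `load_baseline_fields`, `record_parse_drift`, `BASELINE_PATH`, and the logger name appear in the PR text. The function bodies, return types, stand-in file location, and baseline layout (a JSON object keyed by dotted path) are assumptions made for illustration.

```python
import json
import logging
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Optional

logger = logging.getLogger("claude_code_chat_browser.schema_drift")

# Assumption: a stand-in location; the real module defines its own BASELINE_PATH.
BASELINE_PATH = Path(tempfile.gettempdir()) / "schema_baseline_sketch.json"


@lru_cache(maxsize=1)
def load_baseline_fields() -> frozenset[str]:
    """Read and parse the baseline once; later calls are served from the cache."""
    raw = json.loads(BASELINE_PATH.read_text(encoding="utf-8"))
    # Assumption: the file is a JSON object keyed by dotted field path.
    return frozenset(raw)


def record_parse_drift(observed: set[str]) -> Optional[set[str]]:
    """Return observed paths absent from the baseline, or None if it is unusable."""
    try:
        baseline = load_baseline_fields()
    except (OSError, ValueError, TypeError) as exc:
        # json.JSONDecodeError subclasses ValueError. A missing or malformed
        # baseline is logged and skipped so that parsing can continue.
        logger.warning("schema drift tracking skipped: %s", exc)
        return None
    return observed - baseline
```

Because `lru_cache` does not cache raised exceptions, a baseline that is repaired on disk is picked up on the next call, while a successfully parsed one is reused until `load_baseline_fields.cache_clear()` is called, for example between tests.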

coderabbitai Bot reviewed Jul 2, 2026

clean6378-max-it requested a review from timon0305 July 2, 2026 18:15

clean6378-max-it wants to merge 2 commits into master from feat/schema-drift-detection
