claude-code-chat-browser: YAML escape fix — handle colons, hashes, tabs, bool literals in frontmatter export#107
Conversation
Refactor frontmatter export through _session_frontmatter_dict and _append_yaml_value so all string fields are double-quoted or block-scalars. _escape_yaml handles tabs, hashes, colons, bool-like literals, and non-printable Unicode. Add round-trip and Hypothesis property tests; pin PyYAML in dev deps.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds YAML dev dependencies, rewrites frontmatter serialization in ChangesYAML Frontmatter Escaping
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_md_exporter_yaml.py (2)
70-83: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueInconsistent frontmatter extraction: naive
split("---")vs. the robust helper defined above.Line 82 extracts the YAML block with
md.split("---")[1], a plain substring split, while line 83 uses the delimiter-line-aware_extract_frontmatter_dictfor the same assertion. The naive split would misbehave if any frontmatter value ever contains the literal substring"---"(not triggered by current fixed test data, but a footgun for future edits). Prefer using_extract_frontmatter_dictconsistently instead of duplicating the assertion with two different extraction strategies.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_md_exporter_yaml.py` around lines 70 - 83, The YAML frontmatter roundtrip test uses two different extraction paths, and the direct md.split("---")[1] approach in test_yaml_frontmatter_roundtrip is brittle. Update the assertion to use the same robust helper, _extract_frontmatter_dict, consistently for frontmatter parsing so the test relies on delimiter-line-aware extraction instead of naive substring splitting.
97-104: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer
hypothesis.assume()over earlyreturnto skip invalid examples.Returning early when
"\n" in s or "\r" in smakes the test pass trivially for those inputs rather than telling Hypothesis to discard/reject them, which weakens shrinking and example diversity for the intended single-line case.♻️ Suggested change
-from hypothesis import given, settings, strategies as st +from hypothesis import assume, given, settings, strategies as st @@ def test_escape_yaml_roundtrip(s: str) -> None: """Double-quoted scalars round-trip for arbitrary single-line text.""" - if "\n" in s or "\r" in s: - return + assume("\n" not in s and "\r" not in s) loaded = yaml.safe_load(f"key: {_escape_yaml(s)}") assert loaded["key"] == s🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_md_exporter_yaml.py` around lines 97 - 104, Update test_escape_yaml_roundtrip to use hypothesis.assume() instead of returning early when the generated string contains newline or carriage return characters. Keep the single-line constraint in place by rejecting invalid examples with assume, so Hypothesis correctly discards them while still exercising _escape_yaml and yaml.safe_load on valid inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_md_exporter_yaml.py`:
- Around line 107-114: The YAML round-trip fuzz test in
test_yaml_string_field_roundtrip is exposing that _append_yaml_value does not
preserve strings ending with trailing newlines because the current block scalar
style strips them on load. Update the string serialization logic in
_append_yaml_value so values like "\n" survive a safe_load round-trip, using a
representation that preserves trailing line breaks or an equivalent escaping
strategy, and keep the behavior verified by test_yaml_string_field_roundtrip.
In `@utils/md_exporter.py`:
- Around line 525-534: The multiline `|-` fast-path in `md_exporter.py` is lossy
for some string values, especially those with trailing newlines, `\r\n`, or
leading indentation, so narrow the `value.splitlines()` branch in the YAML
export logic to only cases that round-trip safely. Update the condition inside
the value-handling block to use `splitlines()` only when the exact original
content can be preserved, otherwise fall back to `_escape_yaml(value)`; use the
existing `content_lines` handling in `md_exporter.py` as the place to fix this.
---
Nitpick comments:
In `@tests/test_md_exporter_yaml.py`:
- Around line 70-83: The YAML frontmatter roundtrip test uses two different
extraction paths, and the direct md.split("---")[1] approach in
test_yaml_frontmatter_roundtrip is brittle. Update the assertion to use the same
robust helper, _extract_frontmatter_dict, consistently for frontmatter parsing
so the test relies on delimiter-line-aware extraction instead of naive substring
splitting.
- Around line 97-104: Update test_escape_yaml_roundtrip to use
hypothesis.assume() instead of returning early when the generated string
contains newline or carriage return characters. Keep the single-line constraint
in place by rejecting invalid examples with assume, so Hypothesis correctly
discards them while still exercising _escape_yaml and yaml.safe_load on valid
inputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f333a4a-d6f4-4ca9-98b1-b44b7741fec9
📒 Files selected for processing (3)
requirements-dev.txttests/test_md_exporter_yaml.pyutils/md_exporter.py
| @FUZZ_SETTINGS | ||
| @given(st.text()) | ||
| def test_yaml_string_field_roundtrip(s: str) -> None: | ||
| """Frontmatter string serializer round-trips arbitrary text.""" | ||
| lines: list[str] = [] | ||
| _append_yaml_value(lines, "key", s) | ||
| loaded = yaml.safe_load("\n".join(lines)) | ||
| assert loaded["key"] == s |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the block-scalar chomping round-trip bug for a trailing-newline string.
python3 - <<'PY'
import sys
sys.path.insert(0, "utils")
from md_exporter import _append_yaml_value
import yaml
lines = []
_append_yaml_value(lines, "key", "\n")
text = "\n".join(lines)
print(repr(text))
print(repr(yaml.safe_load(text)))
PYRepository: cppalliance/claude-code-chat-browser
Length of output: 200
test_yaml_string_field_roundtrip fails for strings ending in \n _append_yaml_value(lines, "key", "\n") serializes to key: |-\n , which loads back as "" because |- strips trailing line breaks. st.text() can shrink to this case, so the fuzz test is expected to fail unless trailing-newline strings are handled differently (for example, | or quoted escaping).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_md_exporter_yaml.py` around lines 107 - 114, The YAML round-trip
fuzz test in test_yaml_string_field_roundtrip is exposing that
_append_yaml_value does not preserve strings ending with trailing newlines
because the current block scalar style strips them on load. Update the string
serialization logic in _append_yaml_value so values like "\n" survive a
safe_load round-trip, using a representation that preserves trailing line breaks
or an equivalent escaping strategy, and keep the behavior verified by
test_yaml_string_field_roundtrip.
Block scalars cannot preserve newline-only strings on Python 3.12 where splitlines() yields ['']. Always emit double-quoted scalars. Widen test_escape_yaml_roundtrip to cover multiline text.
Drop brittle md.split('---') assertion; _extract_frontmatter_dict already
covers the same check with proper delimiter handling.
|
The PR body claims "Multiline printable strings use |- block scalars; non-printable multiline falls back to quotes escapes." But _escape_yaml always emits a double-quoted scalar with \n for newlines - there's no block-scalar code anywhere test_multiline_title_uses_quoted_scalar even asserts the quoted from. |
Store list fields as lists in _session_frontmatter_dict and serialize them as YAML block sequences in _append_yaml_value instead of comma-joined strings that consumers cannot parse as arrays.
Good catch — block scalars were removed in df37c92, updated the PR description |
Closes #102
Summary
_session_frontmatter_dict()+_append_yaml_value()_escape_yaml()always emits double-quoted scalars with proper escaping for",\, tabs, newlines, control chars, and non-printable Unicodemodels_usedandservice_tiersemit as YAML sequences (not comma-joined strings)test_yaml_frontmatter_roundtripand Hypothesis@given(st.text())property testsPyYAML>=6.0torequirements-dev.txtSprint item Add CLI + Web GUI for browsing Claude Code sessions #6 (3 pt). Independent of PR claude-code-chat-browser: SessionMetadataDict — tighten all-NotRequired fields for mypy shape enforcement #106 (SessionMetadataDict).
Problem
Unquoted frontmatter values broke on colons,
#comments, tabs, and YAML boolean literals (true,off, etc.) in real session titles and paths.Test plan
pytest tests/test_md_exporter_yaml.py— 5 passedpytest -q— 424 passed, 13 skippedmypy api utils models— passruff check .— passFix: handle edge case— confirmyaml.safe_load()parses frontmatterSummary by CodeRabbit
#, quotes, backslashes, newlines).