Fix HtmlToDjot reverse-conversion round-trip fidelity #208

Merged: dereuromark merged 1 commit into master from fix/htmltodjot-roundtrip-fidelity on Jun 3, 2026

@dereuromark
Contributor

Problem

The default DjotConverter output (without round-trip annotations) did not survive HtmlToDjot back-conversion in several cases, so a Djot -> HTML -> Djot -> HTML cycle changed the rendered HTML. This is exactly the path WYSIWYG and HTML-serializing tools rely on, where HTML stability across the round-trip matters.

These cases were found with a round-trip drift harness run over the core element set plus element combinations.
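A minimal sketch of what such a drift check could look like, written in Python for brevity rather than in the library's own PHP. The names `find_drift`, `to_html`, and `to_djot` are hypothetical; the two converters are passed in as plain callables, and the toy converters below exist only to make the example runnable.

```python
from typing import Callable, Iterable, List, Tuple


def find_drift(
    samples: Iterable[str],
    to_html: Callable[[str], str],
    to_djot: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, first_html, second_html) for each sample whose HTML changes.

    A sample is stable when Djot -> HTML -> Djot -> HTML yields the same
    HTML as the first Djot -> HTML pass.
    """
    drifted = []
    for source in samples:
        first_html = to_html(source)
        second_html = to_html(to_djot(first_html))
        if first_html != second_html:
            drifted.append((source, first_html, second_html))
    return drifted


# Toy stand-ins (assumptions, not the real converters).
def toy_to_html(djot: str) -> str:
    return "<p>" + djot + "</p>"


def toy_to_djot(html: str) -> str:
    return html[len("<p>"):-len("</p>")]


def lossy_to_djot(html: str) -> str:
    # Deliberately lossy: lowercases the text, so the second render differs.
    return toy_to_djot(html).lower()
```

Comparing rendered HTML rather than Djot source is the point of the check: the Djot text may legitimately be normalized on the way back, as long as the HTML stays the same.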

Fixes

Bracket-in-label corruption. processLink() blanket-escaped every ] in the already-built child markup, mangling structural brackets produced by a nested image or link. [![alt](/i.png)](https://x.com) became [![alt\](/i.png)](https://x.com) (invalid). ] is now escaped at the text-node level in escapeDjotText(), where it is known to be literal, so structural ] emitted by child element processors stays intact. A literal ] inside a link/span label is still neutralized. This also fixes literal ] inside span and semantic-span labels, which had the same latent issue.
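The difference between the two escaping strategies can be illustrated with a small sketch (Python, not the library's PHP). The node shapes, `render`, `escape_text`, and `blanket_escape_label` are all hypothetical, and only `]` is escaped here; the real escapeDjotText() presumably handles more characters.

```python
def escape_text(text):
    # Escape only at the text-node level, where "]" is known to be literal.
    return text.replace("]", "\\]")


def render(node):
    """Render a tiny node tree: str = text, ("img", alt, src), ("link", href, children)."""
    if isinstance(node, str):
        return escape_text(node)
    kind = node[0]
    if kind == "img":
        _, alt, src = node
        return "![" + escape_text(alt) + "](" + src + ")"
    if kind == "link":
        _, href, children = node
        label = "".join(render(child) for child in children)
        # The label is NOT escaped again: any "]" in it is structural.
        return "[" + label + "](" + href + ")"
    raise ValueError("unknown node kind: " + repr(kind))


def blanket_escape_label(label, href):
    # The old behaviour: escape every "]" in already-built child markup.
    return "[" + label.replace("]", "\\]") + "](" + href + ")"


nested = ("link", "https://x.com", [("img", "alt", "/i.png")])
literal = ("link", "/a", ["a]b"])
```

Here `render(nested)` keeps the image's closing bracket intact, `render(literal)` still neutralizes the literal `]`, and `blanket_escape_label(render(nested[2][0]), "https://x.com")` reproduces the corrupted output described above.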

Footnote reference. The default footnote reference <a href="#fnN" role="doc-noteref"> was unrecognized and produced invalid Djot ([^1^](#fn1){#fnref1}). It now round-trips to a footnote marker, pairing with the definition already collected by processEndnotesSection(). Annotated inline footnotes (which also carry role="doc-noteref") are explicitly excluded and unaffected.
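A rough sketch of the recognition step, in Python rather than the library's PHP. It assumes the footnote label is the numeric `N` taken from `href="#fnN"`, and it uses a hypothetical `annotated` flag to stand in for however the converter detects annotated inline footnotes; neither detail is confirmed by this PR.

```python
import re

# Matches the default footnote target "#fnN" (numeric N is an assumption).
_FN_HREF = re.compile(r"#fn(\d+)")


def footnote_marker(attrs, annotated=False):
    """Return a Djot footnote reference such as "[^1]", or None.

    `attrs` is a dict of the <a> element's attributes. `annotated` is a
    hypothetical flag for annotated inline footnotes, which are excluded.
    """
    if annotated:
        return None
    if attrs.get("role") != "doc-noteref":
        return None
    match = _FN_HREF.fullmatch(attrs.get("href", ""))
    if match is None:
        return None
    return "[^" + match.group(1) + "]"
```

Returning `None` for anything that does not match lets the caller fall back to ordinary link handling, so regular anchors are unaffected.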

Loose lists. A loose list (each item wrapped in <p>) collapsed into a tight list on the way back. Items are now separated by a blank line when the list is loose, so the next parse stays loose. Tight lists are unchanged.
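The separator rule can be sketched as follows (Python, illustrative only). `render_list` and its `loose` parameter are hypothetical names, and the sketch covers only flat bullet lists with single-line items.

```python
def render_list(items, loose):
    """Render a flat bullet list as Djot.

    Loose lists get a blank line between items so the next parse wraps
    each item in <p> again; tight lists keep items on adjacent lines.
    """
    separator = "\n\n" if loose else "\n"
    return separator.join("- " + item for item in items) + "\n"
```

For example, `render_list(["a", "b"], loose=True)` yields the two items separated by a blank line, while `loose=False` keeps them adjacent.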

Explicit heading and section ids. A <section id="..."> whose id is not derivable from the heading text dropped the id, so the next render generated a different one. Explicit ids are now preserved, while auto-derived slugs stay implicit so they regenerate identically. The unsluggable-heading case (where Djot falls back to an s-N id) is handled: an s-N id is treated as auto, any other id as explicit. Auto-slug detection reuses HeadingIdTracker::normalizeId() so it cannot drift from the renderer (transliteration included).
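The explicit-versus-auto decision might look roughly like this sketch (Python, not the library's PHP). `toy_slug` is a simplified, hypothetical stand-in for HeadingIdTracker::normalizeId(); it ignores transliteration and duplicate-id suffixes, which the real method may handle.

```python
import re

# Fallback ids of the form "s-N" are treated as auto-generated.
_FALLBACK_ID = re.compile(r"s-\d+")


def toy_slug(text):
    # Hypothetical simplification: collapse non-alphanumeric runs to "-".
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def explicit_id(section_id, heading_text, normalize=toy_slug):
    """Return the id to emit explicitly, or None if it should stay implicit."""
    if not section_id:
        return None
    if _FALLBACK_ID.fullmatch(section_id):
        return None  # unsluggable-heading fallback, regenerated on render
    if section_id == normalize(heading_text):
        return None  # derivable from the heading text, regenerated on render
    return section_id
```

Passing the normalizer in as a parameter mirrors the idea of reusing the renderer's own slug logic, so the reverse side cannot disagree with it about what counts as auto-derived.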

Not addressed (documented as known limitations)

These remain "drift" in the harness but are not reverse-converter bugs:

  • Soft break inside block text normalizes a newline to a space - renders identically.
  • A raw HTML block (=html) round-trips to a paragraph - the reverse side cannot know the original was a raw block without annotation.
  • <b> maps to strong markup - semantically equivalent.

Notes

Unit tests cover each fix with exact reverse output, plus a new HTML-stability data-provider suite for the default (un-annotated) converter path. Full suite green (2485 tests), phpstan and phpcs clean.

The default DjotConverter output (no round-trip annotations) did not
survive HtmlToDjot back-conversion in several cases, changing the
rendered HTML after a Djot -> HTML -> Djot -> HTML cycle. This is the
path WYSIWYG and HTML-serializing tools rely on.

Fixes:

- Bracket-in-label corruption: processLink() blanket-escaped every `]`
  in already-built child markup, mangling structural brackets from a
  nested image or link (`[![alt](src)](href)` became `[![alt\](src)]...`).
  `]` is now escaped at the text-node level in escapeDjotText(), where it
  is known to be literal, so structural brackets emitted by child elements
  stay intact. This also fixes a literal `]` inside span and semantic-span
  labels.

- Footnote reference: the default footnote reference
  `<a href="#fnN" role="doc-noteref">` was unrecognized and produced
  invalid Djot; it now round-trips to a footnote marker. Annotated inline
  footnotes are unaffected.

- Loose lists: a loose list (each item wrapped in `<p>`) collapsed into a
  tight list; items are now separated by a blank line to stay loose.

- Explicit heading and section ids: a section with a non-derivable id
  dropped it; explicit ids are now preserved, while auto-derived slugs
  (including the `s-N` fallback for unsluggable headings) stay implicit so
  they regenerate identically.

Adds unit tests for each case plus an HTML-stability round-trip suite for
the default (un-annotated) converter path.
@dereuromark dereuromark added the bug Something isn't working label Jun 3, 2026
@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.87%. Comparing base (72200ec) to head (2afb8e0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Converter/HtmlToDjot.php | 93.93% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #208   +/-   ##
=========================================
  Coverage     91.87%   91.87%           
- Complexity     3470     3489   +19     
=========================================
  Files           104      104           
  Lines          9829     9860   +31     
=========================================
+ Hits           9030     9059   +29     
- Misses          799      801    +2     

@dereuromark dereuromark merged commit 1a36893 into master Jun 3, 2026
6 checks passed
@dereuromark dereuromark deleted the fix/htmltodjot-roundtrip-fidelity branch June 3, 2026 02:15