rewrite-xml: support HTML void elements in JSP/HTML parsing#7906

Open
knutwannheden wants to merge 1 commit into main from lucky-quail

Motivation

JSP and HTML files are parsed by the rewrite-xml XML grammar, which is where the JSP extensions and the .jsp file extension are handled. The grammar's element rule only knew two shapes, <a>…</a> and <a/>, and had no notion of an HTML void element. HTML5 allows void elements such as <br>, <img>, <input> and <meta> to be written without a trailing slash.

When the parser hit <br>, it treated it as the start of a normal element expecting </br>. ANTLR error recovery then mangled the tree (an entire <html>…</html> block collapsed to <html/> on reprint), the reprint no longer matched the input, and Parser.requirePrintEqualsInput downgraded the whole file to a ParseError. The original text was preserved, but the file became opaque to recipes.

This surfaced on real, valid Spring Boot JSP smoke tests (welcome.jsp), whose only "offense" was an unclosed <br>.

Examples

Previously this .jsp failed to parse (became a ParseError); it now round-trips as a proper Xml.Document:

<html lang="en">
<body>
	<br>
	Message
	<br>
</body>
</html>

Void elements with attributes are supported too:

<meta charset="utf-8">
<link rel="stylesheet" href="app.css">
<img src="logo.png" alt="logo">
<input type="text" name="q">
<hr>

Strict XML is unchanged: an element that merely shares a name with a void element still parses as a container (<link>https://example.com</link>), and an unclosed <br> in a plain .xml file remains a ParseError.
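
To illustrate how the HTML-only leniency can be decided per source file, here is a minimal sketch of an extension check. The class and method names (`HtmlSourceDetector`, `isHtmlLike`) are hypothetical and not taken from this PR, and the case-insensitive comparison is an assumption; only the four extensions come from the description.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the PR's actual code: decides whether a source
// path should be parsed with HTML void-element leniency.
final class HtmlSourceDetector {
    // Extensions named in the PR description.
    private static final Set<String> HTML_LIKE_EXTENSIONS =
            Set.of("jsp", "jspx", "html", "htm");

    private HtmlSourceDetector() {
    }

    static boolean isHtmlLike(Path sourcePath) {
        Path fileNamePath = sourcePath.getFileName();
        if (fileNamePath == null) {
            return false;
        }
        String fileName = fileNamePath.toString();
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        // Assumption: extensions are compared case-insensitively.
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return HTML_LIKE_EXTENSIONS.contains(extension);
    }
}
```

With a check like this, `welcome.jsp` would opt in to void-element parsing while `pom.xml` would keep strict XML behavior.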

Summary

  • XMLParser.g4: added a third element alternative for void elements, gated by an isVoidElement($name.text) semantic predicate, plus an empty voidClose marker rule so the choice is detectable in the parse tree.
  • Added XMLParserBase and wired it via the grammar's superClass option. It holds the htmlMode flag and isVoidElement(...), keeping the .g4 free of target-specific (Java) members so the C# generation in rewrite-csharp is not broken. (When the C# sources are next regenerated they will need a matching XMLParserBase in the OpenRewrite.Xml.Grammar namespace; this is noted in a grammar comment.)
  • XmlParser: enables htmlMode only for .jsp/.jspx/.html/.htm sources.
  • XmlParserVisitor: maps the void shape to a Tag with null content/closing and attaches an HtmlVoidElement marker.
  • New HtmlVoidElement marker + XmlPrinter: a marked tag prints a bare > instead of />. A marker (rather than a model-field change) keeps the LST shape and serialization unchanged.
  • Regenerated the Java ANTLR sources.
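
The predicate described above could look roughly like the following sketch. It is not the PR's `XMLParserBase`: the class name is hypothetical, the real base class would extend ANTLR's `Parser`, and both the exact set of names (taken here from the HTML Living Standard's list of void elements) and the decision to fold `htmlMode` into the predicate are assumptions.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a hand-written parser base class. In the real
// grammar wiring this would extend org.antlr.v4.runtime.Parser and be named
// through the grammar's superClass option.
class VoidElementParserBaseSketch {
    // Assumption: void element names as listed in the HTML Living Standard.
    private static final Set<String> VOID_ELEMENTS = Set.of(
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr");

    private boolean htmlMode;

    void setHtmlMode(boolean htmlMode) {
        this.htmlMode = htmlMode;
    }

    // Intended to be called from a semantic predicate in the grammar.
    // Returns false in strict XML mode, so <link>…</link> stays a container.
    boolean isVoidElement(String name) {
        return htmlMode
                && name != null
                && VOID_ELEMENTS.contains(name.toLowerCase(Locale.ROOT));
    }
}
```

Keeping this logic in a hand-written class rather than in a `@members` block is what lets the grammar file stay target-neutral, since each target can supply its own base class.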

Test plan

  • New XmlParserTest cases: <br> in a .jsp; void elements with attributes (<meta>/<link>/<img>/<input>/<hr>); the full Spring Boot welcome.jsp round-trip.
  • Regression guards: void-named container elements (<link>…</link>, <source>…</source>) still parse in XML mode; an unclosed <br> in .xml remains a ParseError (void leniency is HTML-only).
  • Existing JSP tests (jsp, jspScriptlet, mixedJspElements, …) still pass.
  • ./gradlew :rewrite-xml:check is green (tests + license).

JSP and HTML sources are parsed by the XML grammar, whose `element` rule
only accepted fully-closed (`<a>…</a>`) and self-closing (`<a/>`) tags.
An HTML void element written without a slash (e.g. `<br>`) was parsed as
the start of a normal element; ANTLR error recovery then mangled the tree,
the reprint no longer matched the input, and the whole file was downgraded
to a ParseError.

Add HTML void-element support, enabled only for HTML-like sources
(.jsp/.jspx/.html/.htm) so strict XML parsing is unaffected:

- the grammar gains a void-element alternative gated by a semantic
  predicate, plus an empty `voidClose` marker rule to detect it
- htmlMode and isVoidElement live in a hand-written XMLParserBase wired
  via the grammar's superClass option, keeping the .g4 free of
  target-specific members so the C# generation is not broken
- void tags carry an HtmlVoidElement marker so the printer emits a bare
  `>` instead of `/>`
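
As a rough illustration of the printing rule in the last bullet, the sketch below uses a simplified tag model in place of the real `Xml.Tag`, `HtmlVoidElement` and `XmlPrinter`. The `SketchTag` and `SketchPrinter` names are hypothetical, a boolean field stands in for the marker, and attributes and whitespace are omitted.

```java
// Hypothetical, simplified tag model: not the OpenRewrite LST types.
final class SketchTag {
    final String name;
    final String content;   // null for self-closing and void tags
    final boolean htmlVoid; // stands in for the HtmlVoidElement marker

    SketchTag(String name, String content, boolean htmlVoid) {
        this.name = name;
        this.content = content;
        this.htmlVoid = htmlVoid;
    }
}

final class SketchPrinter {
    private SketchPrinter() {
    }

    static String print(SketchTag tag) {
        StringBuilder out = new StringBuilder("<").append(tag.name);
        if (tag.content != null) {
            // Container element: <a>…</a>
            out.append('>').append(tag.content)
               .append("</").append(tag.name).append('>');
        } else if (tag.htmlVoid) {
            // Marked void element: bare '>' so the reprint matches the input
            out.append('>');
        } else {
            // Ordinary self-closing element: <a/>
            out.append("/>");
        }
        return out.toString();
    }
}
```

Because only the marker changes the output, an unmarked tag with no content still prints as `/>`, which matches the stated goal of leaving the LST shape and existing serialization alone.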