Unicode support for XML and HTML tags name #196
Open
Tempest version
3.0
PHP version
8.5
Operating system
macOS
Description
Currently, for XML and HTML, we capture at best \w, which translates to [a-zA-Z0-9_]. But tag names can contain almost any Unicode character after the first one (which must be [a-zA-Z] for HTML; XML is a bit more permissive).
Here are the specs I found for each language:
- XML: https://www.w3.org/TR/REC-xml/#NT-NameStartChar
- HTML: https://dom.spec.whatwg.org/#valid-element-local-name
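For the XML side, the `Name` production from the linked spec can be sketched as a regex. The code point ranges below are transcribed from `NameStartChar` and `NameChar`; the constant and function names are hypothetical and only serve the illustration, so treat this as a rough sketch rather than a drop-in pattern for the highlighter.

```javascript
// Sketch of the XML 1.0 `Name` production as a JavaScript regex.
// Ranges are transcribed from NameStartChar / NameChar in the XML spec.
// NAME_START_CHAR, NAME_CHAR, XML_NAME and isXmlName are hypothetical names.
const NAME_START_CHAR =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// NameChar = NameStartChar plus "-", ".", digits, U+00B7 and two combining ranges.
const NAME_CHAR =
  NAME_START_CHAR + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

// The "u" flag makes the regex work on code points, so astral characters
// such as emoji are matched as single characters.
const XML_NAME = new RegExp(`^[${NAME_START_CHAR}][${NAME_CHAR}]*$`, "u");

function isXmlName(name) {
  return XML_NAME.test(name);
}

for (const name of ["math-α", "emotion-😍-emoji", "_id", "-foo", "1abc"]) {
  console.log(name, isXmlName(name));
}
```

Names starting with `-` or a digit are rejected because those characters only appear in `NameChar`, not in `NameStartChar`.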
The HTML spec even gives us a valid regex: /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*|[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u
So the following are valid in HTML:
<math-α></math-α>
<emotion-😍-emoji></emotion-😍-emoji>
(And I guess even GitHub doesn't render them correctly 🥲)
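To make the gap concrete, here is a small sketch that runs a few tag names through the pattern quoted from the spec and through a plain `\w`-only check. The `WORD_ONLY` pattern is an assumed stand-in for the current capture, not Tempest's actual rule, and the function names are hypothetical.

```javascript
// Pattern quoted from the DOM spec earlier in this issue, copied verbatim.
const VALID_ELEMENT_LOCAL_NAME =
  /^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*|[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u;

// Assumed stand-in for a `\w`-based capture (i.e. [a-zA-Z0-9_] only).
const WORD_ONLY = /^\w+$/;

// Hypothetical helpers, named only for this example.
function matchesSpec(name) {
  return VALID_ELEMENT_LOCAL_NAME.test(name);
}

function matchesWordOnly(name) {
  return WORD_ONLY.test(name);
}

for (const name of ["div", "math-α", "emotion-😍-emoji", "1abc"]) {
  console.log(name, "spec:", matchesSpec(name), "\\w only:", matchesWordOnly(name));
}
```

Both custom element names pass the spec pattern but fail the `\w`-only check, while `1abc` goes the other way: it satisfies `\w+` but is not a valid element local name.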
Here's how Gecko (Firefox) renders the two custom elements above:
