HTML API: Implement adoption agency algorithm and active format reconstruction #81
sirreal wants to merge 3 commits into
Introduce the supporting operations that the adoption agency algorithm and active formatting element reconstruction require on the two parser stacks.

On the stack of open elements:

- Extract the "in scope" element list into a shared class constant.
- Add `has_node_in_scope()`, which reports whether a specific node (rather than any element of a given tag name) is in scope. The adoption agency algorithm must test a specific formatting element, regardless of other open elements sharing its tag name.

On the list of active formatting elements, add position-indexed operations (`position_of()`, `remove_at()`, `insert_at()`, `replace_node()`) so entries can be cloned and replaced in place as the algorithms direct.

These additions are unused until the algorithms are implemented and do not change parsing behavior.
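To illustrate why a node-specific scope check differs from a tag-name check, here is a minimal Python sketch. It is not the code in this PR: the `Node` class, the function names, and the `IN_SCOPE_BOUNDARIES` set are hypothetical stand-ins, and the boundary set is limited to the HTML-namespace entries of the specification's "in scope" list (the MathML and SVG entries are omitted for brevity).

```python
from dataclasses import dataclass

# Assumption: HTML-namespace subset of the spec's "in scope" boundary elements.
IN_SCOPE_BOUNDARIES = frozenset(
    {"APPLET", "CAPTION", "HTML", "TABLE", "TD", "TH", "MARQUEE", "OBJECT", "TEMPLATE"}
)


@dataclass(eq=False)  # eq=False keeps comparison by identity, like node pointers
class Node:
    tag: str


def has_element_in_scope(stack, tag):
    """True if ANY element with this tag name is in scope."""
    for node in reversed(stack):  # walk from the current node upward
        if node.tag == tag:
            return True
        if node.tag in IN_SCOPE_BOUNDARIES:
            return False
    return False


def has_node_in_scope(stack, target):
    """True only if this SPECIFIC node is in scope."""
    for node in reversed(stack):
        if node is target:
            return True
        if node.tag in IN_SCOPE_BOUNDARIES:
            return False
    return False


# Open elements for something like <b><table><td><b>: two B elements, one
# hidden behind the TD scope boundary.
html, body, outer_b, table, td, inner_b = (
    Node(t) for t in ["HTML", "BODY", "B", "TABLE", "TD", "B"]
)
stack = [html, body, outer_b, table, td, inner_b]
```

In this example a tag-name check finds a B in scope (the inner one), while the node check on `outer_b` fails because TD is a scope boundary. That distinction is the one the adoption agency algorithm depends on.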
Implement the adoption agency algorithm and active formatting element reconstruction so the HTML Processor handles misnested formatting elements instead of bailing.

Previously the processor stopped whenever a document required reconstructing implicitly-closed formatting elements (e.g. `<p><b>1<p>2`) or running the adoption agency algorithm (e.g. `<b>1<p>2</b>3`). Both are now supported:

- `reconstruct_active_formatting_elements()` reopens the run of unclosed formatting elements at the end of the list, per the specification's rewind/advance/create steps.
- `run_adoption_agency_algorithm()` implements the full algorithm, including the furthest-block case and the "any other end tag" fallback.
- The "Noah's Ark clause" limits the list of active formatting elements to three equivalent entries (same tag name, namespace, and attributes).

Because the processor visits a document in a single pass, it cannot relocate nodes it has already reported. The parser's state (the stack of open elements and the list of active formatting elements) is maintained exactly as the specification requires, so every token visited after these algorithms run is reported with the ancestor chain a browser would produce. Nodes which were already visited when a misnesting is discovered remain where they were found.

Formatting elements reopened by the parser are reported as "virtual" nodes. Reading an attribute, class, or qualified name of such a node reports the value from the tag which opened the original element; these nodes cannot be modified.

Supporting this required hardening stack-event provenance so a single source tag never produces two visitor events: pushes are matched to the current token by identity, and each tag closer is matched to at most one popped node.
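The Noah's Ark clause and the rewind/advance/create steps can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the PR's implementation: `Element`, `MARKER`, `push_formatting()`, and `reconstruct()` are hypothetical names, namespaces are ignored (everything is treated as HTML), and the `virtual` flag only mimics the idea of a reopened node.

```python
from dataclasses import dataclass, field

MARKER = object()  # hypothetical stand-in for a marker entry in the list


@dataclass(eq=False)  # identity comparison, like node pointers
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    virtual: bool = False  # True for elements reopened by the parser


def push_formatting(formatting, element):
    """Append an entry, keeping at most three equivalent entries
    after the last marker (the "Noah's Ark clause")."""
    start = 0
    for i in range(len(formatting) - 1, -1, -1):
        if formatting[i] is MARKER:
            start = i + 1
            break
    matches = [
        i for i in range(start, len(formatting))
        if formatting[i].tag == element.tag and formatting[i].attrs == element.attrs
    ]
    if len(matches) >= 3:
        del formatting[matches[0]]  # drop the earliest equivalent entry
    formatting.append(element)


def reconstruct(formatting, open_elements):
    """Reopen the trailing run of entries that are no longer open."""
    def is_open(entry):
        return any(entry is node for node in open_elements)

    if not formatting or formatting[-1] is MARKER or is_open(formatting[-1]):
        return []
    # Rewind: find the earliest entry of the trailing run that is closed.
    i = len(formatting) - 1
    while i > 0 and formatting[i - 1] is not MARKER and not is_open(formatting[i - 1]):
        i -= 1
    # Advance/create: clone each entry, push it, and replace it in place.
    reopened = []
    for j in range(i, len(formatting)):
        clone = Element(formatting[j].tag, dict(formatting[j].attrs), virtual=True)
        open_elements.append(clone)
        formatting[j] = clone
        reopened.append(clone)
    return reopened


# State for `<p><b>1<p>2` just before the text "2": the second <p> closed the
# first P and its B, but B is still in the list of active formatting elements.
original_b = Element("B")
open_elements = [Element("HTML"), Element("BODY"), Element("P")]
formatting = []
push_formatting(formatting, original_b)
reopened = reconstruct(formatting, open_elements)
```

After the call, `open_elements` ends with a new virtual B and the list entry points at that clone rather than at `original_b`, so content that follows sees `P > B` as its ancestors.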
The html5lib test cases whose constructed trees differ only because the adoption agency algorithm re-parents already-visited nodes are skip-listed with a shared reason; each was verified to match browser behavior for parser state and normalization. The absorbed `wpHtmlSupportRequiredActiveFormatReconstruction` test and the previous bail-asserting cases are replaced with tests of the new behavior.
A FORM end tag encountered while other elements remain open no longer stops the parser. The form element is removed from the stack of open elements using the same reconciliation the adoption agency algorithm uses, so any elements that remain open after it are reported with correct breadcrumbs. The scope check now tests the specific form element pointer rather than any FORM element in scope, matching the specification.

One html5lib case (`<form><div></form><div>`) exercises a shape a single-pass token stream cannot represent: browsers keep the closed FORM as a DOM ancestor of its still-open descendants. This parser reports following content outside the closed FORM, mirroring the stack of open elements; the case is skip-listed with that reason.
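As a rough illustration of the FORM end tag steps in the specification's "in body" insertion mode (when no TEMPLATE is open), the following Python sketch removes the pointed-to form from the middle of the stack. All names here (`Node`, `ParserState`, `close_form()`) are hypothetical, the scope boundary set is limited to HTML-namespace elements, and the visitor-event reconciliation described above is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: HTML-namespace subset of the spec's "in scope" boundary elements.
BOUNDARIES = frozenset(
    {"APPLET", "CAPTION", "HTML", "TABLE", "TD", "TH", "MARQUEE", "OBJECT", "TEMPLATE"}
)
IMPLIED_END_TAGS = frozenset(
    {"DD", "DT", "LI", "OPTGROUP", "OPTION", "P", "RB", "RP", "RT", "RTC"}
)


@dataclass(eq=False)  # identity comparison, like node pointers
class Node:
    tag: str


@dataclass
class ParserState:
    open_elements: list
    form_pointer: Optional[Node] = None


def node_in_scope(stack, target):
    for node in reversed(stack):
        if node is target:
            return True
        if node.tag in BOUNDARIES:
            return False
    return False


def close_form(state):
    """Handle </form>; returns False when the token is ignored."""
    node = state.form_pointer
    state.form_pointer = None
    # Test the specific pointed-to form, not any FORM element in scope.
    if node is None or not node_in_scope(state.open_elements, node):
        return False
    # Generate implied end tags.
    while state.open_elements[-1].tag in IMPLIED_END_TAGS:
        state.open_elements.pop()
    # Remove the form by identity, even if it is not the current node.
    for i, candidate in enumerate(state.open_elements):
        if candidate is node:
            del state.open_elements[i]
            break
    return True


# `<form><div></form><div>`: the first DIV stays open after FORM is removed.
form = Node("FORM")
state = ParserState([Node("HTML"), Node("BODY"), form, Node("DIV")], form_pointer=form)
closed = close_form(state)
state.open_elements.append(Node("DIV"))  # the second <div>
breadcrumbs = [n.tag for n in state.open_elements]
```

Here `breadcrumbs` no longer contains FORM, which mirrors how content after the end tag is reported outside the closed form.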
Agent's report @ 764adbd:
Trac ticket:
Use of AI Tools
Example disclosure:
AI assistance: Yes
Tool(s): Claude Code
Model(s): Fable 5
Used for: Implementation.
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.