gh-102555 Increase HTML standard compliance for closing comment tags#117406
gh-102555 Increase HTML standard compliance for closing comment tags#117406Privat33r-dev wants to merge 7 commits intopython:mainfrom
Conversation
|
We might as well handle the |
ezio-melotti
left a comment
There was a problem hiding this comment.
Thanks for the PR. Before reviewing and merging, tests should be added.
| _declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match | ||
| _declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match | ||
| _commentclose = re.compile(r'--\s*>') | ||
| _commentclose = re.compile(r'--!?>') |
There was a problem hiding this comment.
I would leave the \s*, even though I should double check what the HTML5 specs say exactly.
There was a problem hiding this comment.
I would leave the
\s*, even though I should double check what the HTML5 specs say exactly.
I provided the links to HTML5 specification earlier and "\s*" mentioned nowhere, moreover, my tests with latest versions of Firefox and Chrome has shown that it's in fact an incorrect behaviour and is not considered a closing tag by modern browsers. Thus I see no reason in keeping it (nor spec, nor common practice).
There was a problem hiding this comment.
https://html.spec.whatwg.org/#comment-end-state is the section of the specs I was looking for. It does indeed mention the ! but not the spaces, so updating the code accordingly sounds good to me.
Do you want to add tests to check these (-->, --!>, -- >, --x>, --->, etc.) cases?
There was a problem hiding this comment.
https://html.spec.whatwg.org/#comment-end-state is the section of the specs I was looking for. It does indeed mention the
!but not the spaces, so updating the code accordingly sounds good to me.Do you want to add tests to check these (
-->,--!>,-- >,--x>,--->, etc.) cases?
I am thinking about improving the solution to even include <!-->, unexpected EOF and similar other test cases (that were mentioned in a similar PR), but at the moment, unfortunately, I am lacking time to work on this PR. Hopefully, in the week (or at the weekend at worst) I can add the test cases and change a few other parts of the code to handle even wider variety of edge cases.
|
@ezio-melotti EOF edge-case (described here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-comment ) appears to be a bit more complicated, I will try to resolve it today as well anyway. Currently added short comment ( |
|
New EOF behaviour seems to be consistent with chromium-based browser. There is still an edge case with EOF-ending abrupt comment case, but the case is relatively hard to handle and quite rare (html after tag that starts with http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state Ready for review. |
|
@ezio-melotti it would be nice if you can review the change soon :) |
|
I would like to give this a proper review before merging, but unfortunately I probably won't have time to look at this until the end of May. If I haven't replied by then, feel free to ping me again, and thanks for working on this! |
Thanks for updating me on the review status, glad to know that it's planned 👍🏻 |
Hi. It's the last day of May, so I decided to ping you :) |
|
@ezio-melotti I wonder if you might have some time this time? :) |
|
@ezio-melotti I appreciate your earlier willingness to review the PR, and I understand that sometimes time is tight. Would you like to review it now? :) |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
I completely forgot about this issue and this PR and created #135664.
The part related to unclosed comments was already included in #135464.
There are few flaws in this PR.
<!--!>and<!---!>should not be parsed as abruptly ended empty comments.--!>does not work here.- When unclosed comment ends with
-,--or--!just before EOF, they should not be appended to the comment token's data.
You can try to fix these issues in this PR, but I'll just add you as co-author of #135664 -- this will be simpler.
Thanks for deep dive. I double-checked the specs and confirm that you are right. I think that we can just go with your PR as it would be much simpler. Thanks for the honor of co-authorship :) |
Now code follows the recommendation.