Allow HTML tags in link text #2

Draft
remibetin wants to merge 1 commit into w3c:main from remibetin:fix-regex

remibetin commented Aug 27, 2024

Resolves w3c/wai-website#481 (and also addresses w3c/wai-std-gl-overview#21)

Description of the problem

When link text is enclosed in double square brackets, [[link_text]](/url/), we want the plugin to replace it with the title of the target resource, in the same language as the current page. When the target page has not been translated yet, we want the plugin to keep the link text that is between brackets.

This generally works, but translators sometimes prefer to keep part of the link text in English. In that case, they wrap that part in <span lang="en"></span> markup inside the link text.

The plugin does not handle this case correctly.

Example raised in w3c/wai-website#481

[[Χρησιμοποιώντας Υλικά του <span lang='en'>WAI</span>: Άδεια Χρήσης με Απόδοση (<span lang='en'>Attribution</span>)]](/about/using-wai-material/)

is rendered as:

Μπορείτε να χρησιμοποιήσετε αυτό το βίντεο εάν συμπεριλάβετε έναν σύνδεσμο προς αυτήν τη σελίδα. Περισσότερες πληροφορίες διατίθενται στο «Χρησιμοποιώντας Υλικά του WAI: Άδεια Χρήσης με Απόδοση (Attribution) (στα Αγγλικά)»(/WAI/about/using-wai-material/){: hreflang=”en”}.

Instead of:

Χρησιμοποιώντας Υλικά του WAI: Άδεια Χρήσης με Απόδοση (Attribution) (στα Αγγλικά)

Technical side of the problem

Double-bracket links are currently processed in two steps:

  • Step 1: [[link_text]](/url/#fragment)
    is turned into
    <<link_text ("in English" in the target language)>>(/WAI/url/#fragment){: hreflang="en"}

    document.content.gsub!(/\[\[([^\]\]]+?)\]\]\((?!\/TR)(?!\/WAI)(\/.*?\/)(?:#(.*?))?\)/i) do |match|
      translatedpage = getPage document.site, Regexp.last_match[2], document.data['lang']
      if Regexp.last_match[3].nil?
        fragment = ''
      else
        fragment = '#' + Regexp.last_match[3]
      end
      if translatedpage.nil?
        '<<' + Regexp.last_match[1] + inenglishtext + '>>({{ "' + Regexp.last_match[2] + '" | relative_url }}' + fragment + ')' + hreflang

  • Step 2: <<link_text ("in English" in the target language)>>(/WAI/url/#fragment){: hreflang="en"}
    is expected to be turned into
    [link_text ("in English" in the target language)](/WAI/url/#fragment){: hreflang="en"}

    document.content.gsub!(/<<([^>>]+?)>>/i) do |match|
      '[' + Regexp.last_match[1] + ']'
    end
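
The two steps above can be sketched as a small standalone script. This is a simplified illustration, not the plugin's actual code: IN_ENGLISH_TEXT and HREFLANG are hypothetical stand-ins for the plugin's inenglishtext and hreflang values, the "/WAI" prefix is hard-coded in place of the Liquid relative_url filter, and only the "target page is not translated" branch is modelled (there is no getPage lookup).

```ruby
# Hypothetical stand-ins for values the plugin computes elsewhere (assumptions).
IN_ENGLISH_TEXT = ' (in English)'
HREFLANG = '{: hreflang="en"}'

# Step 1 pattern, copied from the excerpt above.
STEP1 = /\[\[([^\]\]]+?)\]\]\((?!\/TR)(?!\/WAI)(\/.*?\/)(?:#(.*?))?\)/i

# Step 1: [[text]](/url/#frag) -> <<text (in English)>>(/WAI/url/#frag){: hreflang="en"}
# Assumes the target page is untranslated; "/WAI" replaces the relative_url filter.
def step1(content)
  content.gsub(STEP1) do
    text, url, frag = Regexp.last_match.captures
    fragment = frag.nil? ? '' : '#' + frag
    "<<#{text}#{IN_ENGLISH_TEXT}>>(/WAI#{url}#{fragment})#{HREFLANG}"
  end
end

# Step 2: <<text>> -> [text], using the current pattern from the excerpt above.
def step2(content)
  content.gsub(/<<([^>>]+?)>>/i) { "[#{Regexp.last_match[1]}]" }
end

puts step2(step1('[[Link text]](/about/using-wai-material/#top)'))
# prints: [Link text (in English)](/WAI/about/using-wai-material/#top){: hreflang="en"}
```

With plain link text, both steps behave as described; links that already start with /WAI or /TR are left untouched by step 1.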

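Problem: the regular expression /<<([^>>]+?)>>/i is meant to capture the link text, but the character class [^>>] is equivalent to [^>] (the second > is superfluous), so the capture group cannot contain any > character. When the link text contains an HTML tag, the match fails at the tag's closing >, and the substitution is not applied.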
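
The failure can be reproduced in isolation. The sketch below uses made-up sample strings (PLAIN and TAGGED are illustrative, not taken from the site) and compares the current pattern with one possible alternative, a lazy .+? capture. The alternative is only a candidate for discussion and is not necessarily the fix this pull request will adopt.

```ruby
# Current step 2 pattern, copied from the description above.
STEP2_CURRENT = /<<([^>>]+?)>>/i

# One possible alternative (an assumption, not the PR's fix):
# capture lazily up to the first ">>" instead of excluding ">".
STEP2_CANDIDATE = /<<(.+?)>>/

# Illustrative inputs in the shape produced by step 1.
PLAIN  = '<<Link text (in English)>>(/WAI/url/){: hreflang="en"}'
TAGGED = %q(<<Using <span lang='en'>WAI</span> Material (in English)>>(/WAI/url/){: hreflang="en"})

# Replace <<text>> with [text] using the given pattern.
def to_markdown_link(content, pattern)
  content.gsub(pattern) { "[#{Regexp.last_match[1]}]" }
end

puts STEP2_CURRENT.match?(PLAIN)    # true
puts STEP2_CURRENT.match?(TAGGED)   # false: the capture cannot cross the ">" of <span ...>

# With the current pattern the tagged input is returned unchanged.
puts to_markdown_link(TAGGED, STEP2_CURRENT) == TAGGED   # true

puts to_markdown_link(TAGGED, STEP2_CANDIDATE)
# prints: [Using <span lang='en'>WAI</span> Material (in English)](/WAI/url/){: hreflang="en"}
```

A lazy .+? stops at the first >>, so it would still misbehave if a tag closed immediately before the closing >> (for example ...</span>>>). In the flow described above, step 1 appends the "in English" text before the closing delimiter, which makes that case less likely, but it is worth keeping in mind when choosing the final pattern.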

Solution

[In Progress]

remibetin marked this pull request as ready for review August 27, 2024 12:13
remibetin marked this pull request as draft August 27, 2024 12:18

Development

Successfully merging this pull request may close these issues.

Issue regarding the display of markup on the WAI website (Greek)