Skip to content

replace bleach with nh3 for HTML sanitization#14442

Open
valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
valentijnscholten:bleach-to-nh3
Open

replace bleach with nh3 for HTML sanitization#14442
valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
valentijnscholten:bleach-to-nh3

Conversation

@valentijnscholten
Copy link
Member

@valentijnscholten valentijnscholten commented Mar 4, 2026

Bleach has been deprecated since Feb '23. Although it still works, a security product like Defect Dojo needs to switch to the goto replacement: nh3 which is also faster.

Summary

  • Replaces the deprecated (archived) bleach library with nh3, its actively maintained, Rust-backed successor
  • Drops bleach[css] from requirements.txt and adds nh3
  • Updates all call sites in dojo/utils.py, dojo/templatetags/display_tags.py, and dojo/templatetags/get_banner.py
  • Replaces bleach.ALLOWED_TAGS / bleach.ALLOWED_ATTRIBUTES with explicit constants (_NH3_ALLOWED_TAGS, _NH3_ALLOWED_ATTRIBUTES) shared between the two template tag modules

Note on style attribute: bleach supported CSS property-level filtering via CSSSanitizer (e.g. allowing only color and font-weight). nh3 has no equivalent — it cannot filter individual CSS properties, only entire attributes. To avoid allowing arbitrary CSS injection, the style attribute is no longer permitted on any element. In practice this means inline styling (e.g. colored text in the login banner, background-color on images in markdown) will be stripped rather than sanitized. This is the safer trade-off.

Note on disallowed tag handling: bleach escaped disallowed tags (e.g. <script>&lt;script&gt;), making them visible as literal text. nh3 strips them entirely, keeping only the text content. This is the correct behavior for a sanitizer.

bleach is deprecated and archived. nh3 is its Rust-backed successor,
actively maintained and significantly faster.
@github-actions github-actions bot added the ui label Mar 4, 2026
@valentijnscholten valentijnscholten added the affects_pro PRs that affect Pro and need a coordinated release/merge moment. label Mar 4, 2026
@dryrunsecurity
Copy link

dryrunsecurity bot commented Mar 4, 2026

DryRun Security

This pull request renders user-provided content using mark_safe after cleaning it with nh3.clean, which can disable Django's escaping and—absent explicit enforcement of safe URL schemes and stripping of dangerous attributes or protocols—creates a critical potential XSS vulnerability. The code should avoid mark_safe on untrusted input or ensure nh3.clean strictly enforces allowed tags/attributes and safe URI schemes before rendering.

🔴 Potential Cross-Site Scripting in dojo/templatetags/display_tags.py (drs_fa72d57f)
Vulnerability Potential Cross-Site Scripting
Description The code calls mark_safe on content cleaned by nh3.clean and then renders it. mark_safe disables Django's automatic escaping; if nh3.clean's allowlist (_NH3_ALLOWED_TAGS / _NH3_ALLOWED_ATTRIBUTES) permits attributes or tag combinations that can be abused (e.g., javascript: URIs in href, data URIs, or any on* event handlers) or if nh3.clean does not strip dangerous protocols/attributes, user-controlled input can reach the rendered HTML sink as unescaped markup and lead to XSS. The patch defines a relatively small allowed tag/attribute set, but there is no evidence in the patch that nh3.clean enforces safe URL schemes or strips dangerous attributes; the usage of nh3.clean + mark_safe is therefore risky and confirms a potential XSS path.

return mark_safe(nh3.clean(message, tags=_NH3_ALLOWED_TAGS, attributes=_NH3_ALLOWED_ATTRIBUTES))
def text_shortener(value, length):


Comment to provide feedback on these findings.

Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]

Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing

All finding details can be found in the DryRun Security Dashboard.

@valentijnscholten valentijnscholten changed the title chore(deps): replace bleach with nh3 for HTML sanitization replace bleach with nh3 for HTML sanitization Mar 4, 2026
@github-actions github-actions bot added the docs label Mar 4, 2026
…link

- Use escape() when building HTML in create_bleached_link so attribute
  values are properly encoded before nh3 parses them (prevents raw tags
  in href/title when user-supplied content contains HTML)
- Add rel="noopener noreferrer" to all expected link strings in tests
  (nh3 automatically injects this on target="_blank" links)
- Replace exact-output XSS assertion with semantic safety checks
nh3/ammonia does not re-escape < in attribute values when re-serializing,
so passing escape()'d HTML through nh3.clean() still produced raw angle
brackets in href/title. The function constructs trusted HTML itself, so
nh3 is redundant here — escape() is sufficient and correct.

Also adds rel="noopener noreferrer" explicitly and updates tests to
match the new output including the exact XSS-escaped form.
@valentijnscholten valentijnscholten added this to the 2.57.0 milestone Mar 6, 2026
Copy link
Contributor

@mtesauro mtesauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Copy link
Contributor

@Maffooch Maffooch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question - I'm not dying on this hill 😄

The `bleach` library has been replaced by [`nh3`](https://nh3.readthedocs.io/) for HTML sanitization. This is a drop-in replacement in most cases, but there are two minor behavioral changes to be aware of:

- **`style` attributes are no longer allowed.** `bleach` supported CSS property-level filtering (e.g. allowing only `color` or `font-weight`). `nh3` has no equivalent, so `style` attributes are stripped entirely to avoid allowing arbitrary CSS injection. Content that previously relied on inline styles (e.g. colored text in the login banner, background-color on markdown images) will lose that styling.
- **Disallowed tags are stripped rather than escaped.** Previously, a tag like `<script>` would be rendered as the literal text `&lt;script&gt;`. Now the tag is removed entirely and only its text content is kept. This is the correct behavior for a sanitizer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a security tool, wont this render XSS findings stored in dojo totally incomplete? It wont be be clear what the original payload is if we strip out the tags entirely. It feels better to URL encode them for our specific use case

@github-actions
Copy link
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects_pro PRs that affect Pro and need a coordinated release/merge moment. docs integration_tests ui unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants