replace bleach with nh3 for HTML sanitization#14442
replace bleach with nh3 for HTML sanitization#14442valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
Conversation
bleach is deprecated and archived. nh3 is its Rust-backed successor, actively maintained and significantly faster.
|
This pull request renders user-provided content using mark_safe after cleaning it with nh3.clean, which can disable Django's escaping and—absent explicit enforcement of safe URL schemes and stripping of dangerous attributes or protocols—creates a critical potential XSS vulnerability. The code should avoid mark_safe on untrusted input or ensure nh3.clean strictly enforces allowed tags/attributes and safe URI schemes before rendering.
🔴 Potential Cross-Site Scripting in
|
| Vulnerability | Potential Cross-Site Scripting |
|---|---|
| Description | The code calls mark_safe on content cleaned by nh3.clean and then renders it. mark_safe disables Django's automatic escaping; if nh3.clean's allowlist (_NH3_ALLOWED_TAGS / _NH3_ALLOWED_ATTRIBUTES) permits attributes or tag combinations that can be abused (e.g., javascript: URIs in href, data URIs, or any on* event handlers) or if nh3.clean does not strip dangerous protocols/attributes, user-controlled input can reach the rendered HTML sink as unescaped markup and lead to XSS. The patch defines a relatively small allowed tag/attribute set, but there is no evidence in the patch that nh3.clean enforces safe URL schemes or strips dangerous attributes; the usage of nh3.clean + mark_safe is therefore risky and confirms a potential XSS path. |
django-DefectDojo/dojo/templatetags/display_tags.py
Lines 103 to 106 in 3d669fe
Comment to provide feedback on these findings.
Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]
Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing
All finding details can be found in the DryRun Security Dashboard.
…link - Use escape() when building HTML in create_bleached_link so attribute values are properly encoded before nh3 parses them (prevents raw tags in href/title when user-supplied content contains HTML) - Add rel="noopener noreferrer" to all expected link strings in tests (nh3 automatically injects this on target="_blank" links) - Replace exact-output XSS assertion with semantic safety checks
nh3/ammonia does not re-escape < in attribute values when re-serializing, so passing escape()'d HTML through nh3.clean() still produced raw angle brackets in href/title. The function constructs trusted HTML itself, so nh3 is redundant here — escape() is sufficient and correct. Also adds rel="noopener noreferrer" explicitly and updates tests to match the new output including the exact XSS-escaped form.
Maffooch
left a comment
There was a problem hiding this comment.
Just a question - I'm not dying on this hill 😄
| The `bleach` library has been replaced by [`nh3`](https://nh3.readthedocs.io/) for HTML sanitization. This is a drop-in replacement in most cases, but there are two minor behavioral changes to be aware of: | ||
|
|
||
| - **`style` attributes are no longer allowed.** `bleach` supported CSS property-level filtering (e.g. allowing only `color` or `font-weight`). `nh3` has no equivalent, so `style` attributes are stripped entirely to avoid allowing arbitrary CSS injection. Content that previously relied on inline styles (e.g. colored text in the login banner, background-color on markdown images) will lose that styling. | ||
| - **Disallowed tags are stripped rather than escaped.** Previously, a tag like `<script>` would be rendered as the literal text `<script>`. Now the tag is removed entirely and only its text content is kept. This is the correct behavior for a sanitizer. |
There was a problem hiding this comment.
Since this is a security tool, wont this render XSS findings stored in dojo totally incomplete? It wont be be clear what the original payload is if we strip out the tags entirely. It feels better to URL encode them for our specific use case
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
Conflicts have been resolved. A maintainer will review the pull request shortly. |
Bleach has been deprecated since Feb '23. Although it still works, a security product like Defect Dojo needs to switch to the goto replacement:
nh3which is also faster.Summary
bleachlibrary withnh3, its actively maintained, Rust-backed successorbleach[css]fromrequirements.txtand addsnh3dojo/utils.py,dojo/templatetags/display_tags.py, anddojo/templatetags/get_banner.pybleach.ALLOWED_TAGS/bleach.ALLOWED_ATTRIBUTESwith explicit constants (_NH3_ALLOWED_TAGS,_NH3_ALLOWED_ATTRIBUTES) shared between the two template tag modulesNote on
styleattribute:bleachsupported CSS property-level filtering viaCSSSanitizer(e.g. allowing onlycolorandfont-weight).nh3has no equivalent — it cannot filter individual CSS properties, only entire attributes. To avoid allowing arbitrary CSS injection, thestyleattribute is no longer permitted on any element. In practice this means inline styling (e.g. colored text in the login banner, background-color on images in markdown) will be stripped rather than sanitized. This is the safer trade-off.Note on disallowed tag handling:
bleachescaped disallowed tags (e.g.<script>→<script>), making them visible as literal text.nh3strips them entirely, keeping only the text content. This is the correct behavior for a sanitizer.