
๐Ÿ›ก๏ธ Sentinel: [HIGH] Fix XSS via un-sanitized go-readability output#149

Draft
lucasew wants to merge 4 commits into master from sentinel-fix-xss-bluemonday-15074944369555111581

Conversation

@lucasew (Owner) commented Mar 1, 2026

Severity: High
Vulnerability: XSS (Cross-Site Scripting). The application was taking the raw HTML output of go-readability and directly injecting it into the HTML template, JSON, or Markdown formats without intermediate sanitization. go-readability primarily cleans up document structure for reading but does not rigorously prevent all XSS vectors (e.g. event handlers like onerror inside allowed tags might survive).
Impact: A specially crafted webpage could embed malicious JavaScript that bypasses go-readability's stripping. When the articleparser renders it, the JS would execute in the victim's browser, potentially stealing sessions, cookies, or redirecting the user.
Fix: Introduced the github.com/microcosm-cc/bluemonday library with the UGCPolicy (User Generated Content) to explicitly strip out any potentially malicious HTML tags and attributes. The parsed HTML content is passed through bluemonday's Sanitize() method before being formatted as HTML, JSON, Markdown, or Text. A sanitizeHTML helper handles this, utilizing a pre-configured global HTMLSanitizer. Added a dedicated test (api/xss_test.go) to ensure onerror handlers and scripts are blocked.

Assumptions

  • bluemonday.UGCPolicy strikes the right balance between preserving standard reading-oriented HTML structure and stripping all active content (JS/CSS/Objects).
  • Sanitizing during the format* stage is appropriate and avoids modifying the internal state of the readability.Article (which is useful if a raw mode is ever required).

Alternatives Not Chosen

  • Parsing the string manually with regexes or custom Go HTML walkers (too error-prone; reliable XSS filtering needs a spec-compliant HTML tokenizer, i.e. a dedicated state machine).
  • Attempting to sanitize before readability parsing (would break go-readability's heuristic algorithms which rely on raw page structure).
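To illustrate why a hand-rolled filter is fragile, here is a hypothetical regex-based handler stripper (not something the PR proposes). It removes an on* attribute when whitespace precedes it, but the HTML tokenizer also accepts `/` before an attribute name, so the classic `<svg/onload=...>` variant passes through untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveHandlerRE is a hypothetical filter: it only matches on* attributes preceded by whitespace.
var naiveHandlerRE = regexp.MustCompile(`(?i)\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)`)

// naiveStrip removes matching event-handler attributes and leaves everything else as-is.
func naiveStrip(html string) string {
	return naiveHandlerRE.ReplaceAllString(html, "")
}

func main() {
	// Caught: whitespace precedes the handler.
	fmt.Println(naiveStrip(`<img src=x onerror=alert(1)>`))
	// Missed: browsers still parse onload as an attribute after "/".
	fmt.Println(naiveStrip(`<svg/onload=alert(1)>`))
}
```

A tokenizer-based sanitizer such as bluemonday sees the same attribute list the browser does, which is the property the regex approach lacks.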

How To Pivot

  • If UGCPolicy is too restrictive (e.g., strips specific layout styles/data attributes the Sakura theme relies on), switch to bluemonday.StrictPolicy with explicit .AllowElements(...) overrides for required tags.

Next Knobs

  • Modify HTMLSanitizer in api/index.go to allow additional elements or attributes (e.g., .AllowAttrs("class").OnElements("span")).
  • Add tests for edge-case URL decoding XSS payloads.

PR created automatically by Jules for task 15074944369555111581 started by @lucasew

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@vercel bot commented Mar 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project        Deployment  Actions           Updated (UTC)
articleparser  Ready       Preview, Comment  Mar 1, 2026 6:26am

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's security posture by mitigating a high-severity XSS vulnerability. Previously, content parsed from external sources was rendered without sanitization, allowing malicious script injection. The changes introduce the HTML sanitization library bluemonday to filter out unsafe elements and attributes in every output format, blocking these XSS vectors and keeping users safe when consuming parsed articles.

Highlights

  • XSS Vulnerability Fix: Addressed a high-severity Cross-Site Scripting (XSS) vulnerability where unsanitized output from go-readability was directly injected into HTML, JSON, or Markdown formats, allowing potential execution of malicious JavaScript.
  • HTML Sanitization with Bluemonday: Integrated the github.com/microcosm-cc/bluemonday library, specifically using its UGCPolicy, to rigorously sanitize all HTML content parsed by go-readability before it is rendered or outputted in any format.
  • Comprehensive Sanitization Application: Implemented sanitization across all output formats (HTML, Markdown, JSON, and plain Text) to ensure that no malicious content can bypass the new security measures.
  • Dedicated XSS Test: Introduced a new test file (api/xss_test.go) to specifically verify that common XSS vectors, such as script tags and onerror attributes, are effectively blocked by the sanitization process.
Changelog
  • .jules/sentinel.md
    • Appended a note regarding the XSS vulnerability fix and the use of bluemonday.
  • api/index.go
    • Imported github.com/microcosm-cc/bluemonday.
    • Initialized a global HTMLSanitizer using bluemonday.UGCPolicy().
    • Created a sanitizeHTML helper function to apply the sanitization.
    • Applied sanitizeHTML to the content before rendering in formatHTML.
    • Applied sanitizeHTML to the content before converting to Markdown in formatMarkdown.
    • Applied sanitizeHTML to the content before encoding to JSON in formatJSON.
    • Applied sanitizeHTML to the content before writing as plain text in formatText.
  • api/xss_test.go
    • Added TestXSSHTMLSanitization to confirm that malicious HTML content (like script tags and onerror attributes) is correctly stripped by the sanitizer.
  • go.mod
    • Added github.com/microcosm-cc/bluemonday as a new dependency.
  • go.sum
    • Updated checksums for new bluemonday and related dependencies.
Activity
  • PR created automatically by Jules for task 15074944369555111581 started by @lucasew.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a high-severity XSS vulnerability by introducing the bluemonday library to sanitize HTML output from go-readability. The sanitization is correctly applied across all output formats (HTML, JSON, Markdown, and Text), and the inclusion of a dedicated test case is a great addition. I have two suggestions for improvement: one to fix a functional bug in the formatText handler to ensure it returns plain text as intended, and another to enhance performance by using byte slices instead of strings during sanitization to reduce memory allocations.

api/index.go Outdated
Comment on lines +326 to +328
func sanitizeHTML(html string) string {
	return HTMLSanitizer.Sanitize(html)
}


medium

To improve performance and reduce memory allocations, consider operating on byte slices ([]byte) instead of strings. bytes.Buffer.String() allocates a new string and copies the buffer's contents, whereas bytes.Buffer.Bytes() returns a slice that aliases the buffer's contents without copying. The bluemonday library provides a SanitizeBytes method that works directly on []byte. You could change this helper function and update its call sites to take advantage of this.

Suggested change
- func sanitizeHTML(html string) string {
- 	return HTMLSanitizer.Sanitize(html)
- }
+ func sanitizeHTMLBytes(html []byte) []byte {
+ 	return HTMLSanitizer.SanitizeBytes(html)
+ }

api/index.go Outdated
Comment on lines +391 to +393
	sanitizedContent := sanitizeHTML(buf.String())

	if _, err := w.Write([]byte(sanitizedContent)); err != nil {


medium

The formatText function is intended to return plain text, but sanitizeHTML uses bluemonday.UGCPolicy, which preserves many HTML tags. This results in HTML-like content being served with a text/plain content type. To correctly implement this function's purpose, you should strip all HTML tags. bluemonday.StripTagsPolicy() is designed for this. Creating the policy inline is acceptable here to keep the change localized.

Suggested change
- 	sanitizedContent := sanitizeHTML(buf.String())
- 	if _, err := w.Write([]byte(sanitizedContent)); err != nil {
+ 	sanitizedContent := bluemonday.StripTagsPolicy().Sanitize(buf.String())
+ 	if _, err := w.Write([]byte(sanitizedContent)); err != nil {

