๐ก๏ธ Sentinel: [HIGH] Fix XSS via un-sanitized go-readability output#149
๐ก๏ธ Sentinel: [HIGH] Fix XSS via un-sanitized go-readability output#149
Conversation
Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
|
๐ Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a ๐ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's security posture by mitigating a critical XSS vulnerability. Previously, content parsed from external sources was rendered without proper sanitization, posing a risk of malicious script injection. The changes introduce a robust HTML sanitization library, Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with ๐ and ๐ on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a high-severity XSS vulnerability by introducing the bluemonday library to sanitize HTML output from go-readability. The sanitization is correctly applied across all output formats (HTML, JSON, Markdown, and Text), and the inclusion of a dedicated test case is a great addition. I have two suggestions for improvement: one to fix a functional bug in the formatText handler to ensure it returns plain text as intended, and another to enhance performance by using byte slices instead of strings during sanitization to reduce memory allocations.
api/index.go
Outdated
| func sanitizeHTML(html string) string { | ||
| return HTMLSanitizer.Sanitize(html) | ||
| } |
There was a problem hiding this comment.
To improve performance and reduce memory allocations, consider operating on byte slices ([]byte) instead of strings. bytes.Buffer.String() creates a new string, which involves an allocation and copy, whereas bytes.Buffer.Bytes() often returns the underlying byte slice without a copy. The bluemonday library provides a SanitizeBytes method that works directly on []byte. You could change this helper function and update its call sites to leverage this more efficient approach.
| func sanitizeHTML(html string) string { | |
| return HTMLSanitizer.Sanitize(html) | |
| } | |
| func sanitizeHTMLBytes(html []byte) []byte { | |
| return HTMLSanitizer.SanitizeBytes(html) | |
| } |
api/index.go
Outdated
| sanitizedContent := sanitizeHTML(buf.String()) | ||
|
|
||
| if _, err := w.Write([]byte(sanitizedContent)); err != nil { |
There was a problem hiding this comment.
The formatText function is intended to return plain text, but sanitizeHTML uses bluemonday.UGCPolicy, which preserves many HTML tags. This results in HTML-like content being served with a text/plain content type. To correctly implement this function's purpose, you should strip all HTML tags. bluemonday.StripTagsPolicy() is designed for this. Creating the policy inline is acceptable here to keep the change localized.
| sanitizedContent := sanitizeHTML(buf.String()) | |
| if _, err := w.Write([]byte(sanitizedContent)); err != nil { | |
| sanitizedContent := bluemonday.StripTagsPolicy().Sanitize(buf.String()) | |
| if _, err := w.Write([]byte(sanitizedContent)); err != nil { |
Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
Severity: High
Vulnerability: XSS (Cross-Site Scripting). The application was taking the raw HTML output of
go-readabilityand directly injecting it into the HTML template, JSON, or Markdown formats without intermediate sanitization.go-readabilityprimarily cleans up document structure for reading but does not rigorously prevent all XSS vectors (e.g. event handlers likeonerrorinside allowed tags might survive).Impact: A specially crafted webpage could embed malicious JavaScript that bypasses
go-readability's stripping. When thearticleparserrenders it, the JS would execute in the victim's browser, potentially stealing sessions, cookies, or redirecting the user.Fix: Introduced the
github.com/microcosm-cc/bluemondaylibrary with theUGCPolicy(User Generated Content) to explicitly strip out any potentially malicious HTML tags and attributes. The parsed HTML content is passed throughbluemonday'sSanitize()method before being formatted as HTML, JSON, Markdown, or Text. AsanitizeHTMLhelper handles this, utilizing a pre-configured globalHTMLSanitizer. Added a dedicated test (api/xss_test.go) to ensureonerrorhandlers and scripts are blocked.Assumptions
bluemonday.UGCPolicystrikes the right balance between preserving standard reading-oriented HTML structure while mitigating all active content (JS/CSS/Objects).format*stage is appropriate and avoids modifying the internal state of thereadability.Article(which is useful if a raw mode is ever required).Alternatives Not Chosen
How To Pivot
UGCPolicyis too restrictive (e.g., strips specific layout styles/data attributes the Sakura theme relies on), switch tobluemonday.StrictPolicywith explicit.AllowElements(...)overrides for required tags.Next Knobs
HTMLSanitizerinapi/index.goto add custom allowed tags (e.g.,.AllowAttrs("class").OnElements("span")).PR created automatically by Jules for task 15074944369555111581 started by @lucasew