feat(uploads): validate file types on upload (#6)#1316
Merged
Conversation
Add type validation to every admin-facing upload field. Uploads are staff/superuser-only, so this is defense-in-depth against a careless or compromised account placing a browser-executable file (e.g. .svg/.html) on a public /media/ path where it would be served as active content (stored XSS); it also rejects plainly-wrong types before they break the thumbnail pipeline. New website/utils/upload_validators.py — extension allowlist (FileExtensionValidator) + a magic-byte content sniff, no new dependency: - positive signature check for images / PDFs / videos (few, stable formats); - negative check for raw_file (talk/poster/pub source): accepts any non-web-executable bytes so proprietary formats (pptx, key, fig, sketch, zip, ...) pass without us tracking each signature, rejects HTML/SVG/XML; - HEIC/HEIF rejected with a "convert to JPEG/PNG" message (browsers and Pillow can't render it; auto-conversion is a separate follow-up); - validators run on new uploads only (committed FieldFiles are skipped) so editing a legacy record whose file predates these rules won't break. Wired validators onto pdf_file/raw_file (Artifact, inherited by Grant/Poster/Publication/Talk), Banner image/video, and the News, Person (image + easter_egg), Photo, Project, and Sponsor image fields. Tests: website/tests/test_upload_validators.py (15 SimpleTestCase cases) — per category accept-valid / reject-wrong-extension / reject-renamed-payload, plus the HEIC message, .fig/.sketch acceptance, and the committed-file gate. Full suite: 200 pass, 1 pre-existing skip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #6.
What & why
Adds file-type validation to every admin-facing upload field. Uploads are staff/superuser-only, so this is defense-in-depth: it stops a careless or compromised account from placing a browser-executable file (e.g.
.svg/.html) on a public/media/path, where the web server would serve it as active content (stored XSS). It also rejects plainly-wrong types (e.g. a.pagesheadshot) before they break the thumbnail pipeline.No new dependencies — uses Django's built-in
FileExtensionValidator+ a small magic-byte sniff. (Independent of the CKEditor work in #1269.)Approach
New module
website/utils/upload_validators.py— two layers per field:FileExtensionValidator).evil.html→evil.pdf) is still caught.Strategy differs by field, on purpose:
raw_file(talk/poster/pub source) → negative check: accepts any non-web-executable bytes and only rejects HTML/SVG/XML. This lets proprietary source formats (pptx,key,fig,sketch,zip, …) through without us chasing a magic number for every design tool, while still killing the XSS vector. A denylist expresses the actual goal ("don't serve active web content") more honestly than a positive allowlist here.Extra safeguards:
pillow-heif, so it would produce broken images. (Auto-conversion is a separate follow-up.)FieldFiles), so editing a legacy record whose file predates these rules won't break — and re-validating stored files adds no security.Fields wired
pdf_file/raw_fileonArtifact(inherited by Grant/Poster/Publication/Talk),Banner.image/Banner.video, and the image fields onNews,Person(image + easter_egg),Photo,Project,Sponsor. (Artifact.thumbnailis auto-generated and skipped.)Allowlists: images
jpg/jpeg/png/gif/webp, pdfpdf, videomp4/webm/mov/m4v, raw_filepdf/ppt/pptx/key/doc/docx/zip/fig/sketch.Testing
website/tests/test_upload_validators.py— 15SimpleTestCasecases (no DB, ~2ms): per category accept-valid / reject-wrong-extension / reject-renamed-payload, plus the HEIC message,.fig/.sketchacceptance, and the committed-file gate.python manage.py test website --settings=makeabilitylab.settings_test.makemigrations --dry-runconfirms the validators serialize cleanly to anAlterFieldmigration (harmless under the per-deploy regeneration flow; nothing committed — migrations are gitignored).UI note
Per the UI-change convention: the only visible effect is an admin validation error on a bad upload (no public-page change), so there are no public before/after screenshots. Happy to add a screenshot of the admin error message if useful.
Follow-ups (not in this PR)
pillow-heif+ a save-path step).get_ckeditor_image_filenamehardening — intentionally kept separate (Replace django-ckeditor (CKEditor 4 is EOL with known XSS issues) #1269).🤖 Generated with Claude Code