
feat(uploads): validate file types on upload (#6) #1316

Merged: jonfroehlich merged 1 commit into master from 6-validate-filetypes-on-upload on Jun 17, 2026

@jonfroehlich (Member)

Closes #6.

What & why

Adds file-type validation to every admin-facing upload field. Uploads are staff/superuser-only, so this is defense-in-depth: it stops a careless or compromised account from placing a browser-executable file (e.g. .svg/.html) on a public /media/ path, where the web server would serve it as active content (stored XSS). It also rejects plainly wrong types (e.g. a .pages headshot) before they break the thumbnail pipeline.

No new dependencies — uses Django's built-in FileExtensionValidator + a small magic-byte sniff. (Independent of the CKEditor work in #1269.)

Approach

New module website/utils/upload_validators.py — two layers per field:

  1. Extension allowlist (FileExtensionValidator).
  2. Magic-byte content sniff, so a renamed payload (evil.html → evil.pdf) is still caught.
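To see why the second layer matters, here is a minimal sketch of both layers for a PDF field. It is not the code in upload_validators.py: the name `validate_pdf_upload` is hypothetical, it assumes the validator is handed the filename plus the first few bytes of the upload, and `ValueError` stands in for Django's `ValidationError`. A file renamed from evil.html to evil.pdf passes layer 1 but fails layer 2.

```python
# Hypothetical sketch of the two layers for a PDF field; not the PR's code.
# ValueError stands in for Django's ValidationError.

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def validate_pdf_upload(name: str, head: bytes) -> None:
    """Reject unless both the extension and the leading bytes say 'PDF'."""
    # Layer 1: extension allowlist (case-insensitive).
    if not name.lower().endswith(".pdf"):
        raise ValueError(f"{name}: extension not allowed")
    # Layer 2: content sniff. Renaming a file changes its name, not its bytes.
    # (Strict: some PDF readers tolerate junk before the header; this does not.)
    if not head.startswith(PDF_MAGIC):
        raise ValueError(f"{name}: content does not match the .pdf extension")


validate_pdf_upload("paper.pdf", b"%PDF-1.7\n")  # passes both layers
try:
    # evil.html renamed to evil.pdf: passes layer 1, caught by layer 2
    validate_pdf_upload("evil.pdf", b"<html><script>alert(1)</script>")
except ValueError as err:
    print(err)  # evil.pdf: content does not match the .pdf extension
```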

Strategy differs by field, on purpose:

  • images / PDFs / videos → positive check: bytes must match a known-good signature (few, stable, well-known formats).
  • raw_file (talk/poster/pub source) → negative check: accepts any non-web-executable bytes and only rejects HTML/SVG/XML. This lets proprietary source formats (pptx, key, fig, sketch, zip, …) through without us chasing a magic number for every design tool, while still killing the XSS vector. A denylist expresses the actual goal ("don't serve active web content") more honestly than a positive allowlist here.
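A rough sketch of the two strategies, assuming each check receives the first few bytes of the upload (`head`). The function names and the marker list are hypothetical; the byte signatures are the standard ones for each format. Note that the raw_file check never needs to know what a .key or .fig file looks like: anything that does not open like HTML/SVG/XML passes.

```python
# Hypothetical sketch of the two strategies; not the PR's actual code.


def looks_like_image(head: bytes) -> bool:
    """Positive check: accept only a known-good image signature."""
    return (
        head.startswith(b"\xff\xd8\xff")                    # JPEG
        or head.startswith(b"\x89PNG\r\n\x1a\n")            # PNG
        or head[:6] in (b"GIF87a", b"GIF89a")               # GIF
        or (head[:4] == b"RIFF" and head[8:12] == b"WEBP")  # WebP
    )


def looks_like_video(head: bytes) -> bool:
    """Positive check: ISO-BMFF 'ftyp' box (MP4/MOV/M4V) or EBML header (WebM).

    Some older QuickTime .mov files open with a different atom; this sketch
    does not handle them.
    """
    return head[4:8] == b"ftyp" or head.startswith(b"\x1a\x45\xdf\xa3")


# Negative check for raw_file. The marker list is an assumption and is kept
# short; it only inspects the start of the file.
ACTIVE_MARKERS = (b"<?xml", b"<svg", b"<!doctype html", b"<html", b"<script")


def is_web_executable(head: bytes) -> bool:
    """True if the leading bytes look like HTML, SVG, or XML."""
    # Drop a UTF-8 BOM and leading whitespace, then compare case-insensitively.
    text = head.removeprefix(b"\xef\xbb\xbf").lstrip().lower()
    return text.startswith(ACTIVE_MARKERS)


def validate_raw_file(head: bytes) -> None:
    """Denylist: any bytes pass unless they are active web content."""
    if is_web_executable(head):
        raise ValueError("HTML/SVG/XML files are not allowed here.")
```

The positive checks fail closed (an unknown format is rejected), while the denylist fails open for unknown formats; that trade-off is exactly what lets proprietary source files through.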

Extra safeguards:

  • HEIC/HEIF rejected with a "convert to JPEG/PNG" message — most browsers can't render it and Pillow/easy_thumbnails can't process it without pillow-heif, so it would produce broken images. (Auto-conversion is a separate follow-up.)
  • New-uploads-only gate: validators skip already-stored files (committed FieldFiles), so editing a legacy record whose file predates these rules won't break — and re-validating stored files adds no security.
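Both safeguards can be sketched in one image validator. Everything here is hypothetical: the names, the message text, and especially the assumption that the gate keys off Django's private `FieldFile._committed` flag (True for a file already saved to storage). HEIF is detected from its ISO-BMFF header, where bytes 4-8 are `ftyp` and bytes 8-12 hold the major brand. The helpers are module-level functions because Django's migration serializer can reference those by import path, whereas closures and lambdas cannot be serialized.

```python
import io

# Hypothetical names throughout; a sketch, not the PR's upload_validators.py.

# Major brands that identify HEIC/HEIF still images. Files that declare another
# major brand and list HEIF only as a compatible brand are not caught here.
HEIF_BRANDS = {b"heic", b"heix", b"heim", b"heis", b"hevc", b"hevx", b"mif1", b"msf1"}


def is_new_upload(value: object) -> bool:
    """False for a file already committed to storage.

    Assumption: keys off Django's private FieldFile._committed flag; a fresh
    upload either lacks the attribute or has it set to False.
    """
    return not getattr(value, "_committed", False)


def validate_image_upload(value) -> None:
    """Image validator with both safeguards (JPEG/PNG only, for brevity)."""
    if not is_new_upload(value):
        return  # legacy stored file: re-validating it adds no security
    head = value.read(12)
    value.seek(0)  # rewind so later consumers read the whole file
    if head[4:8] == b"ftyp" and head[8:12] in HEIF_BRANDS:
        raise ValueError("HEIC/HEIF images are not supported; please convert to JPEG or PNG.")
    if not (head.startswith(b"\xff\xd8\xff") or head.startswith(b"\x89PNG\r\n\x1a\n")):
        raise ValueError("File content is not a supported image.")


class StoredFile:
    """Stand-in for a FieldFile that is already saved (hypothetical)."""

    _committed = True


validate_image_upload(StoredFile())  # skipped: read() is never called
validate_image_upload(io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # passes
```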

Fields wired

pdf_file/raw_file on Artifact (inherited by Grant/Poster/Publication/Talk), Banner.image/Banner.video, and the image fields on News, Person (image + easter_egg), Photo, Project, Sponsor. (Artifact.thumbnail is auto-generated and skipped.)

Allowlists:
  • images: jpg/jpeg/png/gif/webp
  • PDFs: pdf
  • video: mp4/webm/mov/m4v
  • raw_file: pdf/ppt/pptx/key/doc/docx/zip/fig/sketch
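Expressed as data, the allowlists look like this; the dict and helper names are hypothetical. The extension rule mirrors Django's FileExtensionValidator, which compares the lowercased final suffix of the filename against the allowed list, so a double extension such as talk.pptx.html is judged by its last suffix and rejected.

```python
from pathlib import Path

# Allowlists from this PR, keyed by category (the dict itself is hypothetical).
ALLOWED_EXTENSIONS = {
    "image": ["jpg", "jpeg", "png", "gif", "webp"],
    "pdf": ["pdf"],
    "video": ["mp4", "webm", "mov", "m4v"],
    "raw_file": ["pdf", "ppt", "pptx", "key", "doc", "docx", "zip", "fig", "sketch"],
}


def extension_allowed(filename: str, category: str) -> bool:
    """Mirror FileExtensionValidator's rule: last suffix, no dot, lowercase."""
    ext = Path(filename).suffix[1:].lower()
    return ext in ALLOWED_EXTENSIONS[category]
```

On a model, each list would presumably be passed as `FileExtensionValidator(allowed_extensions=...)` in the field's `validators=[...]`, next to the content sniff.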

Testing

  • New website/tests/test_upload_validators.py with 15 SimpleTestCase cases (no DB, ~2 ms): per category accept-valid / reject-wrong-extension / reject-renamed-payload, plus the HEIC message, .fig/.sketch acceptance, and the committed-file gate.
  • Full suite: 200 pass, 1 pre-existing skip, run via python manage.py test website --settings=makeabilitylab.settings_test.
  • makemigrations --dry-run confirms the validators serialize cleanly to an AlterField migration (harmless under the per-deploy regeneration flow; nothing committed — migrations are gitignored).

UI note

Per the UI-change convention: the only visible effect is an admin validation error on a bad upload (no public-page change), so there are no public before/after screenshots. Happy to add a screenshot of the admin error message if useful.

Follow-ups (not in this PR)

🤖 Generated with Claude Code

Add type validation to every admin-facing upload field. Uploads are
staff/superuser-only, so this is defense-in-depth against a careless or
compromised account placing a browser-executable file (e.g. .svg/.html) on a
public /media/ path where it would be served as active content (stored XSS);
it also rejects plainly wrong types before they break the thumbnail pipeline.

New website/utils/upload_validators.py — extension allowlist
(FileExtensionValidator) + a magic-byte content sniff, no new dependency:
- positive signature check for images / PDFs / videos (few, stable formats);
- negative check for raw_file (talk/poster/pub source): accepts any
  non-web-executable bytes so proprietary formats (pptx, key, fig, sketch,
  zip, ...) pass without us tracking each signature, rejects HTML/SVG/XML;
- HEIC/HEIF rejected with a "convert to JPEG/PNG" message (browsers and
  Pillow can't render it; auto-conversion is a separate follow-up);
- validators run on new uploads only (committed FieldFiles are skipped) so
  editing a legacy record whose file predates these rules won't break.

Wired validators onto pdf_file/raw_file (Artifact, inherited by
Grant/Poster/Publication/Talk), Banner image/video, and the News, Person
(image + easter_egg), Photo, Project, and Sponsor image fields.

Tests: website/tests/test_upload_validators.py (15 SimpleTestCase cases) —
per category accept-valid / reject-wrong-extension / reject-renamed-payload,
plus the HEIC message, .fig/.sketch acceptance, and the committed-file gate.
Full suite: 200 pass, 1 pre-existing skip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jonfroehlich merged commit 49b2545 into master on Jun 17, 2026
2 checks passed

Linked issue: Admin Interface: Validate Filetypes on Upload (#6)