
fix(files): prevent path traversal in file management endpoints #309

Open
sebastiondev wants to merge 1 commit into PySpur-Dev:main from sebastiondev:fix/cwe22-file-management-deletion-c3d2

Summary

The list_workflow_files, delete_file, and delete_workflow_files endpoints in backend/pyspur/api/file_management.py construct filesystem paths directly from user-supplied workflow_id and filename path parameters without validating that the resolved path stays within the intended data/run_files/ directory. An attacker can supply path traversal sequences (e.g. ../../) to list, read metadata about, or delete arbitrary files and directories accessible to the application process.

CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)

These endpoints are mounted at /api/files/ with no authentication middleware, so any client with network access to the API can exploit this.

Affected code

In backend/pyspur/api/file_management.py, the three affected endpoints previously did:

workflow_dir = DATA_DIR / "run_files" / workflow_id        # list_workflow_files
file_path = DATA_DIR / "run_files" / workflow_id / filename # delete_file
workflow_dir = DATA_DIR / "run_files" / workflow_id         # delete_workflow_files

The workflow_id and filename values come directly from the URL path and are not sanitized. Notably, the get_file endpoint (line 128+) already had its own path validation, which confirms the other endpoints were simply missing it.

Proof of Concept

An attacker can list arbitrary directories and delete any file or directory tree the process has write access to:

# Delete a specific file outside the data directory (e.g. a config file):
curl -X DELETE 'http://localhost:6080/api/files/..%2F..%2F..%2Ftmp/some_important_file'

# Delete an entire directory tree outside data/run_files/:
curl -X DELETE 'http://localhost:6080/api/files/..%2F..%2F..%2Ftmp%2Ftarget_dir'

# List files in an arbitrary directory:
curl 'http://localhost:6080/api/files/..%2F..%2F..%2Fetc'

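FastAPI automatically URL-decodes path parameters, so ..%2F becomes ../ before reaching the handler. The path DATA_DIR / "run_files" / "../../../etc" climbs three levels up from run_files, so it resolves to /etc whenever DATA_DIR sits at most two levels below the filesystem root (a deeper DATA_DIR only needs more ../ segments). The handler then lists or deletes whatever that path points to.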
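To make the escape concrete, here is a minimal sketch of the unvalidated join. The DATA_DIR value and the helper names naive_join and escapes_run_files are assumptions for illustration, not code from the PySpur repository, and urllib.parse.unquote merely stands in for the decoding that happens before the handler runs.

```python
from pathlib import Path
from urllib.parse import unquote

# Hypothetical data directory; the real DATA_DIR is defined in the PySpur backend.
DATA_DIR = Path("/opt/pyspur-example/data")


def naive_join(data_dir: Path, workflow_id: str) -> Path:
    """Mirror the unvalidated join the affected endpoints used."""
    return data_dir / "run_files" / workflow_id


def escapes_run_files(data_dir: Path, workflow_id: str) -> bool:
    """Return True if the joined path resolves outside data_dir/run_files."""
    base = (data_dir / "run_files").resolve()
    target = naive_join(data_dir, workflow_id).resolve()
    try:
        # relative_to() raises ValueError when target is not under base.
        target.relative_to(base)
    except ValueError:
        return True
    return False


# Percent-decoding turns the encoded payload into literal ../ segments.
payload = unquote("..%2F..%2F..%2Fetc")

# pathlib keeps the ".." segments when joining; only resolve() collapses them.
print(naive_join(DATA_DIR, payload))
print(escapes_run_files(DATA_DIR, payload))         # traversal payload
print(escapes_run_files(DATA_DIR, "workflow-123"))  # ordinary workflow ID
```

The join itself never fails or normalizes anything, which is why the traversal only becomes visible once the path is resolved and compared against the base.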

Fix description

This PR adds a _safe_resolve() helper that:

  1. Joins the user-supplied path components onto the base directory
  2. Calls .resolve() to canonicalize the result (resolving all .., symlinks, etc.)
  3. Verifies the resolved path starts with the resolved base directory
  4. Raises an HTTP 400 if the path escapes the base directory

All three affected endpoints now use _safe_resolve(DATA_DIR, "run_files", workflow_id, ...) instead of raw path concatenation. Resolve-then-verify is a widely used containment pattern for preventing path traversal in Python: canonicalize the path first, then check that it still lies under the base directory.

The fix is minimal: one new helper function and three call-site changes, totaling 12 lines added and 3 lines changed.
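The diff itself is not reproduced in this description, so the following is only a sketch of a helper that follows the four steps listed above. The names safe_resolve_sketch and PathEscapeError are hypothetical, and the sketch raises a plain exception where the PR's _safe_resolve() is described as raising an HTTP 400, so that it runs without FastAPI installed.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Hypothetical stand-in for the HTTP 400 the PR's helper is said to raise."""


def safe_resolve_sketch(base: Path, *parts: str) -> Path:
    """Join parts onto base, canonicalize, and reject paths that leave base."""
    base_resolved = base.resolve()
    # Steps 1 and 2: join the user-supplied components, then resolve "..",
    # symlinks, and redundant separators.
    candidate = base.joinpath(*parts).resolve()
    # Steps 3 and 4: compare whole path components, not raw strings, and
    # refuse anything that is not under the resolved base.
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        raise PathEscapeError(f"path escapes {base_resolved}") from None
    return candidate
```

In an endpoint, the exception would be translated into a 400 response. The sketch checks containment against whatever base it is given, so the caller decides how tight the boundary is.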

Testing

  • Verified that traversal payloads (../, ..%2F, ....//) in workflow_id and filename now return HTTP 400 instead of performing filesystem operations outside the data directory.
  • Verified that normal operations (listing, deleting files with legitimate workflow IDs) continue to work as expected.
  • Reviewed that the _safe_resolve helper correctly handles edge cases: symlinks, double-dot sequences, and path components that partially match the base directory name.
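The "partially match the base directory name" case in the last bullet is easy to get wrong, so a short illustration may help. The paths and function names below are hypothetical and not taken from the PR; the sketch only contrasts a raw string-prefix check with a component-wise check.

```python
from pathlib import Path

# Hypothetical base directory and a sibling whose name shares its prefix.
base = Path("/opt/pyspur-example/data/run_files").resolve()
sibling = Path("/opt/pyspur-example/data/run_files_backup/secret.txt").resolve()


def inside_by_string_prefix(path: Path, root: Path) -> bool:
    """Naive check: compares raw strings, so sibling names can slip through."""
    return str(path).startswith(str(root))


def inside_by_components(path: Path, root: Path) -> bool:
    """Component-wise check: root must be the path itself or one of its parents."""
    return path == root or root in path.parents


string_check = inside_by_string_prefix(sibling, base)
component_check = inside_by_components(sibling, base)

print(string_check)     # True: the sibling directory is wrongly accepted
print(component_check)  # False: the sibling directory is rejected
```

A containment helper that compares path components (or appends a trailing separator before a string comparison) avoids accepting run_files_backup as if it were inside run_files.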

Adversarial review

Before submitting, we verified this isn't mitigated by existing protections. The file management router is mounted without any authentication dependencies: the call is api_app.include_router(file_management_router, prefix="/files", tags=["files"]), with no dependencies= argument, and there is no application-wide auth middleware. FastAPI's path parameter handling decodes URL-encoded traversal sequences before they reach the handler, so URL encoding does not prevent exploitation. The get_file endpoint already had its own path validation, further confirming the other endpoints were unprotected.


Submitted by Sebastion — autonomous open-source security research from Foundation Machines. Free for public repos via the Sebastion AI GitHub App.
