Skip to content

feat: implement Python storage cleaner with Appwrite integration#338

Merged
hmacr merged 5 commits intoappwrite:mainfrom
zeus2611:feat-310-translate-3-node.js-templates-to-python
Feb 3, 2026
Merged

feat: implement Python storage cleaner with Appwrite integration#338
hmacr merged 5 commits intoappwrite:mainfrom
zeus2611:feat-310-translate-3-node.js-templates-to-python

Conversation

@zeus2611
Copy link
Copy Markdown
Contributor

@zeus2611 zeus2611 commented Sep 30, 2025

What does this PR do?

This PR implements a Python translation of the existing Node.js AppwriteService for cleaning storage buckets by removing files older than a retention period. It also provides a Python version of the serverless handler function that validates environment variables, authenticates the API key from request headers, and triggers the bucket cleanup process.

Test Plan

The changes were verified by running the Python service locally with mock environment variables and headers to ensure the cleanup function executes without error and correctly deletes files as intended. Additionally, error handling was tested by providing invalid or missing API keys to confirm appropriate error responses are returned.

Related PRs and Issues

#310

Have you read the Contributing Guidelines on issues?

Yes, I have read and followed the contributing guidelines.

Summary by CodeRabbit

  • New Features

    • Introduced a Storage Cleaner service that automatically removes files from storage buckets based on a configurable retention period.
  • Documentation

    • Added comprehensive documentation including usage, configuration, and environment setup instructions.
  • Chores

    • Added project configuration files, dependency specifications, and version control configuration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Sep 30, 2025

Walkthrough

This pull request introduces a complete Python-based storage cleaner service for Appwrite. The implementation includes configuration files (.gitignore, requirements.txt), documentation (README.md), and service modules (main.py, utils.py, appwrite_service.py). The service removes files older than a configured retention period from an Appwrite storage bucket. It validates required environment variables, authenticates via API key headers, queries files by expiry date, and performs concurrent deletion with comprehensive error handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing a Python storage cleaner service with Appwrite integration, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
python/storage-cleaner/src/appwrite.py (4)

8-8: Remove unused import.

The ID import is not used anywhere in this module.

-from appwrite.id import ID

17-24: Validate required environment variables.

The __init__ method reads APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID without validation. If these are missing or None, the Appwrite client may fail later during API calls with unclear errors.

Consider adding validation:

 def __init__(self, api_key: str):
+    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
+    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")
+    
+    if not endpoint or not project_id:
+        raise ValueError("Missing required environment variables: APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID")
+    
     client = (
         Client()
-        .set_endpoint(os.getenv("APPWRITE_FUNCTION_API_ENDPOINT"))
-        .set_project(os.getenv("APPWRITE_FUNCTION_PROJECT_ID"))
+        .set_endpoint(endpoint)
+        .set_project(project_id)
         .set_key(api_key)
     )
     self.storage = Storage(client)

43-43: Use defensive dictionary access.

Directly accessing f["$id"] will raise a KeyError if the API response format changes or is malformed. Consider using .get() with a fallback or explicit validation.

-            self.storage.delete_file(bucket_id, f["$id"])
+            file_id = f.get("$id")
+            if file_id:
+                self.storage.delete_file(bucket_id, file_id)

45-46: Consider more Pythonic empty check.

Using not files is more idiomatic in Python than len(files) == 0.

-        if len(files) == 0:
+        if not files:
             break
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79a7bcc and d786c8c.

📒 Files selected for processing (6)
  • python/storage-cleaner/.gitignore (1 hunks)
  • python/storage-cleaner/README.md (1 hunks)
  • python/storage-cleaner/requirements.txt (1 hunks)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
  • python/storage-cleaner/src/utils.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (13-46)
  • clean_bucket (26-46)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-40)
🪛 markdownlint-cli2 (0.18.1)
python/storage-cleaner/README.md

11-11: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.13.1)
python/storage-cleaner/src/utils.py

40-40: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
python/storage-cleaner/README.md (1)

24-24: Verify timeout sufficiency for large buckets.

The 15-second timeout may be insufficient when cleaning buckets with thousands of old files. Each batch fetches 25 files, and with network latency for delete operations, larger buckets could exceed this limit and cause incomplete cleanup.

Consider testing with a representative bucket size, or document the expected maximum bucket size this function can handle within 15 seconds. You may need to increase the timeout or implement cursor-based resumption for very large buckets.

python/storage-cleaner/src/main.py (1)

7-10: Environment validation logic approved.

The use of throw_if_missing correctly validates required environment variables before proceeding with bucket operations.

Comment thread python/storage-cleaner/requirements.txt
Comment thread python/storage-cleaner/src/appwrite.py
Comment thread python/storage-cleaner/src/main.py Outdated
Comment thread python/storage-cleaner/src/main.py Outdated
Comment thread python/storage-cleaner/src/utils.py
Comment thread python/storage-cleaner/src/utils.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
python/storage-cleaner/src/main.py (1)

9-14: Fix header access to prevent KeyError.

Line 9 uses direct dictionary access context.req.headers["x-appwrite-key"], which will raise a KeyError if the header is missing—before the check on Line 11 can execute. This exposes internal errors to clients.

Apply this diff:

-    api_key = context.req.headers["x-appwrite-key"]
+    api_key = context.req.headers.get("x-appwrite-key")
 
     if not api_key:
🧹 Nitpick comments (1)
python/storage-cleaner/src/appwrite.py (1)

17-24: Consider validating required environment variables.

The __init__ method reads APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID from the environment without validation. If these are missing, the Appwrite SDK client will likely fail with an unclear error during subsequent API calls.

You can either:

  1. Let the SDK fail naturally (current approach—acceptable if calling code validates upstream).
  2. Add explicit validation here for clearer error messages:
def __init__(self, api_key: str):
    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")
    
    if not endpoint or not project_id:
        raise ValueError(
            "Missing required environment variables: "
            "APPWRITE_FUNCTION_API_ENDPOINT, APPWRITE_FUNCTION_PROJECT_ID"
        )
    
    client = (
        Client()
        .set_endpoint(endpoint)
        .set_project(project_id)
        .set_key(api_key)
    )
    self.storage = Storage(client)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d786c8c and accc817.

📒 Files selected for processing (4)
  • python/storage-cleaner/requirements.txt (1 hunks)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
  • python/storage-cleaner/src/utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-30T10:09:36.561Z
Learnt from: zeus2611
PR: appwrite/templates#338
File: python/storage-cleaner/src/utils.py:26-40
Timestamp: 2025-09-30T10:09:36.561Z
Learning: In the Python storage-cleaner template at python/storage-cleaner/src/utils.py, the throw_if_missing function intentionally treats 0 as an invalid value to prevent a retention period of 0, which would cause files to be deleted immediately and risk accidental data loss.

Applied to files:

  • python/storage-cleaner/src/utils.py
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (12-65)
  • clean_bucket (26-65)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-41)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/appwrite.py

45-47: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Use explicit conversion flag

Replace with conversion flag

(RUF010)


56-56: Do not catch blind exception: Exception

(BLE001)


63-65: Avoid specifying long messages outside the exception class

(TRY003)

python/storage-cleaner/src/main.py

23-23: Do not catch blind exception: Exception

(BLE001)

python/storage-cleaner/src/utils.py

41-41: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
python/storage-cleaner/requirements.txt (1)

1-1: LGTM!

The pinned version constraint ensures reproducible builds and prevents breaking changes from future major releases.

python/storage-cleaner/src/main.py (1)

18-25: LGTM!

The exception handling properly wraps bucket operations, distinguishing between validation errors (400) and unexpected failures (500) while logging details for debugging.

python/storage-cleaner/src/utils.py (2)

9-23: LGTM!

The function uses timezone-aware datetime (Python 3.12+ compatible), handles invalid environment values gracefully, and returns a properly formatted ISO 8601 timestamp.


26-41: LGTM!

The validation logic correctly treats 0 as invalid to prevent immediate deletion and data loss. The inline comment on Line 36 documents this business rule for maintainers.

python/storage-cleaner/src/appwrite.py (1)

41-65: LGTM!

The error handling properly wraps API calls, tracks successes and failures, and provides detailed diagnostics when deletions fail. The loop correctly exits when no more files are returned.

Comment thread python/storage-cleaner/src/appwrite.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
python/storage-cleaner/src/main.py (1)

20-20: Consider a more informative success message.

The message "Buckets cleaned" is plural but only one bucket is cleaned. Consider making it singular or adding details about the operation.

-        return context.res.text("Buckets cleaned", 200)
+        return context.res.text("Bucket cleaned successfully", 200)

Alternatively, if you want to return structured data:

-        return context.res.text("Buckets cleaned", 200)
+        return context.res.json({"message": "Bucket cleaned successfully", "bucket_id": os.environ["APPWRITE_BUCKET_ID"]}, 200)
python/storage-cleaner/src/appwrite.py (1)

50-58: File deletion loop handles errors well.

The loop safely extracts file IDs, wraps deletions in error handling, and tracks both successes and failures. The use of the file_id variable in the exception handler correctly avoids the KeyError issue flagged in previous reviews.

Optional improvement: Files without a $id field are silently skipped (line 53's conditional). Consider logging these cases for visibility:

             try:
                 file_id = f.get("$id")
                 if file_id:
                     self.storage.delete_file(bucket_id, file_id)
                     deleted_files_count += 1
+                else:
+                    print(f"Warning: File object missing $id in bucket {bucket_id}")
             except Exception as e:
                 failed_files.append({"id": file_id, "error": str(e)})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between accc817 and 0d103d6.

📒 Files selected for processing (2)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (12-65)
  • clean_bucket (26-65)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-41)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/main.py

23-23: Do not catch blind exception: Exception

(BLE001)

python/storage-cleaner/src/appwrite.py

45-47: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Use explicit conversion flag

Replace with conversion flag

(RUF010)


56-56: Do not catch blind exception: Exception

(BLE001)


63-65: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)
python/storage-cleaner/src/main.py (3)

1-3: LGTM! Clean and minimal imports.

The imports are appropriate for the serverless function's requirements.


9-14: Excellent API key validation!

The safe header access using .get() combined with the empty-value check properly prevents KeyError exceptions and provides clear error messages.


18-25: Exception handling properly addresses previous concerns.

The try-except structure correctly catches and handles errors from the bucket cleaning operation, logs details internally, and returns sanitized error messages. The broad Exception catch is appropriate in this serverless context to prevent unhandled exceptions from crashing the function.

Note: The static analysis warning about catching Exception is a false positive here - broad exception handling is appropriate for serverless entry points to ensure graceful degradation.

python/storage-cleaner/src/appwrite.py (5)

1-9: LGTM! Well-organized imports and documentation.

The module docstring and imports are clean and appropriate for the storage cleanup service.


17-24: Client initialization looks correct.

The fluent API usage for configuring the Appwrite client is clean and idiomatic. Ensure that the required environment variables (APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID) are validated at the entry point (see comment in main.py).


26-39: Good query setup and pagination strategy.

The batch size of 25 is reasonable for iterative deletion, and the counter initialization properly tracks success and failure states.


41-48: Robust error handling for list operation.

The try-except block properly catches API failures, preserves the exception chain with from e, and provides clear context. The broad Exception catch is appropriate here since the Appwrite SDK may raise various exception types.

Note: Static analysis warnings (TRY003, RUF010) are pedantic in this context - the current implementation is clear and maintainable.


59-65: Appropriate error handling for partial failures.

The loop termination logic is correct, and raising a RuntimeError for partial failures ensures callers are aware of issues. The detailed error message (including counts and failed file details) is valuable for debugging, despite the static analysis warning.

Note: The current behavior raises an exception even if some files were successfully deleted. This is a conservative approach that ensures operators are notified of any failures. If you prefer to log warnings instead and only raise on total failure, consider adjusting the logic - but the current behavior is safer.

Comment thread python/storage-cleaner/src/main.py Outdated
Comment thread python/storage-cleaner/src/main.py
@zeus2611
Copy link
Copy Markdown
Contributor Author

Hi @Meldiron, just bumping this thread. I'm happy to implement the validation for the system env vars if that is the standard for this repo, even if they are auto-injected. Let me know how you'd like to proceed!

Comment thread python/storage-cleaner/src/appwrite.py Outdated
@hmacr
Copy link
Copy Markdown

hmacr commented Feb 2, 2026

@zeus2611 Thanks for the contribution! Could you share a screen recording of your testing?

…missing environment variables, and standardize JSON responses.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/storage-cleaner/src/appwrite.py`:
- Around line 42-69: The current while True loop around list_files/delete_file
can hang if deletions fail because the same files keep reappearing; modify the
cleanup loop in the method that calls list_files and delete_file to stop after a
batch with failures (or implement a max retry counter). Concretely: track a
retry or batch count (e.g., max_retries) and increment each loop, or check
failed_files after processing a batch and break out of the while loop if
failed_files is non-empty; ensure deleted_files_count is still updated and that
any remaining failed_files are returned/reported to the caller so the function
terminates instead of looping forever.

Comment thread python/storage-cleaner/src/appwrite_service.py
…lation script and a local runner for the storage cleaner.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/storage-cleaner/src/appwrite_service.py`:
- Around line 54-60: Validate the batch before creating futures: filter the
files list for entries with a truthy f.get("$id") and if that filtered list is
empty, set batch_failed = True (or break/return) to avoid reprocessing the same
batch; then build future_to_file from the filtered list and proceed to call
self.storage.delete_file within the ThreadPoolExecutor. Ensure you
reference/modify the variables and constructs seen here (files, batch_failed,
future_to_file, ThreadPoolExecutor, self.storage.delete_file) so missing $id
entries are treated as failures and the loop can terminate.

Comment thread python/storage-cleaner/src/appwrite_service.py
@zeus2611
Copy link
Copy Markdown
Contributor Author

zeus2611 commented Feb 3, 2026

Hi @hmacr,

Following up on your feedback, I've implemented the concurrent file deletion to optimize the cleanup process. I also took the opportunity to improve the error handling for missing environment variables and standardized the JSON responses to match the expected format.

Here is a screen recording demonstrating the testing process:

Screen.Recording.2026-02-03.at.11.25.41.AM.mov

Testing Overview:

  1. Data Seeding: Ran a helper script (populate_storage.py) locally to create multiple test files in the bucket for cleanup.
  2. Local Testing: Verified the logic using a local runner (run_local.py) to ensure concurrent deletions work as expected.
  3. Appwrite Deployment: Deployed the function to Appwrite, configured the required environment variables, and executed it to verify real-world performance and integration.

Let me know if you would like any further changes!

@hmacr hmacr merged commit ab01b02 into appwrite:main Feb 3, 2026
1 check passed
@zeus2611 zeus2611 deleted the feat-310-translate-3-node.js-templates-to-python branch February 4, 2026 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants