feat: implement Python storage cleaner with Appwrite integration (#338)
Conversation
Walkthrough: This pull request introduces a complete Python-based storage cleaner service for Appwrite. The implementation includes configuration files (…).

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🧹 Nitpick comments (4)
python/storage-cleaner/src/appwrite.py (4)
8-8: Remove unused import. The `ID` import is not used anywhere in this module.

```diff
-from appwrite.id import ID
```
17-24: Validate required environment variables. The `__init__` method reads `APPWRITE_FUNCTION_API_ENDPOINT` and `APPWRITE_FUNCTION_PROJECT_ID` without validation. If these are missing or None, the Appwrite client may fail later during API calls with unclear errors. Consider adding validation:

```diff
 def __init__(self, api_key: str):
+    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
+    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")
+
+    if not endpoint or not project_id:
+        raise ValueError("Missing required environment variables: APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID")
+
     client = (
         Client()
-        .set_endpoint(os.getenv("APPWRITE_FUNCTION_API_ENDPOINT"))
-        .set_project(os.getenv("APPWRITE_FUNCTION_PROJECT_ID"))
+        .set_endpoint(endpoint)
+        .set_project(project_id)
         .set_key(api_key)
     )
     self.storage = Storage(client)
```
43-43: Use defensive dictionary access. Directly accessing `f["$id"]` will raise a `KeyError` if the API response format changes or is malformed. Consider using `.get()` with a fallback or explicit validation.

```diff
-        self.storage.delete_file(bucket_id, f["$id"])
+        file_id = f.get("$id")
+        if file_id:
+            self.storage.delete_file(bucket_id, file_id)
```
45-46: Consider a more Pythonic empty check. Using `not files` is more idiomatic in Python than `len(files) == 0`.

```diff
-        if len(files) == 0:
+        if not files:
             break
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- python/storage-cleaner/.gitignore (1 hunks)
- python/storage-cleaner/README.md (1 hunks)
- python/storage-cleaner/requirements.txt (1 hunks)
- python/storage-cleaner/src/appwrite.py (1 hunks)
- python/storage-cleaner/src/main.py (1 hunks)
- python/storage-cleaner/src/utils.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
- python/storage-cleaner/src/utils.py (1)
  - `get_expiry_date` (9-23)

python/storage-cleaner/src/main.py (2)
- python/storage-cleaner/src/appwrite.py (2)
  - `AppwriteService` (13-46)
  - `clean_bucket` (26-46)
- python/storage-cleaner/src/utils.py (1)
  - `throw_if_missing` (26-40)
🪛 markdownlint-cli2 (0.18.1)
python/storage-cleaner/README.md
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/utils.py
40-40: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
python/storage-cleaner/README.md (1)
24-24: Verify timeout sufficiency for large buckets. The 15-second timeout may be insufficient when cleaning buckets with thousands of old files. Each batch fetches 25 files, and with network latency for delete operations, larger buckets could exceed this limit and cause incomplete cleanup.
Consider testing with a representative bucket size, or document the expected maximum bucket size this function can handle within 15 seconds. You may need to increase the timeout or implement cursor-based resumption for very large buckets.
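One way to picture the cursor-based resumption suggested above is the sketch below. The `list_files` and `delete_file` callables are stand-ins for the SDK calls, and the cursor handling is illustrative only; the real Appwrite SDK's cursor semantics (e.g. cursor queries against deleted files) may differ.

```python
def clean_with_cursor(list_files, delete_file, batch_size=25):
    """Delete files batch by batch, tracking a cursor so a run cut short
    by a timeout could resume from the last processed file instead of
    restarting from the beginning."""
    cursor = None
    deleted = 0
    while True:
        files = list_files(cursor, batch_size)
        if not files:
            break
        for f in files:
            delete_file(f["$id"])
            deleted += 1
        cursor = files[-1]["$id"]  # resume point: last file processed
    return deleted
```

In a real function the cursor would be persisted (e.g. in a database document) between invocations so that a follow-up execution can pick up where the timed-out one stopped.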
python/storage-cleaner/src/main.py (1)
7-10: Environment validation logic approved. The use of `throw_if_missing` correctly validates required environment variables before proceeding with bucket operations.
Actionable comments posted: 1
♻️ Duplicate comments (1)
python/storage-cleaner/src/main.py (1)
9-14: Fix header access to prevent KeyError. Line 9 uses direct dictionary access `context.req.headers["x-appwrite-key"]`, which will raise a `KeyError` if the header is missing, before the check on Line 11 can execute. This exposes internal errors to clients. Apply this diff:

```diff
-    api_key = context.req.headers["x-appwrite-key"]
+    api_key = context.req.headers.get("x-appwrite-key")
     if not api_key:
```
🧹 Nitpick comments (1)
python/storage-cleaner/src/appwrite.py (1)
17-24: Consider validating required environment variables. The `__init__` method reads `APPWRITE_FUNCTION_API_ENDPOINT` and `APPWRITE_FUNCTION_PROJECT_ID` from the environment without validation. If these are missing, the Appwrite SDK client will likely fail with an unclear error during subsequent API calls. You can either:

- Let the SDK fail naturally (current approach, acceptable if calling code validates upstream).
- Add explicit validation here for clearer error messages:

```python
def __init__(self, api_key: str):
    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")

    if not endpoint or not project_id:
        raise ValueError(
            "Missing required environment variables: "
            "APPWRITE_FUNCTION_API_ENDPOINT, APPWRITE_FUNCTION_PROJECT_ID"
        )

    client = (
        Client()
        .set_endpoint(endpoint)
        .set_project(project_id)
        .set_key(api_key)
    )
    self.storage = Storage(client)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
python/storage-cleaner/requirements.txt(1 hunks)python/storage-cleaner/src/appwrite.py(1 hunks)python/storage-cleaner/src/main.py(1 hunks)python/storage-cleaner/src/utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-30T10:09:36.561Z
Learnt from: zeus2611
PR: appwrite/templates#338
File: python/storage-cleaner/src/utils.py:26-40
Timestamp: 2025-09-30T10:09:36.561Z
Learning: In the Python storage-cleaner template at python/storage-cleaner/src/utils.py, the throw_if_missing function intentionally treats 0 as an invalid value to prevent a retention period of 0, which would cause files to be deleted immediately and risk accidental data loss.
Applied to files:
python/storage-cleaner/src/utils.py
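The behavior recorded in that learning — treating `0` as invalid so a zero-day retention period cannot silently delete everything — can be illustrated with a minimal validator sketch. The signature and message below are assumptions; the template's actual `throw_if_missing` may differ.

```python
def throw_if_missing(obj: dict, keys: list) -> None:
    """Raise ValueError if any key is absent or falsy.

    Note: a value of 0 is deliberately rejected here, since a retention
    period of 0 would cause files to be deleted immediately."""
    missing = [key for key in keys if key not in obj or not obj[key]]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
```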
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
- python/storage-cleaner/src/utils.py (1)
  - `get_expiry_date` (9-23)

python/storage-cleaner/src/main.py (2)
- python/storage-cleaner/src/appwrite.py (2)
  - `AppwriteService` (12-65)
  - `clean_bucket` (26-65)
- python/storage-cleaner/src/utils.py (1)
  - `throw_if_missing` (26-41)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/appwrite.py
45-47: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Use explicit conversion flag
Replace with conversion flag
(RUF010)
56-56: Do not catch blind exception: Exception
(BLE001)
63-65: Avoid specifying long messages outside the exception class
(TRY003)
python/storage-cleaner/src/main.py
23-23: Do not catch blind exception: Exception
(BLE001)
python/storage-cleaner/src/utils.py
41-41: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
python/storage-cleaner/requirements.txt (1)
1-1: LGTM! The pinned version constraint ensures reproducible builds and prevents breaking changes from future major releases.
python/storage-cleaner/src/main.py (1)
18-25: LGTM! The exception handling properly wraps bucket operations, distinguishing between validation errors (400) and unexpected failures (500) while logging details for debugging.
python/storage-cleaner/src/utils.py (2)
9-23: LGTM! The function uses timezone-aware datetime (Python 3.12+ compatible), handles invalid environment values gracefully, and returns a properly formatted ISO 8601 timestamp.
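A sketch of what such a timezone-aware expiry helper might look like. The `RETENTION_PERIOD_DAYS` variable name and the 30-day default are assumptions for illustration, not the template's actual values.

```python
import os
from datetime import datetime, timedelta, timezone

def get_expiry_date() -> str:
    """Return an ISO 8601 timestamp for 'now minus the retention period',
    using timezone-aware UTC datetimes (datetime.utcnow() is deprecated
    as of Python 3.12)."""
    try:
        # Env var name is assumed; the template may use a different one.
        days = int(os.getenv("RETENTION_PERIOD_DAYS", "30"))
    except ValueError:
        days = 30  # fall back gracefully on non-numeric values
    return (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
```

Files created before this timestamp would then be matched by a "created before expiry date" query when listing candidates for deletion.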
26-41: LGTM! The validation logic correctly treats `0` as invalid to prevent immediate deletion and data loss. The inline comment on Line 36 documents this business rule for maintainers.

python/storage-cleaner/src/appwrite.py (1)
41-65: LGTM! The error handling properly wraps API calls, tracks successes and failures, and provides detailed diagnostics when deletions fail. The loop correctly exits when no more files are returned.
…ing API key gracefully
Actionable comments posted: 2
🧹 Nitpick comments (2)
python/storage-cleaner/src/main.py (1)
20-20: Consider a more informative success message. The message "Buckets cleaned" is plural but only one bucket is cleaned. Consider making it singular or adding details about the operation.

```diff
-        return context.res.text("Buckets cleaned", 200)
+        return context.res.text("Bucket cleaned successfully", 200)
```

Alternatively, if you want to return structured data:

```diff
-        return context.res.text("Buckets cleaned", 200)
+        return context.res.json({"message": "Bucket cleaned successfully", "bucket_id": os.environ["APPWRITE_BUCKET_ID"]}, 200)
```

python/storage-cleaner/src/appwrite.py (1)
50-58: File deletion loop handles errors well. The loop safely extracts file IDs, wraps deletions in error handling, and tracks both successes and failures. The use of the `file_id` variable in the exception handler correctly avoids the KeyError issue flagged in previous reviews.
Optional improvement: files without a `$id` field are silently skipped (line 53's conditional). Consider logging these cases for visibility:

```diff
 try:
     file_id = f.get("$id")
     if file_id:
         self.storage.delete_file(bucket_id, file_id)
         deleted_files_count += 1
+    else:
+        print(f"Warning: File object missing $id in bucket {bucket_id}")
 except Exception as e:
     failed_files.append({"id": file_id, "error": str(e)})
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- python/storage-cleaner/src/appwrite.py (1 hunks)
- python/storage-cleaner/src/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/main.py (2)
- python/storage-cleaner/src/appwrite.py (2)
  - `AppwriteService` (12-65)
  - `clean_bucket` (26-65)
- python/storage-cleaner/src/utils.py (1)
  - `throw_if_missing` (26-41)

python/storage-cleaner/src/appwrite.py (1)
- python/storage-cleaner/src/utils.py (1)
  - `get_expiry_date` (9-23)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/main.py
23-23: Do not catch blind exception: Exception
(BLE001)
python/storage-cleaner/src/appwrite.py
45-47: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Use explicit conversion flag
Replace with conversion flag
(RUF010)
56-56: Do not catch blind exception: Exception
(BLE001)
63-65: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (8)
python/storage-cleaner/src/main.py (3)
1-3: LGTM! Clean and minimal imports. The imports are appropriate for the serverless function's requirements.
9-14: Excellent API key validation! The safe header access using `.get()` combined with the empty-value check properly prevents KeyError exceptions and provides clear error messages.
18-25: Exception handling properly addresses previous concerns. The try-except structure correctly catches and handles errors from the bucket cleaning operation, logs details internally, and returns sanitized error messages. The broad `Exception` catch is appropriate in this serverless context to prevent unhandled exceptions from crashing the function.
Note: the static analysis warning about catching `Exception` is a false positive here; broad exception handling is appropriate for serverless entry points to ensure graceful degradation.

python/storage-cleaner/src/appwrite.py (5)
1-9: LGTM! Well-organized imports and documentation. The module docstring and imports are clean and appropriate for the storage cleanup service.
17-24: Client initialization looks correct. The fluent API usage for configuring the Appwrite client is clean and idiomatic. Ensure that the required environment variables (`APPWRITE_FUNCTION_API_ENDPOINT` and `APPWRITE_FUNCTION_PROJECT_ID`) are validated at the entry point (see comment in main.py).
26-39: Good query setup and pagination strategy.The batch size of 25 is reasonable for iterative deletion, and the counter initialization properly tracks success and failure states.
41-48: Robust error handling for list operation. The try-except block properly catches API failures, preserves the exception chain with `from e`, and provides clear context. The broad `Exception` catch is appropriate here since the Appwrite SDK may raise various exception types.
Note: static analysis warnings (TRY003, RUF010) are pedantic in this context; the current implementation is clear and maintainable.
59-65: Appropriate error handling for partial failures. The loop termination logic is correct, and raising a `RuntimeError` for partial failures ensures callers are aware of issues. The detailed error message (including counts and failed file details) is valuable for debugging, despite the static analysis warning.
Note: the current behavior raises an exception even if some files were successfully deleted. This is a conservative approach that ensures operators are notified of any failures. If you prefer to log warnings instead and only raise on total failure, consider adjusting the logic, but the current behavior is safer.
Hi @Meldiron, just bumping this thread. I'm happy to implement the validation for the system env vars if that is the standard for this repo, even if they are auto-injected. Let me know how you'd like to proceed!
@zeus2611 Thanks for the contribution! Could you share a screen recording of your testing?
…missing environment variables, and standardize JSON responses.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@python/storage-cleaner/src/appwrite.py`:
- Around line 42-69: The current while True loop around list_files/delete_file
can hang if deletions fail because the same files keep reappearing; modify the
cleanup loop in the method that calls list_files and delete_file to stop after a
batch with failures (or implement a max retry counter). Concretely: track a
retry or batch count (e.g., max_retries) and increment each loop, or check
failed_files after processing a batch and break out of the while loop if
failed_files is non-empty; ensure deleted_files_count is still updated and that
any remaining failed_files are returned/reported to the caller so the function
terminates instead of looping forever.
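The terminating loop that the agent prompt describes can be sketched as follows. The `list_files` and `delete_file` callables stand in for the Appwrite SDK calls; this is an illustration of the break-on-failed-batch strategy, not the template's actual implementation.

```python
def clean_bucket(list_files, delete_file, limit=25):
    """Delete expired files batch by batch, but stop as soon as a batch
    produces failures: the same undeletable files would reappear in the
    next list call and loop forever otherwise."""
    deleted_files_count = 0
    failed_files = []
    while True:
        files = list_files(limit)
        if not files:
            break  # bucket fully cleaned
        for f in files:
            file_id = f.get("$id")
            if not file_id:
                failed_files.append({"id": None, "error": "missing $id"})
                continue
            try:
                delete_file(file_id)
                deleted_files_count += 1
            except Exception as e:
                failed_files.append({"id": file_id, "error": str(e)})
        if failed_files:
            break  # report partial failure instead of retrying forever
    return deleted_files_count, failed_files
```

An alternative, also mentioned in the prompt, is a `max_retries` counter that bounds the number of batches processed per invocation.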
…lation script and a local runner for the storage cleaner.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@python/storage-cleaner/src/appwrite_service.py`:
- Around line 54-60: Validate the batch before creating futures: filter the
files list for entries with a truthy f.get("$id") and if that filtered list is
empty, set batch_failed = True (or break/return) to avoid reprocessing the same
batch; then build future_to_file from the filtered list and proceed to call
self.storage.delete_file within the ThreadPoolExecutor. Ensure you
reference/modify the variables and constructs seen here (files, batch_failed,
future_to_file, ThreadPoolExecutor, self.storage.delete_file) so missing $id
entries are treated as failures and the loop can terminate.
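That batch-validation pattern can be sketched like this. The helper name and signature are hypothetical, and `delete_file` stands in for `self.storage.delete_file`; the point is that entries missing `$id` are counted as failures up front so an all-malformed batch flags `batch_failed` and the outer loop can terminate.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def delete_batch(delete_file, files, max_workers=8):
    """Delete one batch of files concurrently.

    Returns (deleted_count, failed_entries, batch_failed). Files without
    a '$id' are recorded as failures immediately, never submitted."""
    valid = [f for f in files if f.get("$id")]
    failed = [{"id": None, "error": "missing $id"}
              for _ in range(len(files) - len(valid))]
    deleted = 0
    if not valid:
        return deleted, failed, True  # nothing deletable: fail the batch
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_file = {pool.submit(delete_file, f["$id"]): f for f in valid}
        for future in as_completed(future_to_file):
            f = future_to_file[future]
            try:
                future.result()  # re-raises any exception from the worker
                deleted += 1
            except Exception as e:
                failed.append({"id": f["$id"], "error": str(e)})
    return deleted, failed, bool(failed)
```

The caller would break out of its paging loop whenever `batch_failed` is true, mirroring the termination fix described for the sequential version.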
Hi @hmacr, Following up on your feedback, I've implemented concurrent file deletion to optimize the cleanup process. I also took the opportunity to improve the error handling for missing environment variables and standardized the JSON responses to match the expected format. Here is a screen recording demonstrating the testing process: Screen.Recording.2026-02-03.at.11.25.41.AM.mov

Testing Overview:

Let me know if you would like any further changes!
What does this PR do?
This PR implements a Python translation of the existing Node.js AppwriteService for cleaning storage buckets by removing files older than a retention period. It also provides a Python version of the serverless handler function that validates environment variables, authenticates the API key from request headers, and triggers the bucket cleanup process.
Test Plan
The changes were verified by running the Python service locally with mock environment variables and headers to ensure the cleanup function executes without error and correctly deletes files as intended. Additionally, error handling was tested by providing invalid or missing API keys to confirm appropriate error responses are returned.
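The local test flow described above can be approximated with a minimal fake context. Everything below is a stand-in sketch: `make_context` and `handler` are hypothetical, mimicking the validation behavior discussed in the review rather than the template's actual classes.

```python
from types import SimpleNamespace

def make_context(headers):
    """Build a minimal stand-in for the Appwrite function context."""
    res = SimpleNamespace(
        text=lambda body, status=200: {"body": body, "status": status},
        json=lambda body, status=200: {"body": body, "status": status},
    )
    return SimpleNamespace(req=SimpleNamespace(headers=headers), res=res)

def handler(context):
    """Mimic the reviewed validation flow: safe header access via .get(),
    then a 400 on a missing/empty API key."""
    api_key = context.req.headers.get("x-appwrite-key")
    if not api_key:
        return context.res.text("Missing API key", 400)
    # ...bucket cleanup would run here...
    return context.res.text("Bucket cleaned successfully", 200)
```

Calling `handler(make_context({}))` exercises the missing-key path without any network access, which is the kind of check the test plan describes.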
Related PRs and Issues
#310
Have you read the Contributing Guidelines on issues?
Yes, I have read and followed the contributing guidelines.