
feat: Add S3 support for artifacts #29

Open
wmsnp wants to merge 3 commits into google:main from wmsnp:feature/s3-artifact-service

Conversation


@wmsnp wmsnp commented Nov 21, 2025

Link to Issue or Description of Change

Description

Introduce S3ArtifactService to provide a self-hosted artifact storage solution.

Solution

  • Supports asynchronous upload and download of S3 artifacts
  • Includes unit tests covering core functionality
  • Depends on aioboto3

Testing Plan

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

@gemini-code-assist
Contributor

Summary of Changes

Hello @wmsnp, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the artifact storage capabilities by integrating S3-compatible object storage. It provides a robust, self-hosted option for managing application artifacts, ensuring data persistence and version control. The new service is designed for high performance through asynchronous operations and includes comprehensive testing to guarantee reliability.

Highlights

  • S3 Artifact Service Introduction: Introduced a new S3ArtifactService class to provide a self-hosted solution for storing and retrieving artifacts using Amazon S3 or S3-compatible services.
  • Asynchronous Operations: The S3ArtifactService supports asynchronous upload and download of artifacts, leveraging aioboto3 for non-blocking I/O operations.
  • Artifact Versioning and Metadata: Implemented atomic versioning for artifacts using S3's If-None-Match header, along with support for custom metadata which is JSON-serialized.
  • Comprehensive Unit Tests: Added a new test file with extensive unit tests covering the core functionality of the S3ArtifactService, including saving, loading, deleting, and listing artifacts and their versions, as well as handling user-scoped artifacts.
  • Dependency Update: Added aioboto3>=15.5.0 as a new dependency to pyproject.toml to enable asynchronous interaction with S3.
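The user-scoped key handling mentioned above can be sketched as follows. This is an illustrative guess at a versioned key layout, not the PR's actual code; the function name and path scheme are assumptions:

```python
# Hypothetical sketch of a versioned S3 object-key layout for artifacts.
# The exact layout used by S3ArtifactService may differ.
def artifact_object_key(app_name: str, user_id: str, session_id: str,
                        filename: str, version: int) -> str:
    """Build the S3 object key for one version of an artifact.

    Filenames starting with "user:" are treated as user-scoped and stored
    outside any single session, so they survive across sessions.
    """
    if filename.startswith("user:"):
        return f"{app_name}/{user_id}/user/{filename}/{version}"
    return f"{app_name}/{user_id}/{session_id}/{filename}/{version}"
```

Keeping the version as the final key segment lets a single `list_objects_v2` call with the artifact path as prefix enumerate all versions of that artifact.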
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an S3ArtifactService to support artifact storage on S3-compatible services. The implementation is robust, featuring asynchronous operations and atomic versioning with a retry mechanism. The accompanying unit tests are comprehensive and effectively mock the S3 interactions. I've identified a few areas for improvement, including a performance optimization for listing artifact versions, adding a test case for the version conflict retry logic, and some minor code cleanup in the project dependencies and tests. Overall, this is a solid contribution.

Comment thread pyproject.toml
Comment thread src/google/adk_community/artifacts/s3_artifact_service.py
Comment thread tests/unittests/artifacts/test_artifact_service.py
Comment thread tests/unittests/artifacts/test_artifact_service.py
Comment thread tests/unittests/artifacts/test_artifact_service.py
@thedayisntgray

Can we get this PR or the other one referenced merged?

Is there anything I can do to help speed this along?

@miyannishar

Hey @wmsnp — I've opened #115 which supersedes both this PR and my earlier #36.

Your implementation here was a big influence — I adopted both the aioboto3 pattern for native async I/O and the IfNoneMatch conditional writes for atomic versioning. The key additions in the new PR are:

  • Optional [s3] dependency group (so aioboto3 isn't required if you're not using S3)
  • User-scoped artifact key listing with user: prefix preservation
  • Comprehensive standalone test suite with full async mock infrastructure
  • README and documentation

Thanks for pioneering the async approach — it clearly belongs in the final version. 🙏
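A minimal sketch of what such an optional dependency group could look like in pyproject.toml (the group name and version pin are illustrative, taken from the discussion above, not from #115 itself):

```toml
# Hypothetical optional-dependency group; only pulled in when users
# explicitly install the "s3" extra.
[project.optional-dependencies]
s3 = ["aioboto3>=15.5.0"]
```

Users who want S3 support would then install the package with the `[s3]` extra, while everyone else avoids the aioboto3 dependency entirely.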

@miyannishar miyannishar mentioned this pull request Apr 18, 2026
@wmsnp
Author

wmsnp commented Apr 19, 2026

> Hey @wmsnp — I've opened #115 which supersedes both this PR and my earlier #36.
>
> Your implementation here was a big influence — I adopted both the aioboto3 pattern for native async I/O and the IfNoneMatch conditional writes for atomic versioning. The key additions in the new PR are:
>
>   • Optional [s3] dependency group (so aioboto3 isn't required if you're not using S3)
>   • User-scoped artifact key listing with user: prefix preservation
>   • Comprehensive standalone test suite with full async mock infrastructure
>   • README and documentation
>
> Thanks for pioneering the async approach — it clearly belongs in the final version. 🙏

Thanks — really appreciate that. One thing I found later, though, is that not all S3-compatible implementations support If-None-Match, so that part may need a bit of care for compatibility.

@DeanChensj
Collaborator

@gemini-cli /review

@github-actions

github-actions Bot commented May 5, 2026

🤖 Hi @DeanChensj, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@github-actions github-actions Bot left a comment


Thanks for the contribution! The S3ArtifactService implementation is solid and follows the established patterns. I've left a few comments regarding:

  1. Performance: The list_artifact_versions method performs a head_object call for every version, which could be a bottleneck.
  2. Metadata Limits: S3 has a 2KB limit on metadata that we should be aware of when flattening custom metadata.
  3. Robustness: A finite default for retries in save_artifact might be safer than infinite.

Overall, great work!

async def _client(self):
    session = await self._session()
    async with session.client(service_name="s3", **self.aws_configs) as s3:
        yield s3


S3 metadata has a total size limit of 2 KB (including keys and values). Since custom_metadata is flattened into JSON strings here, large metadata dictionaries might cause put_object to fail. It might be worth adding a check or a note about this limit.
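The suggested size check could look like the sketch below. It is an illustrative helper, not the PR's API; the 2 KB figure is S3's documented limit on total user-defined metadata (keys plus values):

```python
import json

S3_METADATA_LIMIT_BYTES = 2048  # S3 limit on total user metadata size


def flatten_and_check_metadata(custom_metadata: dict) -> dict:
    """Flatten custom metadata values to JSON strings and verify the
    result fits within S3's user-metadata size limit.

    Raises ValueError before put_object would fail server-side.
    """
    flattened = {k: json.dumps(v) for k, v in custom_metadata.items()}
    total = sum(len(k.encode()) + len(v.encode())
                for k, v in flattened.items())
    if total > S3_METADATA_LIMIT_BYTES:
        raise ValueError(
            f"custom metadata is {total} bytes; S3 allows at most"
            f" {S3_METADATA_LIMIT_BYTES} bytes of user metadata"
        )
    return flattened
```

Failing fast with a clear error is friendlier than surfacing the opaque error S3 returns when the limit is exceeded.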

elif artifact.file_data:
    raise NotImplementedError(
        "Saving artifact with file_data is not supported yet in"
        " S3ArtifactService."


With save_artifact_max_retries set to -1 (infinite), this loop could theoretically run forever if there's a consistent race condition or a logic error in version calculation. A high but finite default might be safer.
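The conditional-write pattern with a finite retry budget can be modeled in memory as below. This is a minimal sketch of the If-None-Match: "*" semantics and the suggested bounded retry, not the PR's actual code; all names are illustrative:

```python
class PreconditionFailed(Exception):
    """Stands in for S3's 412 response to a failed conditional write."""


class FakeBucket:
    """In-memory model of an S3 bucket supporting conditional puts."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body, if_none_match=None):
        # With If-None-Match: "*", S3 rejects the write if the key already
        # exists, so two concurrent writers cannot claim the same version.
        if if_none_match == "*" and key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body


def save_new_version(bucket, prefix, data, max_retries=5):
    """Claim the next free version number via conditional writes.

    A real service would start from the latest known version instead of 0;
    the finite max_retries bound guarantees termination either way.
    """
    version = 0
    for _ in range(max_retries):
        try:
            bucket.put_object(f"{prefix}/{version}", data, if_none_match="*")
            return version
        except PreconditionFailed:
            version += 1  # another writer took this slot; try the next one
    raise RuntimeError("gave up after repeated version conflicts")
```

Because each version number is claimed by exactly one successful conditional put, no version is ever silently overwritten, and the bounded loop surfaces persistent conflicts instead of spinning forever.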

metadata = head.get("Metadata", {})

canonical_uri = f"s3://{self.bucket_name}/{obj['Key']}"



Calling head_object for every version in a loop will be very slow if an artifact has many versions (O(N) network calls). S3 doesn't return custom metadata in list_objects_v2, so this might be necessary if metadata is required, but we should consider if there's a way to cache or avoid this for large version sets.
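If the per-version head_object calls cannot be avoided, they can at least be issued concurrently rather than serially. A sketch with asyncio.gather follows; the FakeClient is a stand-in for any client exposing an async head_object(Bucket=..., Key=...) method (aioboto3 clients do), and is an assumption for illustration:

```python
import asyncio


async def head_all_versions(client, bucket, keys):
    """Fetch object metadata for many keys concurrently.

    Turns N sequential round trips into one batch; results come back
    in the same order as `keys`.
    """
    tasks = [client.head_object(Bucket=bucket, Key=k) for k in keys]
    return await asyncio.gather(*tasks)


class FakeClient:
    """Minimal stand-in for an async S3 client, for demonstration only."""

    async def head_object(self, Bucket, Key):
        await asyncio.sleep(0)  # simulate the network round trip
        return {"Metadata": {"key": Key}}
```

This keeps total latency close to a single round trip for moderate version counts; for very large sets a semaphore would be needed to cap concurrent requests.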



Development

Successfully merging this pull request may close these issues.

Add S3 support to manage artifacts.
