Refactor codebase to package utility modules for distribution#57

Merged
fcogidi merged 10 commits into main from
fco/restructure_for_packaging
Mar 12, 2026

@fcogidi fcogidi commented Jan 26, 2026

Summary

This PR converts utility modules/scripts into a package (aieng-agents-utils).

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

New Utility Package (aieng-agents-utils)

  • Created a Python package with agent development utilities
  • Agent Tools Module: Code interpreter (E2B), Gemini grounding with Google Search, Weaviate knowledge base, Wikipedia news events fetcher
  • Data Processing: PDF to HuggingFace dataset converter with OCR, dataset chunking utilities, unified dataset loader
  • Async Utilities: Progress bars for async operations, rate limiting, concurrent task management
  • Client Management: Lifecycle manager for async clients (OpenAI, Weaviate) with proper cleanup
  • Gradio Integration: Message format converters between Gradio chatbot and OpenAI SDK
  • Langfuse Integration: OpenTelemetry-based tracing and observability setup
  • Environment Config: Type-safe Pydantic-based configuration management
  • Session Management: SQLite-backed persistent conversation sessions
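
As a rough illustration of the "concurrent task management" idea in the Async Utilities bullet, the sketch below caps the number of in-flight coroutines with an asyncio.Semaphore. It is not the package's actual API: the names gather_with_limit and fake_api_call are hypothetical, and a concurrency cap is only one simple form of throttling.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_with_limit(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 4,
) -> list[R]:
    """Run `worker` over `items` with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: T) -> R:
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_api_call(x: int) -> int:
    # Hypothetical stand-in for a network call.
    await asyncio.sleep(0.01)
    return x * 2


results = asyncio.run(gather_with_limit(range(5), fake_api_call, max_concurrency=2))
print(results)
```

Keeping the semaphore inside the helper means callers only pass a plain async function and never manage the limit themselves.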

CLI Tools

  • pdf_to_hf_dataset: Console script for converting PDFs to chunked HuggingFace datasets
    • Multimodal OCR using OpenAI-compatible APIs
    • Smart page filtering (TOC detection, front/back matter skipping)
    • Structured output support with heading detection
    • Token-aware chunking for embedding models
  • chunk_hf_dataset: Console script for re-chunking existing HuggingFace datasets
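
To make "token-aware chunking" concrete, here is a minimal sketch of a sliding-window chunker. It is an assumption-laden illustration, not the code behind pdf_to_hf_dataset or chunk_hf_dataset: the function name chunk_by_tokens is hypothetical, and whitespace splitting stands in for the tokenizer of a real embedding model.

```python
def chunk_by_tokens(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split `text` into chunks of at most `max_tokens` tokens.

    Consecutive chunks share `overlap` tokens. Whitespace splitting is a
    stand-in for a real tokenizer (assumption for illustration only).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start : start + max_tokens]))
        # Stop once the window has reached the end of the token list,
        # so the tail is not emitted again as a shorter duplicate chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


chunks = chunk_by_tokens("a b c d e f g", max_tokens=3, overlap=1)
print(chunks)
```

Swapping text.split() for a model-specific tokenizer would change the token counts but not the windowing logic.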

API Key Management

  • Create user-scoped API keys for the Gemini grounding proxy
  • Delete API keys by owner with filtering options
  • Support for batch operations via --owners-file
  • JSON and CSV output formats
  • Firestore-backed storage with metadata support
  • Usage limits and tracking per key

Documentation

  • Added README with installation instructions (uv/pip)

Package Configuration

  • Proper pyproject.toml with build system (hatchling)
  • Console script entry points for CLI tools
  • Complete dependency specifications
  • MIT license
  • Author information and project metadata

Testing

  • Tests pass locally (uv run pytest tests/)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

  1. Package structure: Verified all modules are properly importable
  2. Reference implementations: Manually ran all reference implementations to verify they all still work correctly
  3. Tests: Verified that all tests still pass
  4. CLI scripts: Verified that CLI scripts work

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated
  • No sensitive information (API keys, credentials) exposed

@fcogidi fcogidi requested a review from amrit110 January 26, 2026 18:59
@fcogidi fcogidi self-assigned this Jan 26, 2026
@amrit110
Member

@fcogidi great initiative! Was super happy to see this PR. Will review it shortly.

@fcogidi fcogidi requested a review from Copilot January 26, 2026 19:08
Copilot AI left a comment

Pull request overview

This PR refactors the codebase by extracting utility modules into a separate installable package (aieng-agents-utils), improving code organization and reusability across the Vector Institute Agent Bootcamp implementations.

Changes:

  • Created a new Python package structure with proper build configuration, console scripts, and comprehensive documentation
  • Migrated all utility modules (tools, data processing, async utilities, Langfuse integration, etc.) to the new package namespace
  • Updated all import statements across reference implementations to use the new package structure

Reviewed changes

Copilot reviewed 63 out of 77 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Updated project metadata and dependencies to use the new workspace package |
| aieng-agents-utils/pyproject.toml | Added package configuration with build system, dependencies, and console scripts |
| aieng-agents-utils/README.md | Created comprehensive documentation for the new utility package |
| aieng-agents-utils/aieng/agents/__init__.py | Created main package entry point with public API exports |
| aieng-agents-utils/aieng/agents/tools/__init__.py | Organized tool exports with proper __all__ definitions |
| aieng-agents-utils/aieng/agents/prompts.py | Consolidated system prompts into centralized module |
| src/*/app.py, src/*/cli.py | Updated imports to reference new package namespace |
| tests/tool_tests/test_integration.py | Updated test imports to use new package structure |
| aieng-agents-utils/tests/README.md | Added test documentation with updated command paths |

amrit110 commented Feb 18, 2026

@fcogidi, can the src directory be renamed to implementations to conform with the template? Also, I'm guessing src itself is not a package to be built?

@fcogidi fcogidi marked this pull request as draft March 11, 2026 13:57
@fcogidi fcogidi marked this pull request as ready for review March 11, 2026 19:49
@fcogidi fcogidi requested a review from Copilot March 11, 2026 19:49
Copilot AI left a comment

Pull request overview

Copilot reviewed 71 out of 92 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (4)

aieng-agents-utils/aieng/agents/tools/README.md:8

  • This usage example points to aieng-agents-utils/aieng/tools/news_events.py, but the module is located under aieng-agents-utils/aieng/agents/tools/news_events.py (or can be run via python -m aieng.agents.tools.news_events). As written, the command will fail.

aieng-agents-utils/aieng/agents/web_search/README.md:117

  • The README still instructs running uvicorn utils.web_search.app:app, but the module path is now aieng.agents.web_search.app:app. Updating the command prevents local-dev instructions from breaking.

aieng-agents-utils/aieng/agents/data/pdf_to_hf_dataset.py:20

  • The docstring examples invoke pdf_to_hf_dataset.py, but this module is exposed as a console script (pdf_to_hf_dataset) via project.scripts. Using the script name (or python -m aieng.agents.data.pdf_to_hf_dataset) will make the examples runnable after installation.

aieng-agents-utils/aieng/agents/env_vars.py:72

  • The docstring example imports Configs from implementations.utils.env_vars, but the class lives in aieng.agents.env_vars (and is re-exported as aieng.agents.Configs). Updating the example avoids broken copy/paste for users.

Member

@amrit110 amrit110 left a comment

The move looks pretty straightforward. Just wondering about the package name for distribution.

Member

Why aieng-agents-utils rather than aieng-agents? Do we just see this as a support/utility package?

Collaborator Author

@fcogidi fcogidi Mar 12, 2026

Importing it will look like:

from aieng.agents import ...

But yeah, I think it's more a utility package. It's a bunch of helpful functions and tools for building agents, not an agent framework.
I guess aieng-agents is more consistent with the import pattern, plus a lot of the packages under the aieng namespace will be utility packages.

Collaborator Author

I renamed the package to aieng-agents as that keeps things consistent. I still see it as a utility package though.
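
The naming discussion above turns on the difference between the distribution name (what gets installed) and the import path (aieng.agents). A small sketch of how a shared top-level namespace can work, assuming aieng is an implicit namespace package with no __init__.py of its own (the thread does not confirm this layout), and using a throwaway directory and a hypothetical GREETING attribute:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Layout: <root>/aieng/agents/__init__.py, with NO aieng/__init__.py.
    # That makes "aieng" an implicit namespace package, so several
    # separately installed distributions could each contribute a subpackage.
    pkg_dir = root / "aieng" / "agents"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("GREETING = 'hello from aieng.agents'\n")

    sys.path.insert(0, str(root))
    try:
        agents = importlib.import_module("aieng.agents")
        greeting = agents.GREETING
        # A namespace package has no file of its own backing it.
        namespace_has_init = getattr(sys.modules["aieng"], "__file__", None) is not None
    finally:
        sys.path.remove(str(root))

print(greeting, namespace_has_init)
```

Under this assumed layout, the import path stays the same no matter what the distribution is called, which is why the rename only affects install commands.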

@fcogidi fcogidi merged commit 916c474 into main Mar 12, 2026
4 checks passed
@fcogidi fcogidi deleted the fco/restructure_for_packaging branch March 12, 2026 18:24
3 participants