Commit 916c474

Refactor codebase for distribution (#57)
- Refactor codebase to package utility modules for distribution
- Remove unnecessary getenv assertion from langfuse client setup
- Fix formatting issues in prompts and update README paths for consistency
- Upgrade packages flagged by pip-audit and ignore ones that can't be upgraded
- Rename `src` dir to `implementations`
- Update dependencies
- Restructure reference implementations to fix gradio hot-reload issue
- Fix extra trailing quote
- Rename `aieng-agents-utils` to `aieng-agents`
Parent: ee4470c

92 files changed: 1755 additions and 1169 deletions
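The commit message notes that the utility modules are now packaged for distribution and that the package was renamed from `aieng-agents-utils` to `aieng-agents`. As a rough, hypothetical sketch (the commit's actual packaging metadata is not included in the visible diff), the `pyproject.toml` stanza implied by those changes might look like:

```toml
# Hypothetical sketch only - the real pyproject.toml from this commit
# is not shown in this excerpt.
[project]
name = "aieng-agents"        # renamed from "aieng-agents-utils"
requires-python = ">=3.12"   # consistent with the new .python-version file
```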


.github/workflows/code_checks.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -57,3 +57,4 @@ jobs:
       virtual-environment: .venv/
       ignore-vulns: |
         GHSA-xm59-rqc7-hhvf
+        GHSA-7gcm-g887-7qv7
```

README.md

Lines changed: 24 additions & 24 deletions
```diff
@@ -11,31 +11,31 @@ This repository includes several modules, each showcasing a different aspect of
 **2. Frameworks: OpenAI Agents SDK**
 Showcases the use of the OpenAI Agents SDK to reduce boilerplate and improve readability.
 
-- **[2.1 ReAct Agent for RAG - OpenAI SDK](src/2_frameworks/1_react_rag/README.md)**
+- **[2.1 ReAct Agent for RAG - OpenAI SDK](implementations/2_frameworks/1_react_rag/README.md)**
 Implements the same Reason-and-Act agent using the high-level abstractions provided by the OpenAI Agents SDK. This approach reduces boilerplate and improves readability.
 The use of Langfuse for making the agent less of a black box is also introduced in this module.
 
-- **[2.2 Multi-agent Setup for Deep Research](src/2_frameworks/2_multi_agent/README.md)**
+- **[2.2 Multi-agent Setup for Deep Research](implementations/2_frameworks/2_multi_agent/README.md)**
 Demo of a multi-agent architecture to improve efficiency on long-context inputs, reduce latency, and reduce LLM costs. Two versions are available - "efficient" and "verbose". For the build days, you should start from the "efficient" version, as it provides greater flexibility and is easier to follow.
 
 **3. Evals: Automated Evaluation Pipelines**
 Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.
 
-- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)**
+- **[3.1 LLM-as-a-Judge](implementations/3_evals/1_llm_judge/README.md)**
 Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration.
 
-- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
+- **[3.2 Evaluation on Synthetic Dataset](implementations/3_evals/2_synthetic_data/README.md)**
 Showcases the generation of synthetic evaluation data for testing agents.
 
 We also provide "basic" no-framework implementations. These are meant to showcase how agents work behind the scenes and are excessively verbose in their implementation. You should not use these as the basis for real projects.
 
 **1. Basics: Reason-and-Act RAG**
 A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.
 
-- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
+- **[1.0 Search Demo](implementations/1_basics/0_search_demo/README.md)**
 A simple demo showing the capabilities (and limitations) of a knowledge base search.
 
-- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
+- **[1.1 ReAct Agent for RAG](implementations/1_basics/1_react_rag/README.md)**
 Basic ReAct agent for step-by-step retrieval and answer generation.
 
 ## Getting Started
```
````diff
@@ -48,7 +48,7 @@ In that case you can verify that the API keys work by running integration tests
 uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
 ```
 
-## Reference Implementations
+## Running the Reference Implementations
 
 For "Gradio App" reference implementations, running the script prints out a "public URL" ending in `gradio.live` (it might take a few seconds to appear). To access the gradio app with full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.
 
````

````diff
@@ -74,48 +74,48 @@ These warnings can be safely ignored, as they are the result of a bug in the ups
 Interactive knowledge base demo. Access the gradio interface in your browser to see if your knowledge base meets your expectations.
 
 ```bash
-uv run --env-file .env gradio src/1_basics/0_search_demo/app.py
+uv run --env-file .env gradio implementations/1_basics/0_search_demo/app.py
 ```
 
 Basic Reason-and-Act Agent - for demo purposes only.
 
 As noted above, these are unnecessarily verbose for real applications.
 
 ```bash
-# uv run --env-file .env src/1_basics/1_react_rag/cli.py
-# uv run --env-file .env gradio src/1_basics/1_react_rag/app.py
+# uv run --env-file .env implementations/1_basics/1_react_rag/cli.py
+# uv run --env-file .env gradio implementations/1_basics/1_react_rag/app.py
 ```
 
 ### 2. Frameworks
 
 Reason-and-Act Agent without the boilerplate - using the OpenAI Agents SDK.
 
 ```bash
-uv run --env-file .env src/2_frameworks/1_react_rag/cli.py
-uv run --env-file .env gradio src/2_frameworks/1_react_rag/langfuse_gradio.py
+uv run --env-file .env implementations/2_frameworks/1_react_rag/cli.py
+uv run --env-file .env gradio implementations/2_frameworks/1_react_rag/langfuse_gradio.py
 ```
 
 Multi-agent examples, also via the OpenAI Agents SDK.
 
 ```bash
-uv run --env-file .env gradio src/2_frameworks/2_multi_agent/efficient.py
+uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/efficient.py
 # Verbose option - greater control over the agent flow, but less flexible.
-# uv run --env-file .env gradio src/2_frameworks/2_multi_agent/verbose.py
+# uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/verbose.py
 ```
 
-Python Code Interpreter demo - using the OpenAI Agents SDK, E2B for a secure code sandbox, and Langfuse for observability. Refer to [src/2_frameworks/3_code_interpreter/README.md](src/2_frameworks/3_code_interpreter/README.md) for details.
+Python Code Interpreter demo - using the OpenAI Agents SDK, E2B for a secure code sandbox, and Langfuse for observability. Refer to [implementations/2_frameworks/3_code_interpreter/README.md](implementations/2_frameworks/3_code_interpreter/README.md) for details.
 
-MCP server integration example, also via the OpenAI Agents SDK, with Gradio and Langfuse tracing. Refer to [src/2_frameworks/4_mcp/README.md](src/2_frameworks/4_mcp/README.md) for more details.
+MCP server integration example, also via the OpenAI Agents SDK, with Gradio and Langfuse tracing. Refer to [implementations/2_frameworks/4_mcp/README.md](implementations/2_frameworks/4_mcp/README.md) for more details.
 
 ### 3. Evals
 
 Synthetic data.
 
 ```bash
 uv run --env-file .env \
--m src.3_evals.2_synthetic_data.synthesize_data \
+-m implementations.3_evals.2_synthetic_data.synthesize_data \
 --source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+--langfuse_dataset_name search-dataset-synthetic \
 --limit 18
 ```
 
````
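The `--source_dataset` flag above uses the repo's `hf://org/name@revision:split` locator convention. As a hypothetical illustration of how such a locator decomposes (the real `synthesize_data` module may parse it differently), a minimal parser could look like:

```python
# Hypothetical sketch: parsing the "hf://org/name@rev:split" dataset-locator
# convention seen in --source_dataset. Not the repo's actual parser.
from dataclasses import dataclass


@dataclass
class DatasetLocator:
    org: str
    name: str
    revision: str
    split: str


def parse_hf_locator(uri: str) -> DatasetLocator:
    """Parse e.g. 'hf://vector-institute/hotpotqa@d997ecf:train'."""
    if not uri.startswith("hf://"):
        raise ValueError(f"expected an hf:// URI, got {uri!r}")
    rest = uri[len("hf://"):]           # 'org/name@rev:split'
    path, _, split = rest.rpartition(":")
    repo, _, revision = path.partition("@")
    org, _, name = repo.partition("/")
    return DatasetLocator(org, name, revision, split)


loc = parse_hf_locator("hf://vector-institute/hotpotqa@d997ecf:train")
print(loc)
```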

````diff
@@ -125,15 +125,15 @@ Quantify embedding diversity of synthetic data
 # Baseline: "Real" dataset
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.annotate_diversity \
+-m implementations.3_evals.2_synthetic_data.annotate_diversity \
 --langfuse_dataset_name search-dataset \
 --run_name cosine_similarity_bge_m3
 
 # Synthetic dataset
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.annotate_diversity \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+-m implementations.3_evals.2_synthetic_data.annotate_diversity \
+--langfuse_dataset_name search-dataset-synthetic \
 --run_name cosine_similarity_bge_m3
 ```
 
````
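The `annotate_diversity` runs above score datasets by pairwise cosine similarity of embeddings (the run name `cosine_similarity_bge_m3` suggests BGE-M3 embeddings). A minimal sketch of the underlying metric, using toy vectors in place of real BGE-M3 embeddings:

```python
# Sketch of a pairwise-cosine-similarity diversity score, assuming embeddings
# are already computed. The repo's actual annotate_diversity may differ.
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows.

    Lower values indicate a more diverse dataset.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                 # (n, n) cosine-similarity matrix
    iu = np.triu_indices(len(embeddings), k=1)  # distinct pairs only
    return float(sims[iu].mean())


identical = np.ones((4, 8))                  # zero diversity: score ~1.0
random_vecs = np.random.default_rng(0).normal(size=(6, 16))
print(mean_pairwise_cosine(identical))
print(mean_pairwise_cosine(random_vecs))     # well below 1.0
```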

````diff
@@ -142,16 +142,16 @@ Visualize embedding diversity of synthetic data
 ```bash
 uv run \
 --env-file .env \
-gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
+gradio implementations/3_evals/2_synthetic_data/gradio_visualize_diversity.py
 ```
 
 Run LLM-as-a-judge Evaluation on synthetic data
 
 ```bash
 uv run \
 --env-file .env \
--m src.3_evals.1_llm_judge.run_eval \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+-m implementations.3_evals.1_llm_judge.run_eval \
+--langfuse_dataset_name search-dataset-synthetic \
 --run_name enwiki_weaviate \
 --limit 18
 ```
````
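The `run_eval` command drives an LLM-as-a-judge pipeline against a Langfuse dataset. The core scoring pattern can be sketched framework-free; here `judge_fn` is a stub standing in for a real LLM call, since the actual `run_eval` wiring is not shown in this diff:

```python
# Framework-free sketch of LLM-as-a-judge scoring. `judge_fn` stands in for
# a real LLM call; the repo's actual implementation is not shown here.
from typing import Callable

JUDGE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply PASS if the candidate matches the reference, else FAIL."
)


def score_item(question: str, reference: str, candidate: str,
               judge_fn: Callable[[str], str]) -> bool:
    """Return True if the judge model passes the candidate answer."""
    verdict = judge_fn(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    return verdict.strip().upper().startswith("PASS")


def stub_judge(prompt: str) -> str:
    # Naive containment check in place of an LLM, for demonstration only.
    ok = ("Reference answer: Paris" in prompt
          and "Candidate answer: Paris" in prompt)
    return "PASS" if ok else "FAIL"


print(score_item("Capital of France?", "Paris", "Paris", stub_judge))  # True
print(score_item("Capital of France?", "Paris", "Lyon", stub_judge))   # False
```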

aieng-agents/.python-version

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+3.12
```
