Merged
1 change: 1 addition & 0 deletions .github/workflows/code_checks.yml
@@ -57,3 +57,4 @@ jobs:
virtual-environment: .venv/
ignore-vulns: |
GHSA-xm59-rqc7-hhvf
GHSA-7gcm-g887-7qv7
48 changes: 24 additions & 24 deletions README.md
@@ -11,31 +11,31 @@ This repository includes several modules, each showcasing a different aspect of
**2. Frameworks: OpenAI Agents SDK**
Showcases the use of the OpenAI agents SDK to reduce boilerplate and improve readability.

- **[2.1 ReAct Agent for RAG - OpenAI SDK](src/2_frameworks/1_react_rag/README.md)**
- **[2.1 ReAct Agent for RAG - OpenAI SDK](implementations/2_frameworks/1_react_rag/README.md)**
Implements the same Reason-and-Act agent using the high-level abstractions provided by the OpenAI Agents SDK. This approach reduces boilerplate and improves readability.
The use of langfuse for making the agent less of a black-box is also introduced in this module.

- **[2.2 Multi-agent Setup for Deep Research](src/2_frameworks/2_multi_agent/README.md)**
- **[2.2 Multi-agent Setup for Deep Research](implementations/2_frameworks/2_multi_agent/README.md)**
Demo of a multi-agent architecture to improve efficiency on long-context inputs, reduce latency, and reduce LLM costs. Two versions are available- "efficient" and "verbose". For the build days, you should start from the "efficient" version as that provides greater flexibility and is easier to follow.

**3. Evals: Automated Evaluation Pipelines**
Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.

- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)**
- **[3.1 LLM-as-a-Judge](implementations/3_evals/1_llm_judge/README.md)**
Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration.

- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
- **[3.2 Evaluation on Synthetic Dataset](implementations/3_evals/2_synthetic_data/README.md)**
Showcases the generation of synthetic evaluation data for testing agents.

We also provide "basic" no-framework implementations. These are meant to showcase how agents work behind the scenes and are excessively verbose in their implementation. You should not use them as the basis for real projects.

**1. Basics: Reason-and-Act RAG**
A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.

- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
- **[1.0 Search Demo](implementations/1_basics/0_search_demo/README.md)**
A simple demo showing the capabilities (and limitations) of a knowledgebase search.

- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
- **[1.1 ReAct Agent for RAG](implementations/1_basics/1_react_rag/README.md)**
Basic ReAct agent for step-by-step retrieval and answer generation.

## Getting Started
@@ -48,7 +48,7 @@ In that case you can verify that the API keys work by running integration tests
uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
```

## Reference Implementations
## Running the Reference Implementations

For "Gradio App" reference implementations, running the script prints a "public URL" ending in `gradio.live` (it might take a few seconds to appear). To access the Gradio app with full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.

@@ -74,48 +74,48 @@ These warnings can be safely ignored, as they are the result of a bug in the ups
Interactive knowledge base demo. Access the gradio interface in your browser to see if your knowledge base meets your expectations.

```bash
uv run --env-file .env gradio src/1_basics/0_search_demo/app.py
uv run --env-file .env gradio implementations/1_basics/0_search_demo/app.py
```

Basic Reason-and-Act Agent- for demo purposes only.

As noted above, these are unnecessarily verbose for real applications.

```bash
# uv run --env-file .env src/1_basics/1_react_rag/cli.py
# uv run --env-file .env gradio src/1_basics/1_react_rag/app.py
# uv run --env-file .env implementations/1_basics/1_react_rag/cli.py
# uv run --env-file .env gradio implementations/1_basics/1_react_rag/app.py
```

### 2. Frameworks

Reason-and-Act Agent without the boilerplate- using the OpenAI Agent SDK.

```bash
uv run --env-file .env src/2_frameworks/1_react_rag/cli.py
uv run --env-file .env gradio src/2_frameworks/1_react_rag/langfuse_gradio.py
uv run --env-file .env implementations/2_frameworks/1_react_rag/cli.py
uv run --env-file .env gradio implementations/2_frameworks/1_react_rag/langfuse_gradio.py
```

Multi-agent examples, also via the OpenAI Agent SDK.

```bash
uv run --env-file .env gradio src/2_frameworks/2_multi_agent/efficient.py
uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/efficient.py
# Verbose option - greater control over the agent flow, but less flexible.
# uv run --env-file .env gradio src/2_frameworks/2_multi_agent/verbose.py
# uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/verbose.py
```

Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [src/2_frameworks/3_code_interpreter/README.md](src/2_frameworks/3_code_interpreter/README.md) for details.
Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [implementations/2_frameworks/3_code_interpreter/README.md](implementations/2_frameworks/3_code_interpreter/README.md) for details.

MCP server integration example also via OpenAI Agents SDK with Gradio and Langfuse tracing. Refer to [src/2_frameworks/4_mcp/README.md](src/2_frameworks/4_mcp/README.md) for more details.
MCP server integration example also via OpenAI Agents SDK with Gradio and Langfuse tracing. Refer to [implementations/2_frameworks/4_mcp/README.md](implementations/2_frameworks/4_mcp/README.md) for more details.

### 3. Evals

Synthetic data.

```bash
uv run --env-file .env \
-m src.3_evals.2_synthetic_data.synthesize_data \
-m implementations.3_evals.2_synthetic_data.synthesize_data \
--source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
--langfuse_dataset_name search-dataset-synthetic-20250609 \
--langfuse_dataset_name search-dataset-synthetic \
--limit 18
```

@@ -125,15 +125,15 @@ Quantify embedding diversity of synthetic data
# Baseline: "Real" dataset
uv run \
--env-file .env \
-m src.3_evals.2_synthetic_data.annotate_diversity \
-m implementations.3_evals.2_synthetic_data.annotate_diversity \
--langfuse_dataset_name search-dataset \
--run_name cosine_similarity_bge_m3

# Synthetic dataset
uv run \
--env-file .env \
-m src.3_evals.2_synthetic_data.annotate_diversity \
--langfuse_dataset_name search-dataset-synthetic-20250609 \
-m implementations.3_evals.2_synthetic_data.annotate_diversity \
--langfuse_dataset_name search-dataset-synthetic \
--run_name cosine_similarity_bge_m3
```

@@ -142,16 +142,16 @@ Visualize embedding diversity of synthetic data
```bash
uv run \
--env-file .env \
gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
gradio implementations/3_evals/2_synthetic_data/gradio_visualize_diversity.py
```

Run LLM-as-a-judge Evaluation on synthetic data

```bash
uv run \
--env-file .env \
-m src.3_evals.1_llm_judge.run_eval \
--langfuse_dataset_name search-dataset-synthetic-20250609 \
-m implementations.3_evals.1_llm_judge.run_eval \
--langfuse_dataset_name search-dataset-synthetic \
--run_name enwiki_weaviate \
--limit 18
```
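
The README changes in this diff all follow one pattern: the `src` prefix becomes `implementations`, both in file paths (`src/1_basics/...`) and in dotted module names (`src.3_evals...`). As a rough sketch of that substitution, the snippet below applies it to arbitrary text. The helper name `migrate_paths` and the regular expression are illustrative assumptions and not part of this repository; the pattern assumes every affected path continues with a numbered module directory such as `1_basics`.

```python
import re

# Hypothetical helper (not from the repository): rewrite the old `src` prefix
# to `implementations` in both path form and dotted-module form.
_OLD, _NEW = "src", "implementations"

# Match `src/` or `src.` only when it starts a token and is followed by a
# numbered module directory (e.g. `1_basics`), so unrelated text is untouched.
_PATTERN = re.compile(rf"(?<![\w/.]){_OLD}([/.])(?=\d_)")


def migrate_paths(text: str) -> str:
    """Return `text` with old-style prefixes replaced by the new prefix."""
    return _PATTERN.sub(rf"{_NEW}\1", text)


if __name__ == "__main__":
    print(migrate_paths("uv run --env-file .env gradio src/1_basics/0_search_demo/app.py"))
    print(migrate_paths("-m src.3_evals.1_llm_judge.run_eval"))
```

A check like this could help when searching other documents or scripts for leftover `src/` references after the rename, though any real migration should be reviewed by hand.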
1 change: 1 addition & 0 deletions aieng-agents/.python-version
@@ -0,0 +1 @@
3.12