VectorInstitute · fcogidi · Mar 12, 2026 · Jan 26, 2026
diff --git a/.github/workflows/code_checks.yml b/.github/workflows/code_checks.yml
@@ -57,3 +57,4 @@ jobs:
           virtual-environment: .venv/
           ignore-vulns: |
             GHSA-xm59-rqc7-hhvf
+            GHSA-7gcm-g887-7qv7
diff --git a/README.md b/README.md
@@ -11,31 +11,31 @@ This repository includes several modules, each showcasing a different aspect of
 **2. Frameworks: OpenAI Agents SDK**
   Showcases the use of the OpenAI agents SDK to reduce boilerplate and improve readability.
 
-- **[2.1 ReAct Agent for RAG - OpenAI SDK](src/2_frameworks/1_react_rag/README.md)**
+- **[2.1 ReAct Agent for RAG - OpenAI SDK](implementations/2_frameworks/1_react_rag/README.md)**
   Implements the same Reason-and-Act agent using the high-level abstractions provided by the OpenAI Agents SDK. This approach reduces boilerplate and improves readability.
   The use of langfuse for making the agent less of a black-box is also introduced in this module.
 
-- **[2.2 Multi-agent Setup for Deep Research](src/2_frameworks/2_multi_agent/README.md)**
+- **[2.2 Multi-agent Setup for Deep Research](implementations/2_frameworks/2_multi_agent/README.md)**
   Demo of a multi-agent architecture to improve efficiency on long-context inputs, reduce latency, and reduce LLM costs. Two versions are available- "efficient" and "verbose". For the build days, you should start from the "efficient" version as that provides greater flexibility and is easier to follow.
 
 **3. Evals: Automated Evaluation Pipelines**
   Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.
 
-- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)**
+- **[3.1 LLM-as-a-Judge](implementations/3_evals/1_llm_judge/README.md)**
   Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration.
 
-- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
+- **[3.2 Evaluation on Synthetic Dataset](implementations/3_evals/2_synthetic_data/README.md)**
   Showcases the generation of synthetic evaluation data for testing agents.
 
 We also provide "basic" no-framework implementations. These are meant to showcase how agents work behind the scene and are excessively verbose in the implementation. You should not use these as the basis for real projects.
 
 **1. Basics: Reason-and-Act RAG**
 A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.
 
-- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
+- **[1.0 Search Demo](implementations/1_basics/0_search_demo/README.md)**
   A simple demo showing the capabilities (and limitations) of a knowledgebase search.
 
-- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
+- **[1.1 ReAct Agent for RAG](implementations/1_basics/1_react_rag/README.md)**
   Basic ReAct agent for step-by-step retrieval and answer generation.
 
 ## Getting Started
@@ -48,7 +48,7 @@ In that case you can verify that the API keys work by running integration tests
 uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
 ```
 
-## Reference Implementations
+## Running the Reference Implementations
 
 For "Gradio App" reference implementations, running the script would print out a "public URL" ending in `gradio.live` (might take a few seconds to appear.) To access the gradio app with the full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.
 
@@ -74,48 +74,48 @@ These warnings can be safely ignored, as they are the result of a bug in the ups
 Interactive knowledge base demo. Access the gradio interface in your browser to see if your knowledge base meets your expectations.
 
 ```bash
-uv run --env-file .env gradio src/1_basics/0_search_demo/app.py
+uv run --env-file .env gradio implementations/1_basics/0_search_demo/app.py
 ```
 
 Basic Reason-and-Act Agent- for demo purposes only.
 
 As noted above, these are unnecessarily verbose for real applications.
 
 ```bash
-# uv run --env-file .env src/1_basics/1_react_rag/cli.py
-# uv run --env-file .env gradio src/1_basics/1_react_rag/app.py
+# uv run --env-file .env implementations/1_basics/1_react_rag/cli.py
+# uv run --env-file .env gradio implementations/1_basics/1_react_rag/app.py
 ```
 
 ### 2. Frameworks
 
 Reason-and-Act Agent without the boilerplate- using the OpenAI Agent SDK.
 
 ```bash
-uv run --env-file .env src/2_frameworks/1_react_rag/cli.py
-uv run --env-file .env gradio src/2_frameworks/1_react_rag/langfuse_gradio.py
+uv run --env-file .env implementations/2_frameworks/1_react_rag/cli.py
+uv run --env-file .env gradio implementations/2_frameworks/1_react_rag/langfuse_gradio.py
 ```
 
 Multi-agent examples, also via the OpenAI Agent SDK.
 
 ```bash
-uv run --env-file .env gradio src/2_frameworks/2_multi_agent/efficient.py
+uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/efficient.py
 # Verbose option - greater control over the agent flow, but less flexible.
-# uv run --env-file .env gradio src/2_frameworks/2_multi_agent/verbose.py
+# uv run --env-file .env gradio implementations/2_frameworks/2_multi_agent/verbose.py
 ```
 
-Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [src/2_frameworks/3_code_interpreter/README.md](src/2_frameworks/3_code_interpreter/README.md) for details.
+Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [implementations/2_frameworks/3_code_interpreter/README.md](implementations/2_frameworks/3_code_interpreter/README.md) for details.
 
-MCP server integration example also via OpenAI Agents SDK with Gradio and Langfuse tracing. Refer to [src/2_frameworks/4_mcp/README.md](src/2_frameworks/4_mcp/README.md) for more details.
+MCP server integration example also via OpenAI Agents SDK with Gradio and Langfuse tracing. Refer to [implementations/2_frameworks/4_mcp/README.md](implementations/2_frameworks/4_mcp/README.md) for more details.
 
 ### 3. Evals
 
 Synthetic data.
 
 ```bash
 uv run --env-file .env \
--m src.3_evals.2_synthetic_data.synthesize_data \
+-m implementations.3_evals.2_synthetic_data.synthesize_data \
 --source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+--langfuse_dataset_name search-dataset-synthetic \
 --limit 18
 ```
 
@@ -125,15 +125,15 @@ Quantify embedding diversity of synthetic data
 # Baseline: "Real" dataset
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.annotate_diversity \
+-m implementations.3_evals.2_synthetic_data.annotate_diversity \
 --langfuse_dataset_name search-dataset \
 --run_name cosine_similarity_bge_m3
 
 # Synthetic dataset
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.annotate_diversity \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+-m implementations.3_evals.2_synthetic_data.annotate_diversity \
+--langfuse_dataset_name search-dataset-synthetic \
 --run_name cosine_similarity_bge_m3
 ```
 
@@ -142,16 +142,16 @@ Visualize embedding diversity of synthetic data
 ```bash
 uv run \
 --env-file .env \
-gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
+gradio implementations/3_evals/2_synthetic_data/gradio_visualize_diversity.py
 ```
 
 Run LLM-as-a-judge Evaluation on synthetic data
 
 ```bash
 uv run \
 --env-file .env \
--m src.3_evals.1_llm_judge.run_eval \
---langfuse_dataset_name search-dataset-synthetic-20250609 \
+-m implementations.3_evals.1_llm_judge.run_eval \
+--langfuse_dataset_name search-dataset-synthetic \
 --run_name enwiki_weaviate \
 --limit 18
 ```

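Reviewer note: the README hunks above replace every `src/...` path and `src.` module reference with `implementations`. A rename like this can leave stale references behind in files the diff does not touch. The sketch below is one way to scan text for leftovers. It is not part of this PR; `STALE_REF` and `find_stale_refs` are hypothetical names, and the pattern assumes the old layout always looks like `src/<digit>_<name>` or `src.<digit>_<name>`, as it does in the lines shown here.

```python
import re

# Assumption: old-layout references look like "src/1_basics" or "src.3_evals".
# The lookbehind avoids matching longer names that merely end in "src".
STALE_REF = re.compile(r"(?<![\w./-])src[/.]\d+_\w+")


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still mention the old src layout."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE_REF.search(line)
    ]


# Sample lines taken from the diff above: one already migrated, two still stale.
sample = "\n".join(
    [
        "uv run --env-file .env gradio implementations/1_basics/0_search_demo/app.py",
        "-m src.3_evals.1_llm_judge.run_eval \\",
        "[1.0 Search Demo](src/1_basics/0_search_demo/README.md)",
    ]
)

for number, line in find_stale_refs(sample):
    print(f"{number}: {line}")
```

Running a check like this over `README.md` and the per-module READMEs after the rename should print nothing if every reference was updated.
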
diff --git a/aieng-agents/.python-version b/aieng-agents/.python-version
@@ -0,0 +1 @@
+3.12