This repository provides a unified and extensible framework for running and organizing evaluations across multiple LLM evaluation tools, such as lm-evaluation-harness and HELM.
- Python 3.12 or higher
- uv
- Install the required dependencies:
`uv sync`
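
Before running `uv sync`, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of this repository: the helper name `check_prerequisites` and its parameters are hypothetical, and it only checks that the interpreter is Python 3.12 or newer and that a `uv` executable is on `PATH`.

```python
import shutil
import sys

# Minimum Python version stated in the prerequisites above.
MIN_PYTHON = (3, 12)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means both prerequisites are met.

    `version_info` and `which` are injectable so the check can be exercised
    without changing the real environment (hypothetical helper, for illustration).
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.12 or higher is required, found {found}")
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"Missing prerequisite: {issue}")
    if not issues:
        print("Prerequisites satisfied; run `uv sync` next.")
```

If the script reports no missing prerequisites, `uv sync` should be able to install the dependencies.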