This repository contains the official code for the paper:
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains which demand large numbers of highly specialised tools, like in life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task, using a closed-loop self-correction mechanism to iteratively diagnose and rectify errors. To evaluate our approach, we introduce a benchmark comprising 15 diverse and complex computational tasks spanning both medical and non-medical domains with over 100 unit tests to objectively assess tool correctness and robustness. ToolMaker correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker therefore is a step towards fully autonomous agent-based scientific workflows.
</details>

## News
- **[May 2025]** Our [paper](https://arxiv.org/abs/2502.11705) has been accepted at [ACL 2025](https://2025.aclweb.org/)! 🎉
- **[Feb 2025]** Initial code release.
> [!NOTE]
> This is an experimental release of ToolMaker that is compatible with the [ToolArena](https://github.com/KatherLab/ToolArena) benchmark. ToolArena includes significantly more tools than the original TM-Bench which was released as part of ToolMaker. As such, the tasks are no longer defined in this repository, but in the ToolArena repository (though imported into this repository via the [`benchmark`](benchmark/) submodule, which points to ToolArena).
>
> You can still access the original code release of ToolMaker including the original TM-Bench benchmark in the [`original`](https://github.com/KatherLab/ToolMaker/tree/original) branch.
## Overview
ToolMaker is an agentic workflow that turns GitHub repositories into LLM-compatible tools. Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task, using a closed-loop self-correction mechanism to iteratively diagnose and rectify errors.
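ToolMaker's agent is LLM-driven, but the generate–run–diagnose–repair cycle it describes can be illustrated with a minimal, self-contained sketch. Everything here is a toy stand-in: `run_in_sandbox` and `repair_code` are hypothetical placeholders for the real container execution and LLM repair steps, not ToolMaker's actual API.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    succeeded: bool
    error_log: str


def run_in_sandbox(code: str) -> RunResult:
    """Toy stand-in for executing a candidate tool in an isolated environment."""
    try:
        scope: dict = {}
        exec(code, scope)
        assert scope["add"](2, 3) == 5  # the tool's correctness check
        return RunResult(True, "")
    except Exception as exc:
        return RunResult(False, repr(exc))


def repair_code(code: str, error_log: str) -> str:
    """Toy stand-in for the LLM repair step, which in ToolMaker is
    conditioned on the observed error; here we hard-code the fix."""
    return code.replace("a - b", "a + b")


def make_tool(initial_code: str, max_attempts: int = 5) -> str:
    """Closed loop: run the candidate, and on failure diagnose and repair it."""
    code = initial_code
    for _ in range(max_attempts):
        result = run_in_sandbox(code)
        if result.succeeded:
            return code
        code = repair_code(code, result.error_log)
    raise RuntimeError("could not repair the tool within the attempt budget")


buggy = "def add(a, b):\n    return a - b\n"
fixed = make_tool(buggy)  # first run fails, one repair round fixes it
```

The key design point this mirrors is that each iteration feeds the concrete execution error back into the repair step, rather than regenerating the tool from scratch.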
This will create a `my_uni_tool.html` file in the current directory which you can view in your browser.
The diagram below illustrates the ToolMaker workflow in action.

## Benchmarking
To run the unit tests that constitute the benchmark, use the following command (note that this requires the `benchmark` dependency group to be installed via `uv sync --group benchmark`):