By Matt Stammers
25/11/2025 - Presented at HDUG, London
An agent is a large language model (LLM) or foundation model (FM) that can use at least one tool and can also learn from past failures. This differentiates it from a plain LLM/FM: it can iteratively improve its own performance, and it has 'agency', i.e. it can complete tasks in the digital world for the user using tools. In this case, each agent has a single tool.
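To make that definition concrete, here is a minimal sketch of a single-tool agent loop that feeds its past failures back into the next attempt. It is not the code used in the demo scripts: `stub_model` is a hypothetical stand-in for a real LLM call, `lookup_tool` is a hypothetical exact-match tool, and the two-entry term table is only there for illustration.

```python
from typing import Callable, Optional

# Hypothetical single tool: exact-match lookup in a tiny in-memory term table.
TERMS = {"seizure": "HP:0001250", "headache": "HP:0002315"}


def lookup_tool(query: str) -> str:
    """Return the HPO ID for a label, or raise KeyError if nothing matches."""
    key = query.strip().lower()
    if key not in TERMS:
        raise KeyError(f"no term matches {query!r}")
    return TERMS[key]


def stub_model(task: str, failures: list[str]) -> str:
    """Stand-in for an LLM call (assumption): propose the next tool query."""
    # First attempt: pass the raw text. After a failure: try a singular form.
    if not failures:
        return task
    return task.rstrip("s")


def run_agent(
    task: str,
    model: Callable[[str, list[str]], str],
    tool: Callable[[str], str],
    max_steps: int = 5,
) -> tuple[Optional[str], int]:
    """Call the tool until it succeeds or the step budget runs out."""
    failures: list[str] = []
    for step in range(1, max_steps + 1):
        query = model(task, failures)
        try:
            return tool(query), step
        except KeyError as err:
            # Record the failure so the next model call can adapt to it.
            failures.append(str(err))
    return None, max_steps


if __name__ == "__main__":
    print(run_agent("Seizures", stub_model, lookup_tool))
```

The point of the sketch is the `failures` list: a static LLM workflow would stop at the first failed lookup, whereas the loop gives the model another attempt with the error in hand.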
These demo agents illustrate how an agent can solve a very simple HPO mapping problem using clinical pseudo-data.
I have picked out a handful of the experiments to illustrate different agent behaviours for this repo. I ran over 25 different experiments on 10 different LLMs to come up with these demo scripts.
They demonstrate an MVP version of how the problem can be solved and were demonstrated at HDUG 2025 - London @ St Mary's House, Euston, as part of the data for R&D program event.
Quality Ranking:
- The deepseek-14b agent is a very early experiment and tends to work, but it doesn't do much. It was the third experiment I ran.
- The gpt-oss20b agent is a middle-stage smolagent experiment, which works some of the time but illustrates nicely how unreliable these agents can be in terms of performance.
- The qwenv3-8b agent is the best performing and generally succeeds to a moderate degree within 4-10 steps. However, it sometimes goes down rabbit holes.
All of them are slow, and none are intended for any type of production purpose - only to illustrate what even single, non-fine-tuned agents are capable of.
- Git clone this repo to a location of your choice. The machine should be GPU-enabled, or the agents will run very slowly
- Go to: https://hpo.jax.org/data/ontology and download hp.json (then rename it to hpo to avoid confusion and drop it into /src)
- Install uv and Python if not already available
- call:
uv sync
python src/{agent_name}
The agent should then run if you have a working internet connection. If you run into problems, please raise an issue. This has only been tested on Linux.
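If you want to sanity-check the ontology file downloaded in the steps above before running an agent, something like the following sketch may help. It assumes hp.json follows the OBO Graphs JSON layout (a top-level `graphs` list whose `nodes` carry an `id` URL and an `lbl` label). The function names are hypothetical and this is not the loader used by the demo scripts.

```python
import json
from pathlib import Path


def index_hpo_labels(graph_doc: dict) -> dict[str, str]:
    """Map lower-cased HPO labels to CURIE-style IDs such as 'HP:0001250'."""
    index: dict[str, str] = {}
    for graph in graph_doc.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            label = node.get("lbl")
            # Skip nodes from other ontologies and nodes without a label.
            if "HP_" not in node_id or not label:
                continue
            curie = "HP:" + node_id.rsplit("HP_", 1)[1]
            index[label.lower()] = curie
    return index


def load_hpo(path: str) -> dict[str, str]:
    """Read the ontology file from disk and build the label index."""
    with Path(path).open(encoding="utf-8") as fh:
        return index_hpo_labels(json.load(fh))


if __name__ == "__main__":
    # Tiny inline sample in the assumed layout, so this runs without the download.
    sample = {
        "graphs": [
            {
                "nodes": [
                    {"id": "http://purl.obolibrary.org/obo/HP_0001250", "lbl": "Seizure"},
                    {"id": "http://purl.obolibrary.org/obo/BFO_0000001", "lbl": "entity"},
                ]
            }
        ]
    }
    print(index_hpo_labels(sample))
```

To use it on the real file, pass `load_hpo` whatever path you saved the ontology under.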
Unfortunately, I have to give these disclaimers because I am a clinician and need to protect my employer.
- This is not a medical device, and it is not a clinical tool - it is a basic demo of how one of the tools we are building works. It is not intended for any production purpose whatsoever.
- Performance is unstable because it is a single-agent implementation, and the models being used are not even that good. Feel free to switch them out, and performance should improve. I have not included benchmarks in this version, but the F1 score is about 60% in this demo (so not very good, but also not much worse than many humans would manage on the same task).
- UHSFT and the author give no guarantees and take no legal responsibility for any use of this code, which is provided for free under an MIT licence purely to demonstrate the principles of how a single HPO agent works and to allow others to get started with using agents for similar tasks.
- Finally, I know that there are bugs that the agent runs into on its first attempts, and that these could easily be solved with prompts. They were left in intentionally to visually demonstrate how agents are superior to static LLM/FM-based workflows because they can improve their own performance. The non-demo version is obviously not like this.
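For anyone who wants to score their own runs against the F1 figure mentioned in the disclaimers above, here is one way it could be computed per case, treating the agent's output and the reference annotation as sets of HPO IDs. This is an assumption about the scoring, since no benchmark code is included in this version, and the IDs in the example are purely illustrative.

```python
def f1_score(predicted: set[str], expected: set[str]) -> float:
    """Set-based F1 between predicted and expected HPO IDs for one case."""
    true_pos = len(predicted & expected)
    if true_pos == 0:
        # Covers empty inputs and zero overlap without dividing by zero.
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(expected)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative IDs only: 2 of 3 predictions are correct, 2 of 4 expected found.
    predicted = {"HP:0001250", "HP:0002315", "HP:0001945"}
    expected = {"HP:0001250", "HP:0002315", "HP:0000118", "HP:0000707"}
    print(round(f1_score(predicted, expected), 3))
```

Averaging this value over all cases gives a macro-style score. Pooling the counts across cases first would give a micro-style score, and the two can differ noticeably on small datasets.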
Funder: Data for R&D Driver Program, NHS England.
