This repository contains the complete source code and pipeline for the Master's thesis:
“Parameter-Efficient Adaptation of Open-Source Language Models for Clinical MRI Protocol Automation.”
This thesis examines the capability of modern open-source large language models (LLMs) to automate MRI protocol assignment. It focuses on adapting these models efficiently for this specialized clinical task while ensuring local deployment to maintain data privacy and transparency.
The project is structured as an end-to-end pipeline:
- **Data Preprocessing** (`data_processing.py`)
  Loads clinical data from Excel files and formats each case into a structured conversation (see the sketch after this list):
  - System message: Defines the task
  - User message: Provides patient data and available MRI programs
  - Assistant message: Contains the ground-truth sequences in JSON format
- **Model Adaptation** (`train.py`)
  Fine-tunes a pre-trained base LLM (e.g., Llama, MedGemma) using parameter-efficient fine-tuning (PEFT) methods, enabling the model to output clinically valid sequences in JSON.
- **Inference** (`inference.py`)
  Loads the fine-tuned (or pre-trained) model, processes new patient data, and generates MRI sequence recommendations.
- **Evaluation** (`evaluate.py`)
  Compares model outputs with expert-annotated ground truth using metrics such as exact and fuzzy matching, edit distance, and semantic similarity (BioBERT and MiniLM).
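For illustration, a single training case might be rendered into a chat-format structure like the one below. This is a hypothetical sketch: the actual system prompt, field wording, and JSON schema are defined in `src/data_processing.py`.

```python
# Hypothetical example of the three-message conversation built per case;
# the real prompt template lives in src/data_processing.py.
example_case = [
    {
        "role": "system",
        "content": (
            "You are a radiology assistant. Select the MRI sequences for the "
            "patient from the available programs. Answer with JSON only."
        ),
    },
    {
        "role": "user",
        "content": (
            "Indication: suspected stroke\n"
            "Symptoms: acute left-sided weakness\n"
            "Age: 67\n"
            "Gender: F\n"
            "Available programs: axial FLAIR, axial DWI, axial T1, ..."
        ),
    },
    {
        "role": "assistant",
        "content": '{"sequences": ["axial FLAIR", "axial DWI"]}',
    },
]
```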
- PEFT Methods: Supports LoRA, VeRA, and Prompt Tuning.
- Model-Agnostic: Works with open-source decoder-only LLMs (e.g., Llama, Phi, MedGemma).
- Main Loss Function: Cross-entropy loss.
- Custom Loss Function: Implements `CustomTrainer` using Focal Loss and example-level weighting for rare medical sequences (a sketch follows this list).
- Structured Output: Forces models to produce clean, parsable JSON for automation.
- Robust Evaluation: Multi-metric evaluation to measure clinical utility beyond simple accuracy.
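To illustrate the focal-loss idea, the following trainer subclass down-weights tokens the model already predicts confidently, so rare, hard sequences contribute more to the gradient. It is a minimal sketch assuming a Hugging Face `transformers` setup; the repository's `CustomTrainer` in `src/model_utils.py`, which also adds example-level weighting, is the authoritative version.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Illustrative causal-LM trainer with token-level focal loss."""

    def __init__(self, *args, gamma: float = 2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma  # focusing parameter; higher = stronger down-weighting

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Shift logits/labels so each position predicts the next token.
        logits = outputs.logits[..., :-1, :].contiguous()
        labels = labels[..., 1:].contiguous()
        ce = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            reduction="none",
            ignore_index=-100,
        )
        pt = torch.exp(-ce)                      # confidence in the true token
        focal = (1.0 - pt) ** self.gamma * ce    # down-weight easy tokens
        mask = labels.view(-1) != -100           # skip padding/masked positions
        loss = focal[mask].mean()
        return (loss, outputs) if return_outputs else loss
```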
```
peft-mri-protocol-automation/
├── configs/
│   └── config.yaml            # Configuration file (paths, models, training)
├── data/
│   ├── train.xlsx             # Training dataset
│   └── evaluation.xlsx        # Test dataset
├── model/
│   └── ...                    # Trained PEFT adapters
├── out/
│   └── ...                    # Logs and evaluation outputs
├── src/
│   ├── __init__.py
│   ├── main.py                # Main entry point
│   ├── train.py               # PEFT fine-tuning
│   ├── inference.py           # Sequence generation
│   ├── evaluate.py            # Evaluation metrics
│   ├── config.py              # Configuration loader
│   ├── model_utils.py         # PEFT configs, Focal Loss custom trainer
│   └── data_processing.py     # Data loading, prompt formatting, tokenization
├── tests/                     # Unit tests
│   ├── __init__.py
│   ├── test_config.py
│   ├── test_data_processing.py
│   ├── test_evaluate.py
│   ├── test_inference.py
│   └── test_model_utils.py
├── requirements.txt           # Dependencies
└── README.md                  # This file
```
Clone the repository:

```bash
git clone https://github.com/your-username/peft-mri-protocol-automation.git
cd peft-mri-protocol-automation
```

Install dependencies:

```bash
pip install -r requirements.txt
```

All pipeline settings are controlled by `configs/config.yaml`. Before running the pipeline, edit this file (an illustrative layout is sketched after the list):

- `base_project_dir`: Set this to the absolute path of the cloned repository.
- `model_mapping`: Update paths to your pre-trained base models (e.g., `Llama-3.1-8B`).
- `data_paths`: Verify the names of your training and validation data files.
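The following YAML only guesses at the file's shape using the keys named above; adapt paths and model keys to your environment and to the actual `configs/config.yaml`:

```yaml
# Illustrative structure only; align key names with the real configs/config.yaml.
base_project_dir: /home/you/peft-mri-protocol-automation
model_mapping:
  llama: /models/Llama-3.1-8B    # local path to the base model weights
  gemma: /models/MedGemma
data_paths:
  train: data/train.xlsx
  val: data/val.xlsx
  evaluation: data/evaluation.xlsx
```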
The pipeline expects data in Excel files (a quick column check is sketched after this list):

- Training/Validation (`train.xlsx`, `val.xlsx`)
  Must contain the following columns: `Indication`, `Symptoms`, `Age`, `Gender`, `Protocol`.
  The `Protocol` column contains the ground-truth sequence list, e.g., `"axial FLAIR, axial DWI"`.
- Test (`evaluation.xlsx`)
  Must contain: `Indication`, `Symptoms`, `Age`, `Gender`.
  The `Protocol Zeynep` and `Protocol Ralf` columns are used as ground truth.
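Before training, it can be worth failing fast on malformed spreadsheets. A small sketch using `pandas` (column names as listed above; file paths are examples):

```python
import pandas as pd

REQUIRED_TRAIN_COLS = {"Indication", "Symptoms", "Age", "Gender", "Protocol"}
REQUIRED_TEST_COLS = {"Indication", "Symptoms", "Age", "Gender"}

def check_columns(path: str, required: set) -> None:
    """Raise early if an Excel file is missing any expected column."""
    df = pd.read_excel(path)
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"{path} is missing columns: {sorted(missing)}")

check_columns("data/train.xlsx", REQUIRED_TRAIN_COLS)
check_columns("data/evaluation.xlsx", REQUIRED_TEST_COLS)
```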
The main entry point is `src/main.py`. You can run training, inference, and evaluation using command-line arguments.

Run inference and evaluation using a pre-trained model without fine-tuning:

```bash
python src/main.py \
    --fine_tuning_method "none" \
    --model_name "llama" \
    --layers "qkvo" \
    --test_file "evaluation" \
    --model_folder "model" \
    --rank "2"
```

Fine-tune with LoRA, then run inference and evaluation:

```bash
python src/main.py \
    --fine_tuning_method "lora" \
    --model_name "llama" \
    --layers "qkvo" \
    --rank "2" \
    --test_file "evaluation" \
    --model_folder "model"
```

This command will:

- Train: Fine-tune the Llama model using LoRA with a rank of 2.
- Infer: Run inference on the evaluation test file.
- Evaluate: Calculate and print performance metrics.

| Argument | Description |
|---|---|
| `--fine_tuning_method` | PEFT method to use: `none` (baseline inference only), `lora` (Low-Rank Adaptation), `vera` (Vector-based Random Matrix Adaptation), `prompt` (Prompt Tuning) |
| `--model_name` | Key from the `config.yaml` model mapping (e.g., `llama`, `qwen`, `gemma`) |
| `--layers` | Target modules for LoRA/VeRA (e.g., `qkvo`, `o`, `q`) |
| `--rank` | Rank for the PEFT method (e.g., `8`, `16`, `32`) |
| `--test_file` | Name of the test file in your `/data/` directory (e.g., `evaluation`) |
| `--model_folder` | Directory (e.g., `model`) to save/load the trained adapter |
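For orientation, this is roughly how such arguments could map onto `peft` configuration objects. Everything here is a hypothetical sketch (the function name, the `lora_alpha` heuristic, and reusing `rank` as the number of virtual prompt tokens are assumptions); the real mapping lives in `src/model_utils.py`:

```python
from peft import LoraConfig, PromptTuningConfig, TaskType, VeraConfig

def build_peft_config(method: str, layers: str, rank: int):
    """Hypothetical mapping from CLI arguments to a PEFT config."""
    # "qkvo" -> ["q_proj", "k_proj", "v_proj", "o_proj"] on Llama-style models
    target_modules = [f"{c}_proj" for c in layers]
    if method == "lora":
        return LoraConfig(task_type=TaskType.CAUSAL_LM, r=rank,
                          lora_alpha=2 * rank, target_modules=target_modules)
    if method == "vera":
        return VeraConfig(task_type=TaskType.CAUSAL_LM, r=rank,
                          target_modules=target_modules)
    if method == "prompt":
        return PromptTuningConfig(task_type=TaskType.CAUSAL_LM,
                                  num_virtual_tokens=rank)
    return None  # "none": baseline inference, no adapter
```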
The `evaluate.py` script computes a range of metrics to provide a comprehensive view of model performance (a condensed sketch follows the list):

- Lexical Similarity: Metrics based on `rapidfuzz` (fuzzy ratio) and Levenshtein edit distance, to account for minor spelling variations.
- Semantic Similarity: Uses `sentence-transformers` (BioBERT and MiniLM) to compute cosine similarity between predicted and ground-truth sequences. This metric correctly identifies semantically equivalent sequences (e.g., `"axial T1"` vs. `"T1 axial"`).
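A condensed sketch of these metrics on a single prediction (the MiniLM checkpoint shown is one common choice; a BioBERT-based sentence-transformer can be swapped in the same way, and `evaluate.py` remains the reference implementation):

```python
from rapidfuzz import fuzz
from rapidfuzz.distance import Levenshtein
from sentence_transformers import SentenceTransformer, util

pred = "T1 axial, axial DWI"
truth = "axial T1, axial DWI"

# Lexical similarity: tolerant to small spelling/ordering differences.
fuzzy_ratio = fuzz.ratio(pred, truth)              # 0-100 scale
edit_distance = Levenshtein.distance(pred, truth)  # raw character edits

# Semantic similarity: embed both strings and compare directions.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = model.encode([pred, truth], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()

print(fuzzy_ratio, edit_distance, round(cosine, 3))
```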
If you use this repository, please cite:
Ganji, Z. (2025). Parameter-Efficient Adaptation of Open-Source Language Models for Clinical MRI Protocol Automation. Master's thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, Computer Science.