MediaEval Medico 2026: VQA (with multimodal explanations) for GastroIntestinal Imaging

MediaEval 2026 | Registration Form | 2025 Edition (Archived) | 🏆 Leaderboard / Registered Submissions

The MediaEval Medico 2026 Challenge focuses on Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, emphasizing explainability to foster trustworthy AI for clinical adoption.

This task continues the long-running Medico series at MediaEval, now leveraging the newly developed Kvasir-VQA-x1 dataset, designed to support multimodal reasoning and interpretable clinical decision support.

About MediaEval 2026

The Benchmarking Initiative for Multimedia Evaluation (MediaEval) offers challenges related to multimedia analysis, retrieval and exploration. MediaEval tasks involve multiple modalities (e.g., audio, visual, textual, and/or contextual) and focus on the human and social aspects of multimedia. Our larger aim is to promote reproducible research that makes multimedia a positive force for society.

Signup for MediaEval 2026 is now open via the MediaEval 2026 registration form

Important Dates

The MediaEval 2026 Workshop will be held: Monday-Tuesday, 15-16 June 2026 | Amsterdam & Online (co-located with ACM ICMR)

Data release will start soon and continue over the next months. Runs will be due at the beginning of May.

Task Descriptions

Subtask 1: AI Performance on Medical Image Question Answering

Goal: Develop AI models that can accurately answer clinical questions using GI endoscopic images.

The task uses Kvasir-VQA-x1, an advanced dataset comprising 159,549 QA pairs from 6,500 original GI images, featuring:

Multi-step reasoning questions
Naturalized medical language
Complexity scores for curriculum training

Question Types include:

Yes/No
Single-Choice
Multiple-Choice
Color-related
Location-related
Numerical Count
Merged reasoning-based questions

Example Training Notebook: Not sure where to start? Check out: Training with ms-swift

Note: You can only submit work for Task 1 if you wish to participate.

It is acceptable to use the full test set for training in your final submission to get competitive score. However, we strongly recommend using proper splits for training and clearly reporting in your paper which splits were used for training, and validation.

Subtask 2: Clinician-Oriented Multimodal Explanations in GI

Goal: Move beyond simply predicting an answer (Subtask 1) and generate rich, multimodal explanations that are transparent, understandable, and trustworthy for clinicians.

Your system should justify its predictions using multiple complementary reasoning forms—e.g., combining a detailed textual clinical explanation with a visual localization and/or a confidence measure.

Requirements:

Faithful to the model's reasoning.
Clinically relevant and medically sound.
Useful for real-world decision-making.

Validation set for Subtask 2:

from datasets import load_dataset, Image as HfImage

ds = load_dataset("SimulaMet/Kvasir-VQA-x1")["test"]
val_set_task2 = (
    ds.filter(lambda x: x["complexity"] == 1)
      .shuffle(seed=42)
      .select(range(1500))
      .add_column("val_id", list(range(1500)))
      .remove_columns(["complexity", "answer", "original", "question_class"])
      .cast_column("image", HfImage())
)

val_set_task2 is a HuggingFace Dataset containing the columns val_id, img_id, image, and question, where image is Pillow Image for easy access.

Submission Format

A JSONL file where each entry corresponds to one test case:

{
  "val_id": "index of validation subset for subtask 2, as in val_set_task2",
  "img_id": "UNIQUE_IMAGE_IDENTIFIER",
  "question": "Original question posed to the model.",
  "answer": "Prediction from your model from Subtask 1.",
  "textual_explanation": "Detailed narrative in clinical language justifying the answer.",
  "visual_explanation": [{
    "type": "heatmap | segmentation_mask | bounding_box | etc.",
    "data": "path/to/visual.png | [[x1,y1,x2,y2]]",
    "description": "(Optional) Highlights the region of interest that supports the answer (e.g., bounding box around the polyp, or heatmap showing focus on mucosal irregularity)."
  }],
  "confidence_score": 0.92
}

Field-by-Field Requirements:

img_id / question / answer - Must match Subtask 1 data and predictions exactly.
textual_explanation (Mandatory) - Clinician-oriented reasoning referencing visual cues (location, morphology, color, size, vascular pattern, etc.).
visual_explanation (Optional but encouraged) - Heatmaps, segmentation masks, or bounding boxes linked to the textual explanation.
confidence_score (Optional but encouraged) - Float in [0, 1], from model confidence or uncertainty estimation.

Suggested Approaches

VLM Self-Probing for Explanations - Ask auxiliary questions (e.g., "What is the abnormality?", "Where is it located?", "Describe its morphology") and combine answers into the textual_explanation.
Visual Grounding - Generate heatmaps or attention maps showing influential regions and link them to textual descriptions.
Segmentation / Detection - Produce masks or bounding boxes highlighting relevant pathology, reinforcing clinician trust.

Note: Participation in Subtask 2 requires completion of Subtask 1.

Dataset Overview: Kvasir-VQA-x1

Built on HyperKvasir and Kvasir-Instrument, the Kvasir-VQA-x1 dataset includes:

159,549 QA pairs
6,500 original GI images
10 weakly augmented images per original (augmentation script provided)
Complexity levels 1–3
Realistic medical question reformulations using LLMs

Dataset: Kvasir-VQA-x1 @ SimulaMet on Hugging Face

Evaluation Methodology

Subtask 1 (VQA Performance)

Metrics: BLEU, ROUGE (1/2/L), METEOR
Settings: Original & augmented images
Criteria: Accuracy, relevance, medical correctness

The official challenge score will be computed on a separate hidden challenge set with more metrics. This ensures fairness and that final results truly reflect model performance.

Subtask 2 (Explainability) Rated by experts on:

Answer correctness
Clarity & clinical relevance
Visual alignment
Confidence calibration
Methodology & novelty

Submission System

Please do not hesitate to contact us if you encounter any issues.

View Registered Submissions

We use the medvqa Python package to validate and submit models to the official system.

Install

pip install -U medvqa

Always use the latest version.

The model that needs to be submitted is expected to be in a HuggingFace repository. Your HuggingFace repo must include a standalone script named:

submission_task1.py for task 1.
submission_task2.py for task 2.

Instructions for Participants

Use the provided template script, and make sure to:

Modify all TODO sections
Add required information (e.g., model path, inference logic, preprocessing steps) directly in the script
Keep the required input/output format unchanged

Task 1: Script Variants & Naming Requirements

You have two template options for the Task 1 inference script:

MS-Swift version: submission_task1_swift.py
PyTorch version: submission_task1.py

Both scripts already include template example code for model loading and inference.

Important: Even if you use the MS-Swift template, your final script in the repository must still be named submission_task1.py.

Task 2: What to Submit (Repository Layout)

Host your submission in a Hugging Face model repository containing:

submission_task2.jsonl — one object per val_id
visuals/ — optional folder with any referenced visual artifacts (heatmaps, masks, boxes as JSON, etc.)
submission_task2.py file with you team details
A short README.md explaining how you created the explanations and any post-processing you want to share

Demo submission repo: https://huggingface.co/SushantGautam/Medico2026_subtask2_demo_submission/tree/main

Naming tips

Keep data paths in visual_explanation relative to repo root (e.g., visuals/1234_heatmap.png).
Ensure every val_id in the file corresponds to an item in val_set_task2.

Validate Before Submitting

First make sure your submission script works fine in your working environment and it loads the model correctly from your submission repo and generates outputs in the required format.

python submission_task1.py

Next, you can validate the script to work independently. The .py script should now be in the root of the same HuggingFace repo as your model. You can try this in a new venv:

medvqa validate --competition=medico-2026 --task=1/2 --repo_id=<your_repo_id>

--competition: Set to medico-2026
--task: Use 1 for Task 1 or 2 for Task 2
--repo_id: Your HuggingFace model repo ID (e.g., SushantGautam/Florence-2-vqa-demo)

Additional Dependencies

If your code requires extra packages, you must include a requirements.txt in the root of the repo. The system will install these automatically during validation/submission. Else you will get package missing errors.

Submit

If validation is okey, you can just run:

medvqa validate_and_submit --competition=medico-2026 --task=1/2 --repo_id=<your_repo_id>

This will make a submission and your username, along with the task and time, should be visible on the leaderboard for it to be considered officially submitted. The submission library will make your Hugging Face repository public but gated, granting the organizers access to your repo. It must remain unchanged at least until the results of the competition are announced. However, you are free to make your model fully public (non-gated). If you encounter any issues with submission, don’t hesitate to contact us.

Tools & Resources

Scripts for augmentation, splits, and baselines
Submission templates
Fine-tuned model configs
Attention & saliency visualization methods

Timeline (Preliminary)

Now - Registration for task participation is open
February-March 2026 - Development data release
April 2026 - Test data release
1 June 2026 - Runs due
10 June 2026 - Working Notes deadline
15-16 June 2026 (Mon.–Tue.) - MediaEval Workshop (Amsterdam + Online, co-located with ACM ICMR)

Organizers

Steven A. Hicks - steven@simula.no
Michael A. Riegler - michael@simula.no
Vajira Thambawita - vajira@simula.no
Pål Halvorsen - paalh@simula.no
Sushant Gautam - sushant@simula.no

Join Us

Let's build the future of trustworthy, explainable medical AI. GI diagnostics needs interpretable answers. Your model can help save lives.

Register: MediaEval 2026 Repo: GitHub

Develop explainable AI. Help doctors. Improve lives.

How to Cite

If you are inspired by the MediaEval Medico 2026 Challenge or the Kvasir-VQA-x1 dataset in your research, please cite the following papers:

@article{Gautam2026,
	author = {Gautam, Sushant and Thambawita, Vajira and Riegler, Michael and others},
	title = {{Medico 2026: Visual Question Answering for Gastrointestinal Imaging}},
	journal = {arXiv},
	year = {2026},
	note = {To be published}
}

@article{Gautam2025Jun,
	author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
	title = {{Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy}},
	journal = {arXiv},
	year = {2025},
	month = jun,
	eprint = {2506.09958},
	doi = {10.48550/arXiv.2506.09958}
}

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
Task_1_Sample_Notebook.ipynb		Task_1_Sample_Notebook.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MediaEval Medico 2026: VQA (with multimodal explanations) for GastroIntestinal Imaging

About MediaEval 2026

Important Dates

Task Descriptions

Subtask 1: AI Performance on Medical Image Question Answering

Subtask 2: Clinician-Oriented Multimodal Explanations in GI

Validation set for Subtask 2:

Submission Format

Suggested Approaches

Dataset Overview: Kvasir-VQA-x1

Evaluation Methodology

Submission System

Install

Instructions for Participants

Task 1: Script Variants & Naming Requirements

Task 2: What to Submit (Repository Layout)

Validate Before Submitting

Additional Dependencies

Submit

Tools & Resources

Timeline (Preliminary)

Organizers

Join Us

How to Cite

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MediaEval Medico 2026: VQA (with multimodal explanations) for GastroIntestinal Imaging

About MediaEval 2026

Important Dates

Task Descriptions

Subtask 1: AI Performance on Medical Image Question Answering

Subtask 2: Clinician-Oriented Multimodal Explanations in GI

Validation set for Subtask 2:

Submission Format

Suggested Approaches

Dataset Overview: Kvasir-VQA-x1

Evaluation Methodology

Submission System

Install

Instructions for Participants

Task 1: Script Variants & Naming Requirements

Task 2: What to Submit (Repository Layout)

Validate Before Submitting

Additional Dependencies

Submit

Tools & Resources

Timeline (Preliminary)

Organizers

Join Us

How to Cite

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages