Code for scientific figure multiple-choice QA with contrastive decoding.
SciCon is a simple contrastive decoding method for scientific figure multiple-choice QA.
- The model first scores each answer candidate with the full multimodal input.
- It then scores the same candidates again using a text-only version of the question.
- SciCon subtracts the text-only prior, scaled by alpha, from the multimodal score.
- This suppresses answers that are mainly favored by textual bias and promotes answers grounded in the figure.
In short, SciCon turns answer choices into an explicit prior and removes that prior during decoding so that the final prediction relies more on visual evidence.
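The scoring rule described above can be sketched as follows. This is an illustrative sketch, not the released implementation: the function name `scicon_score`, the use of per-choice log-probabilities, and the default alpha are all assumptions.

```python
def scicon_score(multimodal_logprobs, text_only_logprobs, alpha=0.5):
    """Pick the answer choice whose multimodal score, minus the
    alpha-scaled text-only prior, is highest.

    Both arguments map each answer choice to a log-probability;
    alpha=0.5 is an illustrative default, not the paper's value.
    """
    scores = {
        choice: multimodal_logprobs[choice] - alpha * text_only_logprobs[choice]
        for choice in multimodal_logprobs
    }
    return max(scores, key=scores.get)
```

With alpha set to 0 this reduces to ordinary multimodal scoring; larger alpha removes more of the textual prior.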
Included:
- evaluation script for contrastive decoding over answer candidates
- automatic dataset path discovery under `data/`
- support for `mac`, `scifi`, and `mmsci`
- OpenAI-compatible API inference
Not included:
- dataset files
- model weights
- training code
- built-in model serving
This repository does not bundle the datasets. If you want to run the released script on the same benchmarks, use the following dataset repositories:
- https://huggingface.co/datasets/mhjiang0408/MAC_Bench
- https://huggingface.co/datasets/jonathan-roberts1/SciFIBench
- https://huggingface.co/datasets/MMSci/NatureCommsCorpus
Place the prepared files under data/. The script will try to auto-detect standard layouts, and you can also pass paths manually through command-line arguments.
Example layout:
data/
  MAC_Bench/
    test.jsonl
    images/
      MAC_Bench/
        ...
  scifi/
    test.parquet
  mmsci/
    test.json
    images/
      ...
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run against any OpenAI-compatible VLM endpoint:
python src/run_always_contrastive_all_candidate.py \
--dataset mac \
--api-base http://127.0.0.1:30000/v1 \
--output-jsonl results/mac_predictions.jsonl

Supported datasets: `mac`, `scifi`, `mmsci`
If --api-model is not provided, the script tries to auto-detect it from /v1/models.
The script expects an OpenAI-compatible API for a vision-language model.
Typical options:
- sglang
- vLLM
- other compatible servers exposing `/v1/models` and chat/completions APIs
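The model auto-detection could be implemented along these lines. The `/v1/models` response shape is the standard OpenAI one; the helper names here are illustrative, not the released script's.

```python
import json
import urllib.request

def pick_api_model(models_json):
    """Pick the first model id from an OpenAI-style /v1/models response."""
    data = models_json.get("data", [])
    return data[0].get("id") if data else None

def detect_api_model(api_base):
    """Fetch /v1/models from an OpenAI-compatible server and pick a model id.

    api_base is expected to end in /v1, e.g. http://127.0.0.1:30000/v1.
    """
    url = api_base.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return pick_api_model(json.load(resp))
```

If the endpoint is unreachable or returns an empty list, fall back to passing `--api-model` explicitly.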
python src/run_always_contrastive_all_candidate.py \
--dataset mac \
--api-base http://127.0.0.1:30000/v1 \
--output-jsonl results/mac_predictions.jsonl

python src/run_always_contrastive_all_candidate.py \
--dataset scifi \
--api-base http://127.0.0.1:8000/v1 \
--api-model your-vlm-name \
--output-jsonl results/scifi_predictions.jsonl

Notes:
- the served model must support image input
- if `/v1/models` is unavailable or empty, set `--api-model` explicitly
- some VLMs need serving-side options such as chat templates or multimodal limits
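The difference between the multimodal pass and the text-only pass can be sketched as a payload builder for the chat/completions API. The image-as-data-URL content part follows the standard OpenAI format; the function itself is a hypothetical helper, not part of the released script.

```python
import base64

def build_chat_payload(model, question, image_bytes=None):
    """Build an OpenAI-style chat/completions payload.

    With image_bytes, the question is paired with the figure (the
    multimodal pass); without it, the same question goes text-only
    (the prior pass that SciCon later subtracts).
    """
    content = [{"type": "text", "text": question}]
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

Sending the two payloads to the same endpoint keeps everything identical between the passes except the figure.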
Smoke test on a small subset:
python src/run_always_contrastive_all_candidate.py \
--dataset mac \
--api-base http://127.0.0.1:30000/v1 \
--max-samples 10 \
--output-jsonl results/mac_smoke.jsonl

Explicit MMSci paths:
python src/run_always_contrastive_all_candidate.py \
--dataset mmsci \
--input-mmsci-json data/mmsci/test.json \
--image-root data/mmsci/images \
--api-base http://127.0.0.1:30000/v1 \
--output-jsonl results/mmsci_predictions.jsonl

By default, outputs are written to:
results/<dataset>_predictions.jsonl
Override this with --output-jsonl if needed.
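A predictions file in this layout can be scored with a few lines of Python. The field names `prediction` and `answer` are assumptions; check the released script's output schema before relying on them.

```python
import json
from pathlib import Path

def accuracy_from_jsonl(path, pred_key="prediction", gold_key="answer"):
    """Compute accuracy over a JSONL predictions file, one record per line.

    pred_key / gold_key are assumed field names, not a documented schema.
    """
    lines = Path(path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return 0.0
    correct = sum(1 for r in records if r.get(pred_key) == r.get(gold_key))
    return correct / len(records)
```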
If you use this repository in your research, please cite:
@article{roh2026choices,
  title={When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA},
  author={Roh, Taeyun and Jo, Eun-yeong and Jang, Wonjune and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2603.28026},
  year={2026}
}
When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA
Taeyun Roh, Eun-yeong Jo, Wonjune Jang, Jaewoo Kang
This repository contains the evaluation code accompanying the paper and is intended as a lightweight research release for scientific figure multiple-choice QA.