This repository contains the source code and experimental setup for the solution developed by the Biomedical and Data Lab at Mahidol University, Thailand, for the Third Scientific Figure Captioning Challenge (SciCap Challenge 2025), held as part of the LM4Sci Workshop at COLM 2025 (October 7–10, Montreal, Canada). Our full paper can be found on arXiv.
The SciCap Challenge 2025 focuses on personalized caption generation for scientific figures using the new LaMP-CAP dataset, which includes over 300,000 figures from 110,000+ scientific papers. The dataset is designed for multimodal caption generation with emphasis on personalization across writing styles and research domains.
The dataset consists of 110,828 scientific articles. Each article includes one target figure and up to three associated profile figures. Each figure comes with its mentioned text, accompanying paragraph, OCR text, caption length, and figure type as context for caption generation. The dataset spans 8 fields with 155 unique categories. For more details about the competition and dataset, visit the SciCap Challenge 2025 website.
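To illustrate, a single record bundles the context fields listed above together with the profile figures; the key names below are illustrative placeholders, not the dataset's actual schema:

```python
# Hypothetical shape of one LaMP-CAP article record (key names are
# illustrative, not the dataset's actual schema).
target_figure = {
    "figure_id": "1234.56789-Figure2",
    "mention": "As shown in Fig. 2, accuracy saturates after 10 epochs.",
    "paragraph": "We train for 50 epochs and observe early saturation.",
    "ocr": ["epochs", "accuracy", "0.9"],
    "caption_length": 18,
    "figure_type": "line chart",
}

# Each article pairs a target figure with up to three profile figures
# from the same paper, used as style exemplars.
article = {
    "target": target_figure,
    "profiles": [target_figure.copy() for _ in range(3)],
}
print(len(article["profiles"]))  # 3
```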
Download the dataset via `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="CrowdAILab/scicap", repo_type="dataset")
```

then merge the split image archive into a single zip:

```shell
zip -F img-split.zip --out img.zip
```

After downloading, we divide the training split into 155 categories based on each article's category for further training. The metadata for referencing target and profile figures can be found in the LaMP-CAP repository.
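The per-category split can be sketched as follows; the `records` list and its `category` field are placeholders standing in for the actual metadata layout:

```python
from collections import defaultdict

# Toy metadata records; in practice these come from the LaMP-CAP
# metadata files (field names here are illustrative).
records = [
    {"paper_id": "a1", "category": "cs.CL"},
    {"paper_id": "a2", "category": "cs.CV"},
    {"paper_id": "a3", "category": "cs.CL"},
]

# Group the training split by the article's category; with the full
# metadata this yields 155 category buckets.
by_category = defaultdict(list)
for rec in records:
    by_category[rec["category"]].append(rec)

print(sorted(by_category))  # ['cs.CL', 'cs.CV']
```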
Our solution is a two-stage caption-generation pipeline that integrates contextual understanding in Stage 1 with author-specific stylistic adaptation in Stage 2.
- Sentence-based filtering with Flan-T5 to remove noisy or irrelevant text segments from the accompanying paragraphs
- Category-level prompt optimization with DSPy's MIPROv2 and SIMBA to develop domain-specific prompts optimized for each field
- Caption candidate selection with Gemini 2.5 Flash to rank the candidates and select the most contextually accurate caption
- Few-shot prompting with profile figures (up to 3 examples) to adapt the content-grounded captions to the author's writing style
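The Stage-1 sentence filtering step can be sketched in miniature as below; a simple keyword-overlap score stands in for the Flan-T5 relevance judgment used in the actual pipeline:

```python
# Sketch of sentence-based paragraph filtering. The overlap score is a
# stand-in for the Flan-T5 relevance model, not the real scorer.
def relevance(sentence: str, mention: str) -> float:
    s, m = set(sentence.lower().split()), set(mention.lower().split())
    return len(s & m) / max(len(m), 1)

def filter_paragraph(paragraph: str, mention: str, threshold: float = 0.2) -> str:
    # Keep only sentences sufficiently related to the figure mention.
    kept = [s for s in paragraph.split(". ") if relevance(s, mention) >= threshold]
    return ". ".join(kept)

mention = "Figure 2 shows test accuracy over epochs"
paragraph = ("Figure 2 shows test accuracy over training epochs. "
             "We thank the reviewers for helpful comments. "
             "Accuracy plateaus after ten epochs")
print(filter_paragraph(paragraph, mention))
```

The acknowledgement sentence is dropped because it shares no vocabulary with the mention, while both accuracy-related sentences survive.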
- We used BLEU-1 to BLEU-4 and ROUGE-1, ROUGE-2, and ROUGE-L (F1) to evaluate the generated captions on the test set.
- Stage 1: improved ROUGE-1 recall by +8.3% while limiting precision loss to -2.8% and BLEU-4 reduction to -10.9%.
- Stage 2: yielded 40-48% gains in BLEU scores and 25-27% gains in ROUGE scores.
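The BLEU n-gram precisions underlying these scores can be illustrated with a minimal pure-Python sketch (a real evaluation should use a standard implementation such as NLTK's):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU-n."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

cand = "accuracy of the model over epochs".split()
ref = "test accuracy of the model across epochs".split()
print(round(modified_precision(cand, ref, 1), 2))  # 0.83
print(round(modified_precision(cand, ref, 2), 2))  # 0.6
```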
- Clone our GitHub repo and install dependencies:

```shell
git clone https://github.com/biodatlab/scicap-titipapa
cd scicap-titipapa
pip install -r requirements.txt
```

- Generate caption candidates with the optimized prompts from the `optimized_prompt` folder:

```shell
python candidate_inference.py
```

- Select the best caption with an LLM:

```shell
python llm_reranking.py
```

- Refine the caption with few-shot prompting on profile figures:

```shell
python caption_refinement.py
```

To evaluate, the output must be in the following format:
```json
[
  {
    "id": 1,
    "candidate": "example_candidate_1",
    "reference": "example_reference_1"
  },
  {
    "id": 2,
    "candidate": "example_candidate_2",
    "reference": "example_reference_2"
  }
]
```
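A quick way to check that an output file matches this schema before running the evaluation script (the variable names here are illustrative):

```python
import json

def validate_output(records):
    """Check the id/candidate/reference schema expected by the evaluator."""
    for rec in records:
        assert {"id", "candidate", "reference"} <= rec.keys(), rec
    return True

sample = [
    {"id": 1, "candidate": "example_candidate_1", "reference": "example_reference_1"},
    {"id": 2, "candidate": "example_candidate_2", "reference": "example_reference_2"},
]
# Round-trip through JSON, as the evaluation script reads a JSON file.
print(validate_output(json.loads(json.dumps(sample))))  # True
```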
The evaluation script is located in `utils/evaluation.py` and is used as follows:
- Download the tokenizer:

```python
import nltk

nltk.download('punkt')
```

- Run the evaluation script:

```shell
python utils/evaluation.py data/sample_input.json
```

After successful execution, a CSV file will be created in the same directory as the input JSON file.
If you use our solution in your research, please cite our paper with the following BibTeX entry:
```bibtex
@misc{timklaypachara2025leveragingauthorspecificcontextscientific,
  title={Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge},
  author={Watcharapong Timklaypachara and Monrada Chiewhawan and Nopporn Lekuthai and Titipat Achakulvisut},
  year={2025},
  eprint={2510.07993},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.07993},
}
```
