
Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge

This repository contains the source code and experimental setup for the solution developed by the Biomedical and Data Lab at Mahidol University, Thailand, for the Third Scientific Figure Captioning Challenge (SciCap Challenge 2025), held as part of the LM4Sci Workshop at COLM 2025 (October 7–10, Montreal, Canada). Our full paper is available on arXiv.

Challenge Overview

The SciCap Challenge 2025 focuses on personalized caption generation for scientific figures using the new LaMP-CAP dataset, which includes over 300,000 figures from 110,000+ scientific papers. The dataset is designed for multimodal caption generation with emphasis on personalization across writing styles and research domains.

SciCap Challenge Dataset

The dataset consists of 110,828 scientific articles. Each article includes one target figure and up to three associated profile figures. Each figure comes with context for caption generation: mention text, the accompanying paragraph, OCR text, caption length, and figure type. The dataset spans 8 fields with 155 unique categories. For more details about the competition and dataset, visit the SciCap Challenge 2025 website.

Download SciCap Challenge Dataset

from huggingface_hub import snapshot_download
snapshot_download(repo_id="CrowdAILab/scicap", repo_type='dataset')

then reassemble the split image archive into a single zip:

zip -F img-split.zip --out img.zip

After extraction, we divide the training split into 155 categories based on each article's category for further training. The metadata for referencing target and profile figures can be found in the LaMP-Cap Repository.
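The category split described above can be sketched as grouping metadata records by their article category. This is a minimal illustration, not the repo's actual preprocessing code; the `category` field name and toy records are assumptions.

```python
from collections import defaultdict

def split_by_category(metadata):
    """Group article records by their category field (name assumed)."""
    groups = defaultdict(list)
    for record in metadata:
        groups[record["category"]].append(record)
    return groups

# Toy records standing in for the real LaMP-CAP metadata.
records = [
    {"paper_id": "a1", "category": "cs.CL"},
    {"paper_id": "a2", "category": "cs.CV"},
    {"paper_id": "a3", "category": "cs.CL"},
]
groups = split_by_category(records)
```

In the real setup, the grouping key would come from the challenge metadata, producing one training subset per category for the category-level prompt optimization below.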

Our Approach

[Figure: overview of our two-stage approach]

Our solution is a two-stage caption generation pipeline that combines contextual understanding (Stage 1) with author-specific stylistic adaptation (Stage 2).

Stage 1 — Content-Grounded Caption Generation:

  • Sentence-based filtering with Flan-T5 to remove noisy or irrelevant text segments from the paragraph
  • Category-level prompt optimization with DSPy's MIPROv2 and SIMBA to develop domain-specific prompts optimized for each field
  • Caption candidate selection with Gemini 2.5 Flash to rank candidates and select the most contextually accurate caption
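To make the first bullet concrete, the sentence-based filtering step can be sketched as scoring each paragraph sentence for relevance to the figure and keeping only those above a threshold. The sketch below stands in a simple word-overlap scorer for the Flan-T5 relevance model; the function name, threshold, and example text are illustrative assumptions.

```python
def filter_sentences(paragraph_sentences, figure_mention, threshold=0.2):
    """Keep sentences whose word overlap with the figure mention exceeds
    a threshold; a toy stand-in for the Flan-T5 relevance filter."""
    mention_words = set(figure_mention.lower().split())
    kept = []
    for sent in paragraph_sentences:
        words = set(sent.lower().split())
        overlap = len(words & mention_words) / max(len(words), 1)
        if overlap >= threshold:
            kept.append(sent)
    return kept

sentences = [
    "Figure 2 shows the accuracy of the model over training epochs.",
    "We thank the anonymous reviewers for their comments.",
]
kept = filter_sentences(sentences, "Figure 2 accuracy model")
```

In the actual pipeline, Flan-T5 replaces the overlap score, judging each sentence's relevance to the target figure before the filtered context is passed to the optimized prompts.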

Stage 2 — Profile-Informed Stylistic Refinement:

  • We used few-shot prompting with profile figures (up to 3 examples) to refine the content-grounded caption toward the author's writing style.
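The few-shot refinement step can be sketched as assembling a prompt from up to three (figure context, author caption) profile pairs followed by the Stage 1 draft. The prompt wording and function name below are illustrative assumptions, not the exact prompt we used.

```python
def build_fewshot_prompt(profile_pairs, draft_caption, max_examples=3):
    """Build a few-shot refinement prompt from up to max_examples
    (figure context, author-written caption) profile pairs."""
    lines = ["Rewrite the draft caption in the author's style.", ""]
    for context, caption in profile_pairs[:max_examples]:
        lines.append(f"Figure context: {context}")
        lines.append(f"Author caption: {caption}")
        lines.append("")
    lines.append(f"Draft caption: {draft_caption}")
    lines.append("Refined caption:")
    return "\n".join(lines)

prompt = build_fewshot_prompt(
    [("loss curve", "Training loss vs. epochs."),
     ("confusion matrix", "Confusion matrix on the test split.")],
    "A plot of accuracy.",
)
```

The resulting prompt is then sent to the LLM, which rewrites the draft caption so its style matches the author's profile captions.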

Evaluation:

  • We used BLEU-1 through BLEU-4 and ROUGE-1 and ROUGE-2 (precision, recall, and F1) to evaluate the generated captions on the test set.
  • Stage 1: improved ROUGE-1 recall by +8.3% while limiting precision loss to -2.8% and BLEU-4 reduction to -10.9%.
  • Stage 2: yielded 40-48% gains in BLEU scores and 25-27% in ROUGE scores.
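For reference, ROUGE-1 precision, recall, and F1 (the metrics reported above) reduce to unigram overlap counts. The sketch below is a simplified illustration with whitespace tokenization and no stemming, not the scorer in utils/evaluation.py.

```python
from collections import Counter

def rouge1(candidate, reference):
    """Simplified ROUGE-1: unigram-overlap precision, recall, and F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1

p, r, f = rouge1("accuracy over training epochs",
                 "model accuracy over epochs")
```

This makes the Stage 1 trade-off above legible: adding recovered context raises recall (more reference unigrams covered) while slightly diluting precision.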

Inference

  1. Clone our GitHub repo and install dependencies
git clone https://github.com/biodatlab/scicap-titipapa
cd scicap-titipapa
pip install -r requirements.txt
  2. Generate caption candidates with the optimized prompts from the optimized_prompt folder
python candidate_inference.py
  3. Select the best caption with an LLM
python llm_reranking.py
  4. Refine the captions with few-shot prompting over profile figures
python caption_refinement.py

Evaluation

To evaluate, the output must be a JSON file in the following format:

[
  {
    "id": 1,
    "candidate": "example_candidate_1", 
    "reference": "example_reference_1" 
  },
  {
    "id": 2,
    "candidate": "example_candidate_2", 
    "reference": "example_reference_2"  
  }
]
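Before running the evaluation script, it can help to check that an output file matches the expected schema. The validator below is a hypothetical convenience helper, not part of the repository; it only checks the structure shown above.

```python
import json

def validate_eval_file(text):
    """Check that the evaluation input is a JSON list of records with
    'id', 'candidate', and 'reference' keys; return the record count."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list")
    for rec in records:
        for key in ("id", "candidate", "reference"):
            if key not in rec:
                raise ValueError(f"missing key {key!r} in record {rec}")
    return len(records)

sample = '[{"id": 1, "candidate": "c1", "reference": "r1"}]'
n = validate_eval_file(sample)
```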

The evaluation script is located at utils/evaluation.py and is used as follows:

  1. Download the tokenizer
import nltk
nltk.download('punkt')
  2. Run the evaluation script
python utils/evaluation.py data/sample_input.json

After successful execution, a CSV file will be created in the same directory as the input JSON file.

BibTeX Citation

If you use our solution in your research, please cite our paper with the following BibTeX entry:


@misc{timklaypachara2025leveragingauthorspecificcontextscientific,
      title={Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge}, 
      author={Watcharapong Timklaypachara and Monrada Chiewhawan and Nopporn Lekuthai and Titipat Achakulvisut},
      year={2025},
      eprint={2510.07993},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07993}, 
}
