Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs [Paper]

Figure: Overview of the proposed CMI-VLD decoding.

Abstract: Large Vision-Language Models (LVLMs) are susceptible to hallucinations, where generated responses seem semantically plausible yet exhibit little or no relevance to the input image. Previous studies reveal that this issue primarily stems from LVLMs' over-reliance on language priors while disregarding visual information during decoding. To alleviate this issue, we introduce a novel Conditional Pointwise Mutual Information (C-PMI) calibrated decoding strategy, which adaptively strengthens the mutual dependency between the generated text and the input image to mitigate hallucinations. Unlike existing methods that focus solely on text-token sampling, we jointly model the contributions of visual and textual tokens to C-PMI, formulating hallucination mitigation as a bi-level optimization problem aimed at maximizing mutual information. To solve it, we design a token purification mechanism that dynamically regulates the decoding process by sampling text tokens that remain maximally relevant to the given image, while simultaneously refining the image tokens most pertinent to the generated response. Extensive experiments across various benchmarks show that the proposed method significantly reduces hallucinations in LVLMs while preserving decoding efficiency.

Setup

Before using our CMI-VLD framework, please set up the environment with the following commands:

conda env create -f environment.yml
conda activate CMI-VLD
python -m pip install -e transformers

Implementation

Before evaluation, you need to download the required datasets and the checkpoints of the 7B base models.

After setting up the environment, you can train the Visual Token Purifier by running:

bash predictor_train.sh

After training the Visual Token Purifier, you can run the following code to perform evaluation:

python pope_eval.py --model llava-1.5  --pope-type coco_random --use-cd --use-cmi --use-fast-v --sample --predictor your/path/to/PREDICTOR #CMI-VLD

During evaluation, we combine the VTI method with our approach. To apply our method to other models, adjust the corresponding parameters accordingly. All experiments were run on an NVIDIA A6000 GPU.

You can also use our code directly to run several other hallucination mitigation methods: Self-Introspective Decoding (SID), Visual Contrastive Decoding (VCD), Instruction Contrastive Decoding (ICD), OPERA, and VTI.

python pope_eval.py --model llava-1.5  --pope-type coco_random --use-cd  --use-fast-v #SID
python pope_eval.py --model llava-1.5  --pope-type coco_random --use-vcd  --sample #VCD
python pope_eval.py --model llava-1.5  --pope-type coco_random --use-icd  --sample  #ICD
python pope_eval.py --model llava-1.5  --pope-type coco_random --vti  --sample #VTI
python pope_eval.py --model llava-1.5  --pope-type coco_random --beam 5 --opera #OPERA

Evaluation with the CHAIR metric uses the same configuration.

Arguments

| Argument | Example | Description |
| --- | --- | --- |
| `--model` | `llava-1.5` | Specify the LVLM model. |
| `--data-path` | `dataset/MSCOCO/val2014` | Path to the dataset file or folder. |
| `--data-file` | `dataset/MSCOCO/` | Path to the dataset file or folder. |
| `--pope-type` | `coco_adversarial` | Type for POPE evaluation. |
| `--sample` | `store_true` | Use the modified decoding strategy. |
| `--sample-greedy` | `store_true` | Use CD with sampling and greedy decoding. |
| `--beam` | `5` | Beam search number. |
| `--opera` | `store_true` | Use OPERA. |
| `--vti` | `store_true` | Use VTI. |
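As an illustration of how these arguments combine, the sketch below assembles the VCD command for the adversarial POPE split. All flag values are taken from the table and examples above; the snippet only prints the command so you can inspect it before running it against your own paths and environment:

```shell
# Compose a POPE evaluation command from the arguments listed above.
# coco_adversarial and the VCD flags come from this README; adjust the
# model name and dataset paths for your setup before executing.
MODEL="llava-1.5"
POPE_TYPE="coco_adversarial"
echo python pope_eval.py --model "$MODEL" --pope-type "$POPE_TYPE" --use-vcd --sample
```

Swapping `--use-vcd` for the flag combinations shown earlier (e.g. `--use-cd --use-cmi --use-fast-v` for CMI-VLD) yields the corresponding method's command.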

Acknowledgement

Parts of the code are based on the LVLM codebases of SID, VTI, OPERA, and VCD. Thanks for their excellent work!

About

[NeurIPS-2025] Large Vision-Language Models, Hallucination Mitigation, Conditional Mutual Information, Token Purification
