keshav22/Mitigate-Relation-Hallucination
Relation-Hallucination: Evaluating Visual Contrastive Decoding on Relation Hallucinations in LVLMs.

Overview

This repository contains the code and experiments for investigating whether Visual Contrastive Decoding (VCD) can be adapted to mitigate relation hallucinations in Large Vision-Language Models (LVLMs). While VCD effectively reduces object hallucinations, relation hallucinations remain underexplored. Our project evaluates targeted, relation-specific perturbations against full-image corruption to see if we can provide a stronger contrastive signal for relational reasoning.

Related Repositories

  • LLaVA - Large Language and Vision Assistant
  • Grounding DINO - Object detection and grounding
  • Visual Contrastive Decoding - Mitigating object hallucinations in LVLMs
  • ReefKnot - Reefknot: A Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in MVLMs
  • R-Bench - R-Bench: Benchmark with image and instance Level Yes/No Questions

Key Methods & Features

  • Relation-Aware VCD: Adapts standard VCD by applying Gaussian noise only to specific detected objects or regions instead of the entire image.
  • Targeted Perturbation Strategies: Uses Grounding DINO for object detection to perform single object masking, all-object masking, inter-object region masking, and patch shuffling.
  • Counterfactual Prompting: A text-based contrastive strategy that generates counterfactual prompts by replacing the relation in the original prompt.
  • Extended Detect-then-Calibrate (DTC): Extends the standard DTC baseline, originally limited to Yes/No questions, to support Multiple Choice (MCQ) and Visual Question Answering (VQA) formats using a generalized token set gathered by top-p or top-k.
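The core idea behind Relation-Aware VCD can be sketched as follows: corrupt only the detected object regions with Gaussian noise, then combine the clean and perturbed logits with the standard VCD contrast. This is a minimal NumPy sketch, not the repository's actual implementation; the function names, the noise scale `sigma`, and the contrast weight `alpha` are illustrative assumptions.

```python
import numpy as np

def perturb_regions(image, boxes, sigma=0.35, rng=None):
    """Add Gaussian noise only inside detected bounding boxes
    (relation-aware perturbation). `boxes` are (x0, y0, x1, y1)
    pixel coordinates, e.g. as returned by Grounding DINO."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.astype(np.float32).copy()
    for x0, y0, x1, y1 in boxes:
        region = out[y0:y1, x0:x1]           # view into `out`
        region += rng.normal(0.0, sigma * 255.0, region.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

def contrastive_logits(logits_clean, logits_perturbed, alpha=1.0):
    """Standard VCD combination: amplify tokens the clean image
    supports but the perturbed image does not."""
    return (1 + alpha) * logits_clean - alpha * logits_perturbed
```

In the full pipeline, the perturbed image replaces full-image corruption in a second forward pass, and next-token sampling is done over `contrastive_logits` instead of the raw model logits.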

Datasets & Models

  • Datasets Evaluated: Reefknot (comprising Y/N, MCQ, and VQA splits based on Visual Genome) and the R-Bench benchmark.
  • Models: LLaVA-1.5-13B (primary) and Qwen-VL-7B.

Key Findings

  • Targeted contrastive decoding strategies plateau at a ~36% hallucination rate (for LLaVA on Y/N questions of Reefknot), failing to offer meaningful improvements over the base model.
  • VCD variants do not match or outperform the Detect-then-Calibrate (DTC) baseline.
  • Logit distribution and attention analyses reveal that pixel-level perturbations are insufficient to decouple the model's relational reasoning.
  • Effective mitigation for relation hallucinations likely requires targeting internal model mechanisms rather than corrupting visual input at inference time.

Prerequisites & Setup

The following models, tools, and environments are required to reproduce this project's experiments:

1. Models & Evaluation Libraries

  • LVLMs: Set up the environments for LLaVA-1.5-13B and Qwen-VL-7B.
  • Grounding DINO: Required for the object detection and targeted perturbation steps.
  • DeBERTa-v2: Used for bidirectional textual entailment to evaluate VQA question types.

2. Datasets

Reefknot Benchmark

Built on Visual Genome; includes Y/N, MCQ, and VQA subsets. The following scripts run inference on Reefknot with the different methods:

  • VCD-based methods (and the base model): VCD (example script)
  • DTC (Detect-then-Calibrate): DTC

The generated result files can be evaluated using: Reefknot Evaluation

R-Bench

We use the image-level subset, which contains Y/N questions. The following scripts run inference on the R-Bench benchmark with the different methods:

The generated result files can be evaluated using: R-Bench Evaluation Script

3. Compute Infrastructure

Running inference with large models like LLaVA-13B and contrastive decoding requires significant GPU resources. During development, the following platforms were utilized:

  • Initial Development & Debugging: Lightning.ai, Kaggle (free-tier), and Google Colab.
  • Full-Scale Experiments: Lichtenberg and ADA HPC clusters. Ensure you have adequate VRAM and compute limits to run the full evaluation suites.


Contributors

  • Keshav Agrawal
  • Nico Lick
  • Anusha Siddapati Mohanreddy
  • Romila Singh
  • Manu Thomas

Final Report Link
