We introduce VAP (Visual Adversarial Perturbation), a novel approach that strategically injects beneficial visual noise to mitigate object hallucinations in large vision-language models (LVMs), without modifying the base model. Our method consistently improves robustness across eight state-of-the-art LVMs under the rigorous POPE hallucination evaluation.
Examples of visual question answering (VQA) tasks before and after applying our proposed method to the original images. (a) and (b) demonstrate the suppression of hallucinations in vision-/text-axis evaluations. (c) and (d) show the reduction of hallucinations in open-ended tasks.
[2025-02-02] VAP is now open-source! Explore our Visual Adversarial Perturbation (VAP) method to mitigate object hallucinations in LVMs. Check out the repo and get started!
[2025-02-03] Our paper "Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs" is now available!
We evaluate eight state-of-the-art Large Vision-Language Models (LVMs) to validate the efficacy of our proposed approach. These models, spanning significant advancements in multimodal AI from September 2023 to December 2024, range from 7.1B to 16.1B parameters.
Model | Parameters | Language Model | Vision Model | Release Date |
---|---|---|---|---|
LLaVA-v1.5 | 7.1B | Vicuna-7B | CLIP ViT-L/14 | 2023-09 |
Instruct-BLIP | 7.9B | Vicuna-7B | ViT-G | 2023-09 |
Intern-VL2 | 8.1B | InternLM2.5-7B | InternViT-300M | 2024-07 |
Intern-VL2-MPO | 8.1B | InternLM2.5-7B | InternViT-300M | 2024-11 |
DeepSeek-VL2 | 16.1B | DeepSeekMoE-16B | SigLIP-400M | 2024-12 |
Qwen-VL2 | 8.3B | Qwen2-7B | ViT-Qwen | 2024-08 |
LLaVA-OV | 8.0B | Qwen2-7B | SigLIP-400M | 2024-08 |
Ovis1.6-Gemma2 | 9.4B | Gemma2-9B | SigLIP-400M | 2024-11 |
POPE Benchmark: Evaluation triplets are sampled from the MS-COCO dataset and formatted in the following JSON structure:
Example JSON format:
{
"id": 0,
"image": "name", // The image filename from the dataset
"question": "XXX", // The generated question related to the image
"gt": "yes/no" // The ground truth answer (binary: "yes" or "no")
}
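As a minimal illustration, POPE-style entries in the format above can be loaded and validated with a short script. The entry values below are placeholders, not actual benchmark data:

```python
# Hypothetical POPE-style entries matching the JSON structure above.
entries = [
    {"id": 0, "image": "name_0.jpg",
     "question": "Is there a dog in the image?", "gt": "yes"},
    {"id": 1, "image": "name_1.jpg",
     "question": "Is there a surfboard in the image?", "gt": "no"},
]

# Basic validation: every entry needs all four fields and a binary ground truth.
for e in entries:
    assert {"id", "image", "question", "gt"} <= e.keys()
    assert e["gt"] in ("yes", "no")

print(len(entries))  # number of evaluation triplets
```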
BEAF Benchmark: This dataset follows the format specified in the BEAF repository.
This section provides detailed instructions for setting up the environments required to run various LVMs. We ensure compatibility across different models by structuring the setup efficiently, leveraging shared dependencies where applicable.
- Model: liuhaotian/llava-v1.5-7b
- Environment Setup:
cd env_setting/LLaVA
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip
pip install -e .
pip install ftfy regex tqdm
pip install protobuf transformers_stream_generator matplotlib
- Models:
- Shared Environment Setup:
conda create -n internvl python=3.9 -y
conda activate internvl
pip install lmdeploy==0.5.3
pip install timm ftfy regex tqdm matplotlib
pip install flash-attn==2.3.6 --no-build-isolation
- Model:
- Shared Environment Setup:
cd env_setting/LLaVA-NeXT
conda create -n llava_onevision python=3.10 -y
conda activate llava_onevision
pip install --upgrade pip
pip install -e ".[train]"
pip install ftfy regex tqdm matplotlib
pip install transformers==4.47.0
pip install flash-attn==2.5.2
- Model: Qwen/Qwen2-VL-7B-Instruct
- Environment Setup:
conda create -n qwen python=3.11 -y
conda activate qwen
pip install transformers
pip install qwen-vl-utils
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install accelerate==0.26.1
pip install ftfy regex tqdm matplotlib
- Model: deepseek-ai/deepseek-vl2
- Environment Setup:
conda create -n deepseek python=3.10 -y
conda activate deepseek
cd env_setting/DeepSeek-VL2/
pip install -e .
pip install ftfy regex tqdm matplotlib
pip install flash_attn==2.5.8
pip install --force-reinstall --no-deps --pre xformers
pip install transformers==4.38.2
These environments keep dependencies isolated per model family, minimizing version conflicts. For further speedups, consider CUDA optimizations, distributed inference, and efficient memory management.
- Visual Adversarial Perturbation (VAP) Execution
To generate visual noise for mitigating hallucinations in LVMs, run the following command:
bash script/VAP.sh
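Conceptually, VAP optimizes an additive perturbation on the input image within a small norm ball. The sketch below is a generic PGD-style loop under assumed names: the toy quadratic `surrogate_loss` merely stands in for the hallucination-aware objective, and this is not the repository's actual implementation (see `script/VAP.sh` for that):

```python
def surrogate_loss(image):
    # Toy objective standing in for the (assumed) hallucination-aware loss.
    target = [0.2, 0.8, 0.5]
    return sum((x - t) ** 2 for x, t in zip(image, target))

def grad(image, eps=1e-5):
    # Finite-difference gradient of the surrogate loss.
    g = []
    for i in range(len(image)):
        bumped = list(image)
        bumped[i] += eps
        g.append((surrogate_loss(bumped) - surrogate_loss(image)) / eps)
    return g

def vap_like_perturb(image, epsilon=0.05, alpha=0.01, steps=20):
    """Sign-gradient descent on the loss, keeping the perturbation
    within an L-infinity ball of radius epsilon around the clean image."""
    adv = list(image)
    for _ in range(steps):
        g = grad(adv)
        # Step against the gradient (we minimize the loss here).
        adv = [x - alpha * (1 if gi > 0 else -1 if gi < 0 else 0)
               for x, gi in zip(adv, g)]
        # Project back into the epsilon-ball and the valid pixel range [0, 1].
        adv = [min(max(a, x0 - epsilon, 0.0), x0 + epsilon, 1.0)
               for a, x0 in zip(adv, image)]
    return adv

clean = [0.4, 0.6, 0.5]  # toy "image" as a short pixel vector
adv = vap_like_perturb(clean)
print(max(abs(a - x) for a, x in zip(adv, clean)))  # bounded by epsilon
```

The key design point illustrated is the projection step: whatever objective drives the perturbation, the result stays visually close to the original image.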
- Hallucination Evaluation
To assess the model's performance on hallucination benchmarks, execute:
bash script/evaluate.sh
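For binary yes/no answers in the POPE format, the standard metrics (accuracy, precision, recall, and F1 with "yes" as the positive class) can be computed as follows. The prediction and ground-truth lists are placeholders for illustration:

```python
def pope_metrics(preds, gts):
    """Accuracy, precision, recall, and F1, treating "yes" as the
    positive class, as is standard for binary hallucination evaluation."""
    tp = sum(p == "yes" and g == "yes" for p, g in zip(preds, gts))
    fp = sum(p == "yes" and g == "no" for p, g in zip(preds, gts))
    fn = sum(p == "no" and g == "yes" for p, g in zip(preds, gts))
    tn = sum(p == "no" and g == "no" for p, g in zip(preds, gts))
    acc = (tp + tn) / len(gts)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Placeholder predictions and ground truths.
preds = ["yes", "no", "yes", "no"]
gts   = ["yes", "no", "no", "no"]
print(pope_metrics(preds, gts))
```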
If you find our work helpful, please consider citing our paper:
@article{zhang2025poison,
title={Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs},
author={Zhang, Kejia and Tao, Keda and Tang, Jiasheng and Wang, Huan},
journal={arXiv preprint arXiv:2501.19164},
year={2025}
}
Your citation helps support our research and further advances the field of reliable vision-language models.