ShunqiM/PM

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

This is the official implementation of the paper:
"Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding"

💡 Overview

We introduce Perception Magnifier (PM), an inference-time decoding method for vision-language models. PM constructs and refines perception maps from attention, then magnifies critical visual regions while compressing less relevant areas, guiding the model to focus on fine-grained details without losing global context. This adaptive magnification strengthens visual grounding during decoding, effectively reducing hallucinations while preserving reasoning ability.

Perception Magnifier Overview
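To make the idea concrete, here is a minimal sketch of the magnification step, not the repository's actual implementation (that lives in experiments/eval/model_generate.py). It assumes a per-patch perception map derived from attention over a 24x24 ViT patch grid; the thresholding, cropping, and resolution values are illustrative, and the compression of non-salient regions is omitted for brevity:

```python
import numpy as np

def perception_map(attn, grid=(24, 24)):
    # attn: (num_text_tokens, num_image_patches) attention weights.
    # Average over text tokens to get one saliency score per image patch,
    # then normalize to [0, 1]. Grid size is an illustrative assumption.
    sal = attn.mean(axis=0).reshape(grid)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)

def magnify(image, sal, thresh=0.6, out_hw=(336, 336)):
    # Find the bounding box of patches whose saliency exceeds the threshold
    # and upsample that crop to the model's input resolution, so the model
    # "sees" the critical region at higher effective resolution.
    H, W, _ = image.shape
    gh, gw = sal.shape
    ys, xs = np.where(sal >= thresh)
    if len(ys) == 0:
        return image  # nothing salient enough; keep the original view
    y0, y1 = ys.min() * H // gh, (ys.max() + 1) * H // gh
    x0, x1 = xs.min() * W // gw, (xs.max() + 1) * W // gw
    crop = image[y0:y1, x0:x1]
    # Nearest-neighbor resize of the crop back to full input size.
    ry = np.linspace(0, crop.shape[0] - 1, out_hw[0]).astype(int)
    rx = np.linspace(0, crop.shape[1] - 1, out_hw[1]).astype(int)
    return crop[ry][:, rx]
```

In the paper's full method the magnified view guides decoding adaptively at inference time; the sketch above only illustrates the attention-to-region mapping.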

🕹️ Usage

Environment Setup

conda create --name pm python=3.10
conda activate pm
pip install -r requirements.txt

Then set up the environment variables in starter/env_definer.sh and follow the instructions in starter/ReadMe.md.

Run

The main implementation of Perception Magnifier (PM) is located in:

experiments/eval/model_generate.py

This file contains the core functions for generating model outputs with PM.

Evaluation Scripts

To reproduce our results, we provide ready-to-use scripts under the experiments/scripts/ folder.

For example, to run PM with LLaVA on MME:

bash experiments/scripts/mme/run_llava_pm.sh

To run PM with LLaVA on POPE:

bash experiments/scripts/pope/run_llava_pm.sh

📑 Citation

If you find our work useful, please consider citing:

@article{mao2025through,
  title={Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free {VLM} Decoding},
  author={Mao, Shunqi and Zhang, Chaoyi and Cai, Weidong},
  journal={arXiv preprint arXiv:2503.10183},
  year={2025}
}

🌟 Acknowledgment

This repository builds on existing implementations, with some functions and evaluation scripts adapted from VDD, API, PAI, Transformer-Explainability, and OPERA.
