Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
This `RAF` branch is for training RAF.
- Python 3.10
- PyTorch 1.12.1
```bash
conda create -n raf python=3.10 -y
conda activate raf
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch -y
pip install ftfy regex tqdm
```
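A quick way to confirm the environment is set up correctly is to check the installed PyTorch and CUDA versions from Python. This is an optional sanity check, not part of the original setup steps.

```python
# Optional sanity check for the PyTorch / CUDA setup.
import torch
import torchvision

print("PyTorch:", torch.__version__)            # expected: 1.12.1
print("torchvision:", torchvision.__version__)  # expected: 0.13.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```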
Download COCO and LVIS to the `data` directory as follows.
```
~/data
├── coco
│   ├── annotations/instances_val2017.json
│   ├── train2017
│   └── val2017
└── lvis
    ├── lvis_v1_val.json
    ├── train2017
    └── val2017
```
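To verify that the annotation files are in place, a short check like the one below is enough; the paths follow the tree above, and both files use the standard COCO/LVIS JSON layout with `images`, `annotations`, and `categories` keys.

```python
# Optional sanity check: confirm the annotation files are readable and
# report how many images / annotations / categories they contain.
import json
from pathlib import Path

data_root = Path.home() / "data"  # adjust if your data lives elsewhere
ann_files = {
    "coco": data_root / "coco" / "annotations" / "instances_val2017.json",
    "lvis": data_root / "lvis" / "lvis_v1_val.json",
}

for name, path in ann_files.items():
    with open(path) as f:
        ann = json.load(f)
    print(f"{name}: {len(ann['images'])} images, "
          f"{len(ann['annotations'])} annotations, "
          f"{len(ann['categories'])} categories")
```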
To train RAF, we use object features extracted with OADP. Following OADP, prepare OAKE features for COCO and LVIS under the `clip_region` directory.
```
~/clip_region
├── coco_oake_object_train
└── lvis_oake_object_train
```
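The file layout inside these directories follows OADP's OAKE output format. As a rough sanity check you can simply count the dumped feature files; the sketch below assumes one feature file per training image, so adjust it if your dump is organized differently.

```python
# Rough sanity check on the prepared OAKE object features.
# Assumes one feature file per training image (OADP's usual layout);
# adjust the pattern if your dump uses a different structure.
from pathlib import Path

clip_region = Path.home() / "clip_region"
for split_dir in ["coco_oake_object_train", "lvis_oake_object_train"]:
    files = [p for p in (clip_region / split_dir).rglob("*") if p.is_file()]
    print(f"{split_dir}: {len(files)} feature files")
```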
Generate preprocessed region features from the annotation files using the following commands.
```bash
python make_gt_region_feats.py --dataset coco --train_val val
python make_gt_region_feats.py --dataset lvis --train_val val
```
This produces `coco/val` and `lvis/val` as follows.
```
~/clip_region
├── coco/val
├── lvis/val
├── coco_oake_object_train
└── lvis_oake_object_train
```
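Conceptually, this preprocessing crops the ground-truth boxes from each validation image and encodes them with CLIP's image encoder. The sketch below illustrates that idea for a single COCO annotation; it is a simplified illustration, not `make_gt_region_feats.py` itself, and it assumes the OpenAI `clip` package is installed and that a ViT-B/32 backbone is used.

```python
# Illustrative sketch of CLIP region-feature extraction for one GT box.
# This is NOT make_gt_region_feats.py itself; it only shows the idea.
import json
from pathlib import Path

import clip          # OpenAI CLIP package, assumed installed
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone is an assumption

data_root = Path.home() / "data" / "coco"
with open(data_root / "annotations" / "instances_val2017.json") as f:
    ann = json.load(f)

a = ann["annotations"][0]                       # first GT annotation
img_info = next(i for i in ann["images"] if i["id"] == a["image_id"])
image = Image.open(data_root / "val2017" / img_info["file_name"]).convert("RGB")

x, y, w, h = a["bbox"]                          # COCO boxes are [x, y, w, h]
region = image.crop((int(x), int(y), int(x + w), int(y + h)))

with torch.no_grad():
    feat = model.encode_image(preprocess(region).unsqueeze(0).to(device))
    feat = feat / feat.norm(dim=-1, keepdim=True)   # L2-normalized region feature
print(feat.shape)                               # (1, 512) for ViT-B/32
```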
Download `v3det_{dataset}_strict.json` and `v3det_gpt_noun_chunk_{dataset}_strict.pkl` (the noun chunk file generated by GPT) from here.
```
~
├── v3det_coco_strict.json
├── v3det_lvis_strict.json
├── v3det_gpt_noun_chunk_coco_strict.pkl
└── v3det_gpt_noun_chunk_lvis_strict.pkl
```
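If you want to peek at the downloaded concept files before training, a minimal inspection like the following works. The internal structure of the pickled noun-chunk data is not documented here, so the snippet only reports types and sizes.

```python
# Minimal inspection of the downloaded concept files
# (their internal structure is not assumed; only types/sizes are printed).
import json
import pickle
from pathlib import Path

home = Path.home()

def describe(name, obj):
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{name}: {type(obj).__name__}, size={size}")

with open(home / "v3det_coco_strict.json") as f:
    describe("v3det_coco_strict.json", json.load(f))

with open(home / "v3det_gpt_noun_chunk_coco_strict.pkl", "rb") as f:
    describe("v3det_gpt_noun_chunk_coco_strict.pkl", pickle.load(f))
```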
Train RAF with the following commands.
```bash
python raf.py --dataset coco --work_dir output/raf_coco --concept_pkl_path v3det_gpt_noun_chunk_coco_strict.pkl --oake_file_path clip_region/coco_oake_info_strict.pkl
python raf.py --dataset lvis --work_dir output/raf_lvis --concept_pkl_path v3det_gpt_noun_chunk_lvis_strict.pkl --oake_file_path clip_region/lvis_oake_info_strict.pkl
```
The checkpoint is saved as `output/raf_{dataset}/weight_10.pth` and is used as `{dataset}_strict.pth` in the various baselines. It can also be downloaded from here.
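To reuse the trained weights under the baselines, the checkpoint only needs to be copied to the expected name. A quick load-and-copy sketch (assuming the file is a standard `torch.save()` checkpoint) looks like this:

```python
# Copy the trained RAF checkpoint to the name expected by the baselines.
# Assumes output/raf_coco/weight_10.pth is a standard torch.save() file.
import shutil
import torch

src = "output/raf_coco/weight_10.pth"
ckpt = torch.load(src, map_location="cpu")   # quick integrity check
print(type(ckpt))                            # e.g. a dict of weight tensors

shutil.copy(src, "coco_strict.pth")          # name used by the baselines
```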