mlvlab/RegFormer

[CVPR2026] RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

Jihwan Park¹, Chanhyeong Yang², Jinyoung Park¹, Taehoon Song¹, Hyunwoo J. Kim¹

¹KAIST, ²LG Energy Solution

Installation

Create and activate the conda environment:

conda create -n regformer python=3.9
conda activate regformer

Install PyTorch:

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2
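To confirm that the active environment picked up the pinned versions, a small check along these lines may help. It is a sketch rather than part of the repository: the function name `check_versions` is hypothetical, and it only reads package metadata, so it does not test GPU support.

```python
from importlib import metadata

# Versions pinned by the install command above.
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            have = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    for name, (want, have) in problems.items():
        print(f"{name}: expected {want}, found {have}")
    if not problems:
        print("PyTorch packages match the pinned versions.")
```

An empty result means all three packages match; any entry lists the expected and installed version side by side.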

Install the remaining dependencies:

pip install -r requirements.txt

Install pocket, which is required by the detection code inherited from UPT:

git clone https://github.com/fredzzhang/pocket.git ../pocket
pip install -e ../pocket

If pocket is already available locally, install that checkout instead:

pip install -e /path/to/pocket

Data

Download HICO-DET by following the data preparation instructions in the UPT repository. Place the annotations and images under hicodet/:

hicodet/
  instances_train2015.json
  instances_test2015.json
  hico_20160224_det/
    images/
      train2015/
      test2015/
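Before training, it can be worth checking that the layout above is in place. The following is a minimal sketch, assuming the commands are run from the repository root so that `hicodet/` is a relative path; `missing_hicodet_paths` is a hypothetical helper, not a function shipped with the repository.

```python
from pathlib import Path

# Relative paths taken from the layout listed above.
REQUIRED = [
    "instances_train2015.json",
    "instances_test2015.json",
    "hico_20160224_det/images/train2015",
    "hico_20160224_det/images/test2015",
]


def missing_hicodet_paths(root="hicodet"):
    """Return the required entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_hicodet_paths()
    if missing:
        print("Missing under hicodet/:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("HICO-DET layout looks complete.")
```

The check only tests for existence. It does not validate the annotation contents or count the images.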

Detection file extraction expects the detector checkpoint at:

params/detr-r50-e632da11.pth

Download the DETR R50 checkpoint from the official DETR model zoo:

mkdir -p params
wget https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth -O params/detr-r50-e632da11.pth
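A truncated download is a common cause of checkpoint loading errors. The sketch below assumes the checkpoint follows the torch.hub naming convention, where the hex suffix in the file name (here `e632da11`) is the leading digits of the file's SHA-256 hash. The helper name `hash_prefix_matches` is hypothetical.

```python
import hashlib
import re
from pathlib import Path


def hash_prefix_matches(path):
    """Compare a checkpoint with the hex suffix embedded in its file name.

    Assumes names of the form `<name>-<sha256 prefix>.<ext>`.
    Returns True or False, or None when the name carries no hex suffix.
    """
    path = Path(path)
    match = re.search(r"-([0-9a-f]{8,})$", path.stem)
    if match is None:
        return None
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


if __name__ == "__main__":
    ckpt = Path("params/detr-r50-e632da11.pth")
    if ckpt.is_file():
        print("hash prefix matches:", hash_prefix_matches(ckpt))
    else:
        print(f"{ckpt} not found")
```

A result of `False` suggests the file is incomplete or corrupted and should be downloaded again.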

Training

Train the weakly-supervised HOI model in full mode (no zero-shot split) with:

bash scripts/weak_run.sh 768 facebook/dinov2-with-registers-small openai/clip-vit-base-patch16 518

For zero-shot settings, pass the training split mode as the fifth argument:

# RF-UC
bash scripts/weak_run.sh 768 facebook/dinov2-with-registers-small openai/clip-vit-base-patch16 518 rare_first

# NF-UC
bash scripts/weak_run.sh 768 facebook/dinov2-with-registers-small openai/clip-vit-base-patch16 518 non_rare_first

Arguments (positional, in this order):

<embed_dim>         attention/pooling embedding dimension
<vision_encoder>    Hugging Face vision encoder name
<text_encoder>      Hugging Face text encoder name
<input_resolution>  image input resolution
[zs_type]           optional zero-shot split; use rare_first for RF-UC, non_rare_first for NF-UC
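When launching several runs from Python, the positional order above can be captured in a small helper. This is a sketch only: `weak_run_command` is a hypothetical name, and restricting `zs_type` to the two documented values is an assumption of this example, since the script itself may accept others.

```python
import shlex

# Zero-shot split names documented above, mapped to the setting they select.
ZS_TYPES = {"rare_first": "RF-UC", "non_rare_first": "NF-UC"}


def weak_run_command(embed_dim, vision_encoder, text_encoder,
                     input_resolution, zs_type=None):
    """Return the argv list for scripts/weak_run.sh in positional order."""
    cmd = ["bash", "scripts/weak_run.sh", str(embed_dim),
           vision_encoder, text_encoder, str(input_resolution)]
    if zs_type is not None:
        if zs_type not in ZS_TYPES:
            raise ValueError(f"zs_type must be one of {sorted(ZS_TYPES)}")
        cmd.append(zs_type)
    return cmd


if __name__ == "__main__":
    cmd = weak_run_command(768, "facebook/dinov2-with-registers-small",
                           "openai/clip-vit-base-patch16", 518, "rare_first")
    # Print the command for inspection; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting.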

Checkpoints and logs are written under output/weak_hoi/.

Detection File Extraction

Before running detection evaluation, extract detector boxes for HICO-DET:

bash scripts/det_extract/hico_r50.sh

By default, this loads the DETR R50 checkpoint at params/detr-r50-e632da11.pth and writes the extracted boxes to:

data/hicodet_pkl_files/hicodet_test_bbox_R50_detr-r50-e632da11.p
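To check that extraction produced a usable file, the output can be opened and summarized. The sketch below assumes the `.p` file is a standard Python pickle. Its internal structure is not documented here, so the helper (hypothetically named `summarize_pickle`) reports only the top-level type and length. If the file stores PyTorch tensors, `torch` must be importable when loading.

```python
import pickle
from pathlib import Path

# Output path of the extraction script, as listed above.
DET_FILE = "data/hicodet_pkl_files/hicodet_test_bbox_R50_detr-r50-e632da11.p"


def summarize_pickle(path=DET_FILE):
    """Return (type name, length or None) of the top-level pickled object."""
    with Path(path).open("rb") as f:
        # Only unpickle files you produced yourself; pickle can run code.
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    if Path(DET_FILE).is_file():
        kind, size = summarize_pickle()
        print(f"top-level object: {kind}, length: {size}")
    else:
        print(f"{DET_FILE} not found; run the extraction script first")
```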

Detection Evaluation

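After training, run detection evaluation by passing the final checkpoint from the weak-training output directory, where {weak_output_dir} is the run's directory under output/weak_hoi/: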

bash scripts/apply_detection.sh {weak_output_dir}/final_model.pth

Detection outputs are saved under {weak_output_dir}/detection/ by default.
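The paths involved in this step can be derived from the training output directory. The following is a minimal sketch with hypothetical helper names. It fails early if `final_model.pth` is missing instead of letting the evaluation script fail later.

```python
from pathlib import Path


def detection_paths(weak_output_dir):
    """Return (checkpoint path, default detection output directory)."""
    out = Path(weak_output_dir)
    return out / "final_model.pth", out / "detection"


def apply_detection_command(weak_output_dir):
    """Return the argv list for scripts/apply_detection.sh."""
    ckpt, _ = detection_paths(weak_output_dir)
    if not ckpt.is_file():
        raise FileNotFoundError(f"no final checkpoint at {ckpt}")
    return ["bash", "scripts/apply_detection.sh", str(ckpt)]


if __name__ == "__main__":
    import sys
    # Usage: python this_file.py <weak_output_dir>
    if len(sys.argv) == 2:
        print(" ".join(apply_detection_command(sys.argv[1])))
```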

Acknowledgements

This repository builds on components from ADA-CM and UPT. We thank the authors for releasing their code.

Citation

If you find this work useful, please cite:

@inproceedings{park2026regformer,
  title     = {RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection},
  author    = {Park, Jihwan and Yang, Chanhyeong and Park, Jinyoung and Song, Taehoon and Kim, Hyunwoo J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

About

[CVPR2026] Official Implementation of "RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection"
