Skip to content

Latest commit

 

History

History
60 lines (49 loc) · 2.4 KB

File metadata and controls

60 lines (49 loc) · 2.4 KB

Training Code for SeC

This folder contains the training code for SeC, which is trained in two stages.

Dataset and Model Preparation

1. Dataset Preparation

Download the SA-V dataset and organize it in the following format. Before training, please extract video frames first following the script here

dataset
├── SA-V
│   ├── metavideo                   # the original SA-V dataset
│   │   ├── {video_id}_manual.json  # video annotations
│   │   ├── {video_id}.mp4
│   │   └── ...
│   ├── train                       # extracted video frames for training
│   │   ├── {video_id}
│   │   │   ├── 00000.jpg           # video frame
│   │   │   ├── 00004.jpg           # video frame
│   │   │   └── ...
│   │   ├── {video_id}
│   │   └── ...
├── sav_scene_top2k.txt
└── sav_video_obj_ids.txt

2. Pretrained Model Preparation

Download the InternVL2_5-4B and SAM2.1-Hiera-L, and organize them in the following format.

pretrained_models
├── InternVL2_5-4B
├── sam2.1_hiera_large.pt
└── reshape.py

Then, please run the reshape.py script to convert the weights of the SAM model into the format suitable for SeC:

Training

Stage 1: Enhanced Pixel-level Association Module

Fine-tune the SAM-based model with memory enhancement:

export PYTHONPATH=.
python training/sam2/training/train.py \
    -c "sec_sam2.1_hiera_l_finetune_maskmem22.yaml" \
    --use-cluster 0 \
    --num-gpus 8

Checkpoints and logs will be saved to work_dirs/sec_sam2/sec_sam2.1_hiera_l_finetune_maskmem22.yaml/.

Stage 2: Concept Guidance Module

Train the Concept Guidance Module using distributed training. Please replace the model_path and sam2_path in training/sec/configs/sec-4b.py with the paths to the pretrained InternVL2_5-4B model and the fine-tuned SAM model from Stage 1, respectively.

export PYTHONPATH=.
bash tools/dist.sh train "training/sec/configs/sec_4b.py" 8

Checkpoints and logs will be saved to work_dirs/sec_4b/.