[⭐️ICASSP 2026] Fake-HR1: Rethinking Reasoning of vision language model for Synthetic Image Detection

Changjiang Jiang1, Xinkuan Sha2, Fengchang Yu1,*, Jingjing Liu2,*, Jian Liu2,*, Mingqi Fang2, Chenfeng Zhang3, Wei Lu1

1 Wuhan University   2 Ant Group   3 Zhejiang University
* Corresponding author

Environment

pip install -r requirements.txt

Dataset

- FakeClue
- GenImage

Training

SFT

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
MASTER_ADDR=${MASTER_ADDR} \
SIZE_FACTOR=8 \
NPROC_PER_NODE=16 \
swift sft \
    --model "./Qwen2.5-VL-7B-Instruct" \
    --model_type "qwen2_5_vl" \
    --train_type full \
    --dataset "./GenImage/train_cleaned_0915.json" \
    "./FakeClue/data_json/train_clean_0916.json" \
    --split_dataset_ratio 0.001 \
    --max_length 8192 \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --freeze_vit false \
    --freeze_llm false \
    --freeze_aligner false \
    --save_steps 1000 \
    --save_total_limit 2 \
    --logging_steps 200 \
    --output_dir ./checkpoints/fakeclue_0917 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --attn_impl flash_attention_2 \
    --deepspeed zero2 \
    --system "You are a helpful assistant for AI-generated image detection. Inspect the image and decide if it is real or fake.\nReasoning mode:\n- If the image shows **obvious AI-generation traces** and is easy to detect, give no think steps.\n- Otherwise, provide **careful, step-by-step** reasoning. If the image is easy to detect, output no think steps: <think>\n\n</think>\n\nreal or fake. If the user requests explain the think or the image is hard to detect, output the think steps: \n<think>\n[Your reasoning here]\n</think>\n\nreal or fake." \
    --response_prefix "<think>\n" \
    --max_pixels 12845056 \
    --gradient_checkpointing true
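The two `--dataset` JSON files above are not included in this README. A minimal sketch of building one training record in ms-swift's messages format, including the empty-think shortcut described in the system prompt (the user-turn wording and field layout are assumptions based on ms-swift conventions, not taken from this repository):

```python
import json

SYSTEM = ("You are a helpful assistant for AI-generated image detection. "
          "Inspect the image and decide if it is real or fake.")

def make_record(image_path, label, reasoning=None):
    """Build one SFT sample in ms-swift's messages format.

    When `reasoning` is None, the sample teaches the empty-think shortcut
    (<think>\n\n</think>) for easy-to-detect images; otherwise the full
    step-by-step trace is kept inside the think block.
    """
    think = reasoning or ""
    answer = f"<think>\n{think}\n</think>\n\n{label}"
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "<image>Is this image real or fake?"},
            {"role": "assistant", "content": answer},
        ],
        "images": [image_path],
    }

# One line of the training JSON:
print(json.dumps(make_record("image_0.jpg", "fake")))
```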

RL

RL training uses verl: https://github.com/THUDM/verl

NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
MASTER_ADDR=${MASTER_ADDR} \
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    custom_reward_function.name="compute_score" \
    custom_reward_function.path="./reward.py" \
    data.train_files=$train_file_path \
    data.train_batch_size=32 \
    data.max_prompt_length=8192 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=$model_path \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.04 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.max_num_batched_tokens=10240 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.004 \
    trainer.default_local_dir=$model_output_path \
    trainer.critic_warmup=0 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_geo3k' \
    trainer.experiment_name='test' \
    trainer.n_gpus_per_node=16 \
    trainer.nnodes=1 \
    trainer.test_freq=50 \
    trainer.total_epochs=1
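The command above points `custom_reward_function.path` at `./reward.py`, whose `compute_score` is not shown in this README. A minimal rule-based sketch under verl's custom-reward interface (the exact signature and the 0/1 scoring are assumptions): it strips the optional `<think>...</think>` block and rewards a final verdict that matches the ground-truth label.

```python
import re

def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the model's final verdict matches the label, else 0.0.

    The verdict is whatever remains after removing an optional
    <think>...</think> reasoning block, matching the SFT output format.
    """
    answer = re.sub(r"<think>.*?</think>", "", solution_str, flags=re.DOTALL)
    verdict = answer.strip().lower()
    return 1.0 if verdict == str(ground_truth).strip().lower() else 0.0
```

The repository's actual reward may also score the reasoning itself; this sketch only checks the final real/fake answer.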

Evaluation

First get inference results:

You need to organize your dataset as a jsonl file, one JSON object per line, where "label" is either "fake" or "real":

{"file_path": "image_0.jpg", "label": "fake"}
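A small helper for producing that jsonl from two image folders (the directory layout and the restriction to .jpg files are assumptions for illustration):

```python
import json
from pathlib import Path

def build_eval_jsonl(real_dir, fake_dir, out_path):
    """Write one JSON object per line with the two required keys,
    labelling every image by the directory it came from."""
    with open(out_path, "w") as f:
        for label, folder in (("real", real_dir), ("fake", fake_dir)):
            for img in sorted(Path(folder).glob("*.jpg")):
                f.write(json.dumps({"file_path": str(img), "label": label}) + "\n")
```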

Then run the evaluation script:

cd eva

infer_path=eva_total_ood_test.jsonl
model_path=Qwen3-VL-30B-A3B-Instruct
to_path=qwen25_vl_infer.jsonl

python qwen25_vl_infer.py \
  --root "$infer_path" \
  --to_path "$to_path" \
  --model_path "$model_path"

The file at $to_path will contain the inference results.
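To score the results, a simple accuracy pass over the output jsonl can look like the sketch below. The output schema of qwen25_vl_infer.py is not shown in this README, so the field names ("label" for ground truth, "pred" for the model verdict) are assumptions:

```python
import json

def accuracy_from_jsonl(path, label_key="label", pred_key="pred"):
    """Compute simple accuracy over a jsonl of inference results."""
    total = correct = 0
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            correct += rec[label_key].strip().lower() == rec[pred_key].strip().lower()
    return correct / total if total else 0.0
```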
