This is an official repository for our TNNLS 2025 paper:
Diversify and Conquer: Open-set Disagreement for Robust Semi-supervised Learning with Outliers
Heejo Kong, Sung-Jin Kim, Gunho Jung, Seong-Whan Lee*
[Paper (IEEE)] [Paper (arXiv)] [Models and Logs] [BibTeX]
This is an example of how to set up DAC locally. We implement DAC using the SSL and Open-set SSL PyTorch benchmarks, USB and IOMatch.
To get a local copy up, running follow these simple example steps.
Our open-set SSL benchmark is built on pytorch, with torchvision, torchaudio, and transformers.
To install the required packages, you can create a conda environment:
conda create --name dac python=3.10then use pip to install required packages:
pip install -r requirements.txtFrom now on, you can start use our benchmark by typing
python train.py --c config/classic_cv/fixmatch/fixmatch_cifar100_50_1250_1.yamlAll datasets are supposed to be under ./data (or create soft links) as follows.
DivCon
├── config
└── ...
├── data
├── cifar10
└── cifar-10-batches-py
└── cifar100
└── cifar-100-python
└── imagenet30
└── one_class_test
└── one_class_train
└── semi-inat-2021
└── l_train
└── u_train
└── test
└── ood_data
├── semilearn
└── ...
└── ...
The detailed instructions for downloading and processing are shown in Dataset Download. Please follow it to download datasets before running or developing algorithms.
The data For ImageNet-30 and Semi-iNat-2021 can be downloaded from the following repositories, OpenMatch and semi-inat-2021.
The out-of-dataset testing data for extended open-set evaluation can be downloaded in CSI.
You can also develop your own open-set SSL algorithm and evaluate it by cloning this repository:
git clone https://github.com/heejokong/DivCon.gitWe implement Diversify and Conquer (DAC) using the codebase of USB.
Here is an example to train DAC on CIFAR-100 with the seen/unseen split of "50/50" and 25 labels per seen class (i.e., the task CIFAR-50-1250 with 1250 labeled samples in total).
# seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar100_50_1250_1.yamlTraining DAC on other datasets with different open-set SSL settings can be specified by a config file:
# CIFAR10, seen/unseen split of 6/4, 25 labels per seen class (CIFAR-6-150), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar10_6_150_1.yaml
# CIFAR100, seen/unseen split of 20/80, 25 labels per seen class (CIFAR-20-500), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar100_20_500_1.yaml
# ImageNet30, seen/unseen split of 20/10, 5% labeled data (ImageNet-20-p5), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_in30_p5_1.yamlAfter training, the best checkpoints will be saved in ./saved_models. The closed-set performance has been reported in the training logs. For the open-set evaluation, please see eval.py.
Through hyper-parameter tuning, we achieved slight improvements over the results reported in the TNNLS paper.
For CIFAR-10 and ImageNet-30, experiments were rerun with
| CIFAR-6-30 | CIFAR-6-60 | CIFAR-6-150 | ImageNet-20-P1 | ImageNet-20-P5 | |
|---|---|---|---|---|---|
| Seed=1 | 87.13 | 93.90 | 92.12 | 86.10 | 93.35 |
| Seed=2 | 86.95 | 93.55 | 93.83 | 88.15 | 93.25 |
| Seed=3 | 91.38 | 93.70 | 94.05 | 86.40 | 93.50 |
| Average | 88.49 | 93.72 | 93.33 | 86.88 | 93.37 |
| CIFAR-20-100 | CIFAR-20-200 | CIFAR-20-500 | CIFAR-50-250 | CIFAR-50-500 | CIFAR-50-1250 | |
|---|---|---|---|---|---|---|
| Seed=1 | 51.45 | 60.30 | 66.95 | 56.56 | 65.40 | 71.26 |
| Seed=2 | 59.55 | 62.60 | 68.75 | 56.58 | 64.62 | 70.58 |
| Seed=3 | 54.40 | 57.30 | 65.75 | 60.80 | 64.98 | 71.28 |
| Average | 55.13 | 60.07 | 67.15 | 57.98 | 65.00 | 71.04 |
| CIFAR-6-30 | CIFAR-6-60 | CIFAR-6-150 | ImageNet-20-P1 | ImageNet-20-P5 | |
|---|---|---|---|---|---|
| Seed=1 | 72.18 | 78.07 | 77.35 | 82.09 | 88.69 |
| Seed=2 | 72.19 | 77.99 | 78.10 | 77.69 | 89.11 |
| Seed=3 | 77.00 | 78.11 | 81.31 | 77.64 | 85.50 |
| Average | 73.79 | 78.06 | 78.92 | 79.14 | 87.77 |
| CIFAR-20-100 | CIFAR-20-200 | CIFAR-20-500 | CIFAR-50-250 | CIFAR-50-500 | CIFAR-50-1250 | |
|---|---|---|---|---|---|---|
| Seed=1 | 48.64 | 56.82 | 64.00 | 53.68 | 62.77 | 65.99 |
| Seed=2 | 54.03 | 58.57 | 64.74 | 54.59 | 63.08 | 69.03 |
| Seed=3 | 51.79 | 54.36 | 62.38 | 58.14 | 62.01 | 67.24 |
| Average | 51.49 | 56.58 | 63.71 | 55.47 | 62.62 | 67.42 |
We sincerely thank the authors of USB (NeurIPS'22) for creating such an awesome SSL benchmark.
We sincerely thank the authors of the following projects for sharing the code of their great works:
@article{kong2025diversify,
title={Diversify and Conquer: Open-Set Disagreement for Robust Semi-Supervised Learning With Outliers},
author={Kong, Heejo and Kim, Sung-Jin and Jung, Gunho and Lee, Seong-Whan},
journal={IEEE Transactions on Neural Networks and Learning Systems},
year={2025},
publisher={IEEE}
}

