heejokong/DivCon

Diversify and Conquer (DAC) for Open-Set Semi-Supervised Learning

Introduction

This is an official repository for our TNNLS 2025 paper:

Diversify and Conquer: Open-set Disagreement for Robust Semi-supervised Learning with Outliers
Heejo Kong, Sung-Jin Kim, Gunho Jung, Seong-Whan Lee*
[Paper (IEEE)] [Paper (arXiv)] [Models and Logs] [BibTeX]

Getting Started

This is an example of how to set up DAC locally. We implement DAC using the SSL and Open-set SSL PyTorch benchmarks, USB and IOMatch.

To get a local copy up and running, follow these simple steps.

Prerequisites

Our open-set SSL benchmark is built on PyTorch, with torchvision, torchaudio, and transformers.

To install the required packages, you can create a conda environment:

conda create --name dac python=3.10

Then activate the environment (conda activate dac) and use pip to install the required packages:

pip install -r requirements.txt
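As a quick sanity check after installation, the following minimal sketch reports which of the packages named above cannot be imported. The helper name `missing_packages` is hypothetical and not part of this repository, and requirements.txt may list further packages that are not checked here.

```python
import importlib.util

# Packages named in the Prerequisites section; requirements.txt may list more.
REQUIRED = ["torch", "torchvision", "torchaudio", "transformers"]


def missing_packages(names):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it inside the activated `dac` environment so that it inspects the same interpreter that `train.py` will use.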

You can now start using our benchmark by running:

python train.py --c config/classic_cv/fixmatch/fixmatch_cifar100_50_1250_1.yaml

Dataset Preparation

All datasets are expected to be under ./data (or linked there via soft links), laid out as follows.

DivCon
├── config
│   └── ...
├── data
│   ├── cifar10
│   │   └── cifar-10-batches-py
│   ├── cifar100
│   │   └── cifar-100-python
│   ├── imagenet30
│   │   ├── one_class_test
│   │   └── one_class_train
│   ├── semi-inat-2021
│   │   ├── l_train
│   │   ├── u_train
│   │   └── test
│   └── ood_data
├── semilearn
│   └── ...
└── ...

Detailed instructions for downloading and processing the datasets are given in Dataset Download. Please follow them before running or developing algorithms.

The data for ImageNet-30 and Semi-iNat-2021 can be downloaded from the OpenMatch and semi-inat-2021 repositories, respectively.

The out-of-dataset testing data for extended open-set evaluation can be downloaded from the CSI repository.
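Before launching a run, it can help to confirm that the directory layout shown in the tree above is in place. The sketch below is a minimal check under that assumption; the helper `missing_dirs` is hypothetical and not part of this repository, and the expected contents of `ood_data` are not specified here, so only its presence is tested.

```python
from pathlib import Path

# Sub-directories of ./data, copied from the directory tree in this section.
EXPECTED = [
    "cifar10/cifar-10-batches-py",
    "cifar100/cifar-100-python",
    "imagenet30/one_class_test",
    "imagenet30/one_class_train",
    "semi-inat-2021/l_train",
    "semi-inat-2021/u_train",
    "semi-inat-2021/test",
    "ood_data",
]


def missing_dirs(data_root="./data", expected=EXPECTED):
    """Return the expected sub-directories that are absent under data_root.

    Path.is_dir() follows symbolic links, so soft-linked datasets count.
    """
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs():
        print("missing:", rel)
```

Only the datasets needed for a given config have to be present, so a non-empty result is not necessarily an error.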

Development

You can also develop your own open-set SSL algorithm and evaluate it by cloning this repository:

git clone https://github.com/heejokong/DivCon.git

Usage

We implement Diversify and Conquer (DAC) using the codebase of USB.

Training

Here is an example to train DAC on CIFAR-100 with the seen/unseen split of "50/50" and 25 labels per seen class (i.e., the task CIFAR-50-1250 with 1250 labeled samples in total).

# seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar100_50_1250_1.yaml

To train DAC on other datasets or with different open-set SSL settings, pass the corresponding config file:

# CIFAR10, seen/unseen split of 6/4, 25 labels per seen class (CIFAR-6-150), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar10_6_150_1.yaml

# CIFAR100, seen/unseen split of 20/80, 25 labels per seen class (CIFAR-20-500), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_cifar100_20_500_1.yaml

# ImageNet30, seen/unseen split of 20/10, 5% labeled data (ImageNet-20-p5), seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/dac/dac_in30_p5_1.yaml
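The CIFAR task names above encode the number of seen classes and the total number of labeled samples (seen classes times labels per class). The sketch below reproduces that arithmetic and assembles the matching command string. The filename pattern is inferred from the CIFAR examples in this section only; it does not cover the ImageNet-30 config name, and both helper functions are hypothetical rather than part of this repository.

```python
def dac_config_path(dataset, n_seen, labels_per_class, seed):
    """Build a CIFAR-style DAC config path, e.g. 50 seen classes x 25 labels = 1250."""
    total_labels = n_seen * labels_per_class
    return (
        "config/openset_cv/dac/"
        f"dac_{dataset}_{n_seen}_{total_labels}_{seed}.yaml"
    )


def train_command(config, gpu=0):
    """Return the training command as a string; nothing is executed here."""
    return f"CUDA_VISIBLE_DEVICES={gpu} python train.py --c {config}"


if __name__ == "__main__":
    # Seeds 1-3 match the seeds reported in the results tables; whether a
    # config file exists for every seed is an assumption to verify locally.
    for seed in (1, 2, 3):
        print(train_command(dac_config_path("cifar100", 50, 25, seed)))
```

Printing the commands instead of running them keeps the sketch safe to execute on a machine without a GPU or the datasets.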

Evaluation

After training, the best checkpoints are saved in ./saved_models. The closed-set performance is reported in the training logs. For the open-set evaluation, please see eval.py.
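For readers unfamiliar with open-set scoring, the sketch below shows one common way to compute it: all unseen-class samples are merged into a single extra "outlier" class, and per-class recall is averaged over the seen classes plus that outlier class. This definition is an assumption made for illustration; it is not taken from eval.py, which remains the reference for the numbers reported here. The function name is hypothetical.

```python
def open_set_balanced_accuracy(y_true, y_pred, num_seen):
    """Balanced accuracy (in %) over num_seen seen classes plus one outlier class.

    Ground-truth labels >= num_seen are treated as unseen and mapped to the
    outlier index num_seen. Predictions are expected to use the same index
    for "outlier".
    """
    outlier = num_seen
    correct = [0] * (num_seen + 1)
    total = [0] * (num_seen + 1)
    for t, p in zip(y_true, y_pred):
        t = min(t, outlier)  # merge every unseen class into the outlier class
        total[t] += 1
        correct[t] += int(p == t)
    # Average recall over the classes that actually occur in y_true.
    recalls = [c / n for c, n in zip(correct, total) if n > 0]
    if not recalls:
        raise ValueError("y_true is empty")
    return 100.0 * sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Two seen classes (0, 1); labels 2 and 3 are unseen classes.
    y_true = [0, 0, 1, 1, 2, 3]
    y_pred = [0, 1, 1, 1, 2, 2]
    print(round(open_set_balanced_accuracy(y_true, y_pred, num_seen=2), 2))
```

Averaging recall per class rather than per sample keeps a large outlier pool from dominating the score.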

Experimental Results

Through hyper-parameter tuning, we achieved slight improvements over the results reported in the TNNLS paper. For CIFAR-10 and ImageNet-30, experiments were rerun with $\lambda_{mi}$ set to 0.05; for CIFAR-100, with $\lambda_{mi}$ set to 0.10. The trained models and training logs are available at this page.

Closed-Set Classification Accuracy

| | CIFAR-6-30 | CIFAR-6-60 | CIFAR-6-150 | ImageNet-20-P1 | ImageNet-20-P5 |
|---|---|---|---|---|---|
| Seed=1 | 87.13 | 93.90 | 92.12 | 86.10 | 93.35 |
| Seed=2 | 86.95 | 93.55 | 93.83 | 88.15 | 93.25 |
| Seed=3 | 91.38 | 93.70 | 94.05 | 86.40 | 93.50 |
| Average | 88.49 | 93.72 | 93.33 | 86.88 | 93.37 |

| | CIFAR-20-100 | CIFAR-20-200 | CIFAR-20-500 | CIFAR-50-250 | CIFAR-50-500 | CIFAR-50-1250 |
|---|---|---|---|---|---|---|
| Seed=1 | 51.45 | 60.30 | 66.95 | 56.56 | 65.40 | 71.26 |
| Seed=2 | 59.55 | 62.60 | 68.75 | 56.58 | 64.62 | 70.58 |
| Seed=3 | 54.40 | 57.30 | 65.75 | 60.80 | 64.98 | 71.28 |
| Average | 55.13 | 60.07 | 67.15 | 57.98 | 65.00 | 71.04 |
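The Average rows are the arithmetic mean over the three seeds, rounded to two decimals. As a small worked check, the sketch below recomputes the averages for the first closed-set table from the per-seed values listed above. The helper name `seed_average` is hypothetical.

```python
from statistics import mean

# Per-seed closed-set accuracies (seeds 1, 2, 3), copied from the table above.
closed_set = {
    "CIFAR-6-30": [87.13, 86.95, 91.38],
    "CIFAR-6-60": [93.90, 93.55, 93.70],
    "CIFAR-6-150": [92.12, 93.83, 94.05],
    "ImageNet-20-P1": [86.10, 88.15, 86.40],
    "ImageNet-20-P5": [93.35, 93.25, 93.50],
}


def seed_average(scores):
    """Mean over seeds, rounded to two decimals as in the tables."""
    return round(mean(scores), 2)


if __name__ == "__main__":
    for task, scores in closed_set.items():
        print(f"{task}: {seed_average(scores):.2f}")
```

The same helper can be applied to the per-seed values parsed from your own training logs to compare against the reported rows.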

Open-Set Classification Accuracy

| | CIFAR-6-30 | CIFAR-6-60 | CIFAR-6-150 | ImageNet-20-P1 | ImageNet-20-P5 |
|---|---|---|---|---|---|
| Seed=1 | 72.18 | 78.07 | 77.35 | 82.09 | 88.69 |
| Seed=2 | 72.19 | 77.99 | 78.10 | 77.69 | 89.11 |
| Seed=3 | 77.00 | 78.11 | 81.31 | 77.64 | 85.50 |
| Average | 73.79 | 78.06 | 78.92 | 79.14 | 87.77 |

| | CIFAR-20-100 | CIFAR-20-200 | CIFAR-20-500 | CIFAR-50-250 | CIFAR-50-500 | CIFAR-50-1250 |
|---|---|---|---|---|---|---|
| Seed=1 | 48.64 | 56.82 | 64.00 | 53.68 | 62.77 | 65.99 |
| Seed=2 | 54.03 | 58.57 | 64.74 | 54.59 | 63.08 | 69.03 |
| Seed=3 | 51.79 | 54.36 | 62.38 | 58.14 | 62.01 | 67.24 |
| Average | 51.49 | 56.58 | 63.71 | 55.47 | 62.62 | 67.42 |

Comparisons with Other Baselines

Closed-Set Classification Accuracy

Open-Set Classification Accuracy

Acknowledgments

We sincerely thank the authors of USB (NeurIPS'22) for creating such an awesome SSL benchmark.

We sincerely thank the authors of the following projects for sharing the code of their great works:

Citation

@article{kong2025diversify,
  title={Diversify and Conquer: Open-Set Disagreement for Robust Semi-Supervised Learning With Outliers},
  author={Kong, Heejo and Kim, Sung-Jin and Jung, Gunho and Lee, Seong-Whan},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  publisher={IEEE}
}

About

[TNNLS 2025] Official PyTorch Implementation of "Diversify and Conquer: Open-set Disagreement for Robust Semi-supervised Learning with Outliers"
