Phenotype augmentation using generative AI
for isocitrate dehydrogenase mutation prediction in glioma
2Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine
3Department of Biomedical Engineering, AMIST, Asan Medical Center, University of Ulsan College of Medicine
4Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
5Department of Statistics and Data Science, Korea National Open University, Seoul, Korea
6Department of Korea and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
* corresponding author
This study investigated the effects of feature augmentation, which uses generated images with specific imaging features, on the performance of isocitrate dehydrogenase (IDH) mutation prediction models in gliomas. A total of 598 patients were included from our institution (310 training, 152 internal test) and the Cancer Genome Atlas (136 external test). Score-based diffusion models were used to generate T2-weighted, FLAIR, and contrast-enhanced T1-weighted image triplets. Three neuroradiologists independently assessed visual Turing tests and various morphological features. Multivariable logistic regression models were developed using real images, random augmented data, and feature-augmented datasets. While random augmentation yielded models with AUCs comparable to real image-based models, it led to reduced specificity, particularly in the external test set (specificity: 83.2% vs. 73.0%, P = .013). In contrast, feature-augmented models maintained stable diagnostic performance; however, when more than 70% of training images included synthetic T2-FLAIR mismatch signs, AUC decreased in the external test set (AUC: 0.905–0.906 for ≤ 70%; 0.902–0.876 for ≥ 80%). These findings highlight the value of phenotype-specific augmentation for IDH prediction, while emphasizing the need to optimize augmentation proportion to avoid performance degradation.
Install the packages listed in requirements.txt, then install jax, jaxlib, numpy, and opencv-python as follows:
pip install -r requirements.txt
pip install jax==0.4.6 jaxlib==0.4.6 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.htm
pip install numpy==1.23.0
pip install opencv-python==4.5.5.64
For example, set up the dataset path as follows:
root_path
├── train
│   └── <Patient_Folder>
│       ├── T1CE
│       │   ├── 0001.npy
│       │   ├── 0002.npy
│       │   └── 0003.npy
│       ├── T2
│       └── FLAIR
└── test
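The layout above can be read with a short loader. The following is a minimal sketch, not the repository's own data pipeline: the function names `list_triplets` and `load_triplet` are hypothetical, and it assumes that each modality folder holds 2D slices saved under matching file names (for example `0001.npy` in T1CE, T2, and FLAIR) with identical shapes. The channel order (T1CE, T2, FLAIR) is also an assumption.

```python
from pathlib import Path

import numpy as np

# Folder names taken from the dataset layout; the order is an assumption.
MODALITIES = ("T1CE", "T2", "FLAIR")


def list_triplets(root_path, split="train"):
    """Return (patient, slice_name) pairs present in all three modality folders."""
    triplets = []
    split_dir = Path(root_path) / split
    for patient_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # Assumption: a slice is usable only if every modality has the same file name.
        names = None
        for modality in MODALITIES:
            found = {f.name for f in (patient_dir / modality).glob("*.npy")}
            names = found if names is None else names & found
        triplets.extend((patient_dir.name, name) for name in sorted(names))
    return triplets


def load_triplet(root_path, patient, slice_name, split="train"):
    """Stack the three modality slices into one (3, H, W) array."""
    base = Path(root_path) / split / patient
    slices = [np.load(base / modality / slice_name) for modality in MODALITIES]
    return np.stack(slices, axis=0)
```

Calling `list_triplets("root_path")` before training is a quick way to confirm that every patient folder contains complete triplets; slices missing from any one modality are silently skipped in this sketch.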
python main.py --config='configs/ve/t1t2flair.py' --workdir='result' --mode=train
Model checkpoints and validation samples will be stored in ./result/checkpoints and ./result/samples, respectively.
python t1t2flair_sampling.py
Sampling results will be stored in ./result/generated_images as PNG files.
Our main code is heavily based on score_sde_pytorch.
@article{jung2025idh,
title={Phenotype augmentation using generative AI for isocitrate dehydrogenase mutation prediction in glioma},
author={Jung, Ha Kyung and Choi, Changyong and Park, Ji Eun and Park, Seo Young and Lee, Jae Ho and Kim, Namkug and Kim, Ho Sung},
journal={Scientific Reports},
volume={15},
number={1},
pages={28913},
year={2025},
publisher={Nature Publishing Group UK London},
doi={10.1038/s41598-025-14477-z}
}
