This repository contains the implementation of the method described in our paper, "Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation".
To set up the necessary environment:
- Clone the repository:

```shell
git clone https://github.com/yourusername/DCG_Enhanced_distilGPT2.git
cd DCG_Enhanced_distilGPT2
```

- Install the latest PyTorch:
  - Visit PyTorch's official website to find the command suitable for your system configuration.
- Install the required dependencies:

```shell
pip install -r requirements.txt
```
Store all the pre-trained weights in the `./checkpoint/` directory. Below are the details and corresponding links for each:
- BiomedCLIP (for offline retrieval)
- MedSAM (for the image encoder)
- distilgpt2 (for the text and node encoder)
- chexbert and bert (for validation)
  - chexbert:
  - bert:
- MIMIC-CXR:
  - Download from PhysioNet.
  - Place the files in `dataset/mimic_cxr/images`. Ensure the path `dataset/mimic_cxr_jpg/physionet.org/files/mimic-cxr-jpg/2.0.0/files` exists. Note: this dataset requires authorization.
- Chen et al. Labels for MIMIC-CXR:
  - Download from one of the following sources:
  - Place `annotations.json` in `dataset/mimic_cxr`. The path should be `dataset/mimic_cxr/annotations.json`.
- Chen et al. Labels and Chest X-Rays in PNG Format for IU X-Ray:
  - Download from one of the following sources:
  - Place the files into `dataset/iu_x-ray`. Ensure the paths `dataset/iu_x-ray/annotations.json` and `dataset/iu_x-ray/images` exist.
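Before training, it may help to verify that the checkpoint directory and dataset paths described above are in place. The following is a minimal, hypothetical sanity-check script (not part of the repository; the helper name `missing_paths` is our own):

```python
# Hypothetical sanity check: verify that the checkpoint directory and the
# dataset paths described in this README exist before starting training.
from pathlib import Path

# Paths taken from the setup instructions above.
REQUIRED_PATHS = [
    "checkpoint",
    "dataset/mimic_cxr/annotations.json",
    "dataset/mimic_cxr_jpg/physionet.org/files/mimic-cxr-jpg/2.0.0/files",
    "dataset/iu_x-ray/annotations.json",
    "dataset/iu_x-ray/images",
]

def missing_paths(root="."):
    """Return the required paths that do not exist under `root`."""
    return [p for p in REQUIRED_PATHS if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    print("All paths present." if not missing else f"Missing: {missing}")
```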
Note: The dataset directory can be configured for each task using the `dataset_dir` variable in `config/train_mimic_cxr.yaml` and `config/train_iu_xray.yaml`.
To run the project, follow these steps:
- (Optional) Use BiomedCLIP to initialize image features and perform offline retrieval. The results have been pre-saved in `./dataset/iu_xray/annotation_top5.json` and `./dataset/mimic_cxr/annotation_top5.json`. For the specific steps, refer to `tools/offline_retrieval`.
- (Optional) Extract entities from the retrieved reports and initialize them as node features and adjacency matrices. Our pre-processed results are saved in `./dataset/iu_xray/node_mapping.json`, `node_features_gpt2.h5`, `adjacency_matrix_191`, and `./dataset/mimic_cxr/adjacency_matrix_276`. For the specific steps, refer to `tools/generate_graph`.
- Model training and validation: run `python train_ver4_iu_xray.py` or `python train_ver4_mimic.py`.
- Checkpoint and report generation: coming soon.
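Conceptually, the two optional preprocessing steps amount to (1) ranking corpus reports by embedding similarity and keeping the top 5, and (2) turning extracted entities into a node mapping plus a co-occurrence adjacency matrix. The sketch below illustrates both ideas in plain Python; it is a simplified illustration under assumed inputs, not the actual code in `tools/offline_retrieval` or `tools/generate_graph`, and the helper names `top_k_reports` and `build_graph` are hypothetical:

```python
# Simplified illustration of the two preprocessing ideas; not the repository's code.
import math
from itertools import combinations

def top_k_reports(query_emb, corpus_embs, k=5):
    """Rank corpus embeddings by cosine similarity to the query; return top-k indices."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    order = sorted(range(len(corpus_embs)),
                   key=lambda i: cosine(query_emb, corpus_embs[i]),
                   reverse=True)
    return order[:k]

def build_graph(entity_lists):
    """Map entity strings to node indices and build a symmetric
    adjacency matrix from per-report entity co-occurrence."""
    node_mapping = {}
    for entities in entity_lists:
        for e in entities:
            node_mapping.setdefault(e, len(node_mapping))
    n = len(node_mapping)
    adjacency = [[0] * n for _ in range(n)]
    for entities in entity_lists:
        for a, b in combinations(set(entities), 2):
            i, j = node_mapping[a], node_mapping[b]
            adjacency[i][j] = adjacency[j][i] = 1
    return node_mapping, adjacency
```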
Note: The complete execution steps, the code for processing image and graph features (only for IU X-Ray; MIMIC-CXR requires authorization), and the weights will be uploaded later.
- See `folder_structure.txt` for the full directory layout.
If you find our work useful, please consider citing our paper:
Coming soon.
This project is built upon cvt2distilgpt2 and MedSAM. We would like to thank them for their great work.