This originated from the March 26 StrOntEx meeting.
Status Quo
The ChEBI datasets all extend the _DynamicDataset class. This means that they produce
- a data.pkl file with all molecule-label pairs in ChEBI
- a splits.csv file matching ChEBI IDs to train/val/test subsets
This improves reproducibility: a split can be reproduced, for example, by supplying only the splits file and the chebai version used.
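As a rough illustration of this two-file pattern, here is a minimal, standard-library-only sketch. It is not the chebai implementation: the function names, the record layout (dicts with "id" and "labels"), the CSV column names ("id", "split") and the 80/10/10 split are all assumptions made for the example.

```python
import csv
import pickle
import random
from pathlib import Path


def write_data_and_splits(records, out_dir, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical helper: write all records to data.pkl and the split
    assignment of each ID to splits.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # data.pkl holds every molecule-label pair, independent of any split.
    with open(out_dir / "data.pkl", "wb") as f:
        pickle.dump(records, f)

    # Shuffle the IDs deterministically and cut them into three subsets.
    ids = sorted(r["id"] for r in records)
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    assignment = {}
    for i, mol_id in enumerate(ids):
        if i < n_train:
            assignment[mol_id] = "train"
        elif i < n_train + n_val:
            assignment[mol_id] = "val"
        else:
            assignment[mol_id] = "test"

    # splits.csv only maps IDs to subsets (assumed columns: id, split).
    with open(out_dir / "splits.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "split"])
        for mol_id in sorted(assignment):
            writer.writerow([mol_id, assignment[mol_id]])
    return assignment


def load_split(out_dir, split):
    """Hypothetical helper: return the records assigned to one subset."""
    out_dir = Path(out_dir)
    with open(out_dir / "data.pkl", "rb") as f:
        records = pickle.load(f)
    with open(out_dir / "splits.csv", newline="") as f:
        wanted = {row["id"] for row in csv.DictReader(f) if row["split"] == split}
    # CSV values are strings, so compare against the stringified ID.
    return [r for r in records if str(r["id"]) in wanted]
```

Because the split assignment lives in its own small file, sharing that file is enough to recover the same train/val/test subsets, as long as the data file is regenerated from the same data and code version.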
Goal
Implement the same for PubChem, so that the PubChem datasets also produce a data.pkl file and a splits.csv file.