APBIO is a tool developed to predict compound-target interactions, with a specific focus on air pollutants and their bioactivity. It computes bioactivity signatures for compounds starting from the SMILES representation (e.g., C1=CC=CC=C1) and FASTA sequence features for targets starting from the UniProtKB identifier (e.g., Q9UHW9).
Bioactivity signatures are computed with the signaturizer package. For further details, see the Chemical Checker (CC) paper, the CC signaturizers paper, and the related repositories.
Sequence descriptors are calculated with the iFeature toolkit. Specifically, we use the main iFeature.py program and the files it requires in the codes and data folders. Additional information is available in the iFeature paper and the related repository.
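Since targets are identified by UniProtKB accessions (e.g., Q9UHW9), it can help to check identifiers before running anything expensive. The snippet below is a minimal sketch and not part of APBIO: the helper name `is_uniprot_accession` is hypothetical, and the pattern is the accession format published in the UniProt documentation. It only checks the format, not whether the entry actually exists.

```python
import re

# Accession format as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(identifier: str) -> bool:
    """Return True if the whole string has the UniProtKB accession format.

    Hypothetical helper for illustration; it does not query UniProt.
    """
    return UNIPROT_ACCESSION.fullmatch(identifier) is not None


# The target example from this README passes; a SMILES string does not.
print(is_uniprot_accession("Q9UHW9"))
print(is_uniprot_accession("C1=CC=CC=C1"))
```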
To run the notebooks and reproduce the results, clone this repo and set up a conda environment with the commands below:
$ conda create --no-default-packages -n cti -y python=3.7.16
$ conda activate cti
$ pip install -r requirements.txt
The main methodology can be executed via the APBIO_pipeline.ipynb notebook.
If you want to perform the sampling strategy evaluation, please name the generated datasets following the examples below for each dataset name and ratio:
APchem_ratio10_sampled_CT_ds.pkl
APchem_rnd_ratio10_sampled_CT_ds.pkl
and place them in the corresponding sampled and random folders under the path: /cti_datasets/AP_CTIs/.
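The naming convention above can be sketched as a small helper. This is an illustration only: the function name `sampled_dataset_paths` is hypothetical, the pattern is inferred from the two example file names (dataset name `APchem`, ratio `10`), and it assumes the `_rnd_` file belongs in the random folder and the other in the sampled folder.

```python
from pathlib import PurePosixPath

# Base path as given in this README.
BASE_DIR = PurePosixPath("/cti_datasets/AP_CTIs")


def sampled_dataset_paths(dataset: str, ratio: int, base_dir=BASE_DIR):
    """Build the expected paths for one dataset name and ratio.

    Hypothetical helper: the file name pattern is inferred from the
    README examples, and the folder assignment is an assumption.
    """
    sampled = base_dir / "sampled" / f"{dataset}_ratio{ratio}_sampled_CT_ds.pkl"
    random_ = base_dir / "random" / f"{dataset}_rnd_ratio{ratio}_sampled_CT_ds.pkl"
    return sampled, random_


# Print the two target paths for the README example.
for path in sampled_dataset_paths("APchem", 10):
    print(path)
```

Adapt `BASE_DIR` if the datasets folder sits elsewhere relative to your clone.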
The Streamlit web app is available at: https://ap-bio.streamlit.app/.