A containerized benchmarking system for Multiple Sequence Alignment (MSA) tools using the BAliBASE benchmark dataset.

    # Build the Docker image
    docker build -t msa-benchmark .
    # Run the benchmark
    docker run -v "${PWD}:/app" -w /app msa-benchmark python3 main.py

- Automated Tool Installation: MAFFT, MUSCLE, and Clustal Omega are automatically installed
- BAliBASE Integration: Automatic download and processing of BAliBASE benchmark datasets
- Multiple Format Support: Handles FASTA, MSF, and RSF alignment formats
- Comprehensive Scoring: Calculates both SP (Sum-of-Pairs) and TC (Total Column) scores
- Result Visualization: Generates performance comparison plots and summary statistics
- Docker-Based: Runs entirely in a container, with no host dependencies beyond Docker
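
To make the format support above more concrete, here is a minimal sketch of reading an aligned FASTA file into gapped sequences. It is an illustration only, not the project's actual parser: the function name `parse_fasta_alignment` is hypothetical, and MSF and RSF are not covered.

```python
def parse_fasta_alignment(text):
    """Parse aligned FASTA text into {sequence id: gapped sequence}.

    Hypothetical helper for illustration; not taken from this repository.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence id.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            records[current].append(line)
    alignment = {name: "".join(chunks) for name, chunks in records.items()}
    # Every row of an alignment must have the same number of columns.
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return alignment


sample = ">seq1 example\nAC-G\nT\n>seq2\nACTGT\n"
parsed = parse_fasta_alignment(sample)
```

The length check matters because downstream scoring compares alignments column by column, so a truncated row should fail early rather than produce a misleading score.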

Requirements for the Docker setup:
- Docker Desktop (Windows/macOS) or Docker Engine (Linux)
- At least 4GB of available RAM
- 2GB of free disk space
If you prefer not to use Docker, you'll need to install the following:
- Python 3.8 or higher
- pip (Python package manager)
- MAFFT:
  - Windows: Download from the MAFFT website
  - Linux: `sudo apt-get install mafft` (Ubuntu/Debian) or `sudo yum install mafft` (CentOS/RHEL)
  - macOS: `brew install mafft`
- MUSCLE:
  - Windows: Download from the MUSCLE website
  - Linux: `sudo apt-get install muscle` (Ubuntu/Debian) or `sudo yum install muscle` (CentOS/RHEL)
  - macOS: `brew install muscle`
- Clustal Omega:
  - Windows: Download from the Clustal Omega website
  - Linux: `sudo apt-get install clustalo` (Ubuntu/Debian) or `sudo yum install clustalo` (CentOS/RHEL)
  - macOS: `brew install clustal-omega`

Make sure all installed tools are available in your system's PATH.
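
A quick way to verify the PATH requirement is a small check with Python's standard `shutil.which`. This is a sketch under assumptions: the executable names `mafft`, `muscle`, and `clustalo` match the package-manager installs listed above, but a manually downloaded binary may be named differently, and the helper `find_missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names; adjust if your binaries are named differently.
TOOLS = {"MAFFT": "mafft", "MUSCLE": "muscle", "Clustal Omega": "clustalo"}


def find_missing_tools(tools=TOOLS, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All alignment tools are available.")
```

Running this before a manual benchmark surfaces a missing tool immediately instead of partway through a run.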
- Clone this repository:

      git clone https://github.com/ibrqures-uf/compass.git
      cd compass

- Choose your installation method:
  - Docker: build the image with `docker build -t msa-benchmark .`
  - Manual: install the Python dependencies with `pip install -r requirements.txt`

Run the benchmark with one of the following commands:

    # Run the full benchmark
    docker run -v "${PWD}:/app" -w /app msa-benchmark python3 main.py
    # Quick test run, limited via BENCH_LIMIT
    docker run -e BENCH_LIMIT=5 -v "${PWD}:/app" -w /app msa-benchmark python3 main.py
    # Run with container resource limits (4 GB memory, 2 CPUs)
    docker run --memory=4g --cpus=2 -v "${PWD}:/app" -w /app msa-benchmark python3 main.py

The benchmark generates several outputs in the `results/` directory:
- `results/benchmark_results.csv`: Raw benchmark data
- `results/alignments/`: Generated MSA files
- `results/figures/`:
  - `accuracy_comparison.png`: SP/TC score comparison
  - `efficiency_comparison.png`: Runtime/memory usage
  - `performance_by_refset.png`: Performance across reference sets
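
The raw CSV can be summarized with the standard library alone. The sketch below averages score columns per tool; the column names `tool`, `sp_score`, and `tc_score` are assumptions (check the header of your own `benchmark_results.csv`), and the inline sample uses made-up numbers purely to show the shape of the data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_scores_by_tool(csv_file, tool_col="tool",
                        score_cols=("sp_score", "tc_score")):
    """Average each score column per tool. Column names are assumptions."""
    values = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csv_file):
        for col in score_cols:
            values[row[tool_col]][col].append(float(row[col]))
    return {
        tool: {col: mean(scores) for col, scores in cols.items()}
        for tool, cols in values.items()
    }


# Made-up sample standing in for results/benchmark_results.csv.
sample = io.StringIO(
    "tool,sp_score,tc_score\n"
    "mafft,0.50,0.25\n"
    "mafft,1.00,0.75\n"
    "muscle,0.50,0.50\n"
)
summary = mean_scores_by_tool(sample)
```

For a real run, pass an open file instead of the sample, for example `open("results/benchmark_results.csv", newline="")`.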
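
Metrics reported for each alignment:
- SP Score (Sum-of-Pairs): Fraction of residue pairs aligned in the reference alignment that are also aligned in the tool's alignment
- TC Score (Total Column): Fraction of reference alignment columns that the tool's alignment reproduces exactly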
- Runtime: Execution time in seconds
- Memory Usage: Peak memory usage in MB
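
The two accuracy metrics can be sketched in a few lines of Python. This is a simplified illustration, not necessarily how this project or the BAliBASE scoring program computes them: it assumes both alignments list the same sequences in the same order, uses `-` as the gap character, scores every reference column rather than only core blocks, and all function names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def column_residues(alignment):
    """For each column, map sequence index -> residue index of non-gap cells."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        residues = {}
        for seq_idx, char in enumerate(column):
            if char != GAP:
                residues[seq_idx] = counters[seq_idx]
                counters[seq_idx] += 1
        columns.append(residues)
    return columns


def aligned_pairs(columns):
    """Set of (seq1, residue1, seq2, residue2) pairs sharing a column."""
    pairs = set()
    for residues in columns:
        for (s1, r1), (s2, r2) in combinations(sorted(residues.items()), 2):
            pairs.add((s1, r1, s2, r2))
    return pairs


def sp_score(reference, test):
    """Fraction of reference residue pairs that are also aligned in test."""
    ref_pairs = aligned_pairs(column_residues(reference))
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(column_residues(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(reference, test):
    """Fraction of reference columns (2+ residues) reproduced exactly in test."""
    ref_cols = [c for c in column_residues(reference) if len(c) >= 2]
    if not ref_cols:
        return 0.0
    test_cols = {frozenset(c.items()) for c in column_residues(test)}
    hits = sum(frozenset(c.items()) in test_cols for c in ref_cols)
    return hits / len(ref_cols)


reference = ["AC-GT", "ACTGT", "AC-GT"]
test = ["ACG-T", "ACTGT", "AC-GT"]
sp = sp_score(reference, test)  # 10 of 12 reference pairs recovered
tc = tc_score(reference, test)  # 3 of 4 reference columns recovered
```

In this example a single misplaced residue breaks one whole column but only two of twelve residue pairs, which is why TC is usually the stricter of the two scores.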

Environment variables:
- `BENCH_LIMIT`: Limit the number of sequences to process (e.g., `5` for testing)
- `PYTHONPATH`: Automatically set by Docker to `/app`
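
As an illustration of how a limit like `BENCH_LIMIT` might be consumed, the sketch below reads the variable and truncates a work list. How `main.py` actually handles it is not shown in this README, so the helper names `read_bench_limit` and `apply_limit` and the validation rules are assumptions.

```python
import os


def read_bench_limit(environ=os.environ):
    """Return BENCH_LIMIT as a positive int, or None when unset or empty."""
    raw = environ.get("BENCH_LIMIT", "").strip()
    if not raw:
        return None
    limit = int(raw)  # raises ValueError on non-numeric input
    if limit <= 0:
        raise ValueError("BENCH_LIMIT must be a positive integer")
    return limit


def apply_limit(items, limit):
    """Keep only the first `limit` items; keep everything when limit is None."""
    items = list(items)
    return items if limit is None else items[:limit]


# Example: with BENCH_LIMIT=5, only the first five inputs are processed.
selected = apply_limit(range(10), read_bench_limit({"BENCH_LIMIT": "5"}))
```

Treating an unset variable as "no limit" keeps the default run identical to the full benchmark.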

Resource requirements:
- Minimal: 2GB RAM, 1 CPU
- Recommended: 4GB RAM, 2 CPUs
- Full Dataset: 8GB RAM, 4 CPUs

| Tool | Version | Status |
|---|---|---|
| MAFFT | Latest | ✅ Included |
| MUSCLE | Latest | ✅ Included |
| Clustal Omega | Latest | ✅ Included |
| T-Coffee | - | |
| ProbCons | - | |

Contributions are welcome! Please feel free to submit pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments:
- BAliBASE dataset providers
- Developers of MAFFT, MUSCLE, and Clustal Omega
- Python Bio community
For issues and questions:
- Create an issue in the repository
- Include detailed reproduction steps
- Attach relevant error messages and logs