A Python tool for batch predicting plant protein subcellular localization via Plant-mPLoc.
This project provides a Python script that automates batch submissions to the Plant-mPLoc server developed by the CSBio Laboratory at Shanghai Jiao Tong University. The web interface accepts only one query at a time; the script works around this by submitting each sequence from a large FASTA file in turn, then parsing and exporting the results automatically.
- Batch Processing: Supports querying hundreds of plant protein sequences at once using a standard FASTA file.
- Dual Mode: Offers an interactive drag-and-drop interface for beginners and a Command Line Interface (CLI) for automation.
- Smart Parsing: Accurately extracts results, including multi-site predictions (proteins localized to multiple cellular components).
- Server-Friendly: Features a built-in 2-second delay between requests to prevent overloading the academic server and avoid IP bans.
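The batch-processing and rate-limiting behaviour listed above can be illustrated with a minimal sketch. It is not the script's actual code: `read_fasta`, `run_batch`, and the `submit` callable are hypothetical names, and `submit` stands in for whatever function posts one sequence to the server and returns its predicted location.

```python
import time


def read_fasta(lines):
    """Yield (protein_id, sequence) pairs from an iterable of FASTA lines."""
    protein_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if protein_id is not None:
                yield protein_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            protein_id, chunks = (line[1:].split() or [""])[0], []
        elif protein_id is not None:
            chunks.append(line)
    if protein_id is not None:
        yield protein_id, "".join(chunks)


def run_batch(records, submit, delay_seconds=2.0, sleep=time.sleep):
    """Call submit() once per record, pausing between consecutive requests."""
    results = {}
    for index, (protein_id, sequence) in enumerate(records):
        if index > 0:
            sleep(delay_seconds)  # fixed pause so the server is not flooded
        results[protein_id] = submit(protein_id, sequence)
    return results
```

Passing `sleep` as a parameter is only a convenience for testing the loop without waiting; in normal use the default `time.sleep` applies the 2-second pause.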
git clone https://github.com/Sherry520/Cell-PLoc_batch.git
cd Cell-PLoc_batch
The script is built on Python 3. Ensure the required external libraries (requests and beautifulsoup4) are installed before running it:
pip install requests beautifulsoup4
The script supports two flexible modes of operation:
Interactive mode: Simply run the script. It will prompt you for the input path; you can drag and drop your FASTA file into the terminal window to fill it in:
python cell-ploc-batch.py
Note: If no output path is specified, the script will automatically generate an output file named [input_filename]_subcellular_results.txt in the same directory as your input file.
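The default naming rule in the note can be sketched with pathlib. This is an illustration under an assumption, not the script's own code: it treats [input_filename] as the file name without its extension, and `resolve_output_path` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(input_path: str, output_path: Optional[str] = None) -> Path:
    """Return the explicit output path, or derive one next to the input file."""
    if output_path:
        return Path(output_path)
    source = Path(input_path)
    # Assumption: "[input_filename]" means the name without its extension.
    return source.with_name(f"{source.stem}_subcellular_results.txt")
```

Under that assumption, an input of `data/sequences.fasta` would map to `data/sequences_subcellular_results.txt`.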
CLI mode: Pass the input (-i) and output (-o) paths directly as arguments in your terminal:
# Provide only the input file (output is auto-generated)
python cell-ploc-batch.py -i sequences.fasta
# Provide both input and explicit output files
python cell-ploc-batch.py -i ./data/sequences.fasta -o ./results/my_results.txt
To run the script directly as an executable without typing python, grant it execution permissions:
chmod +x cell-ploc-batch.py
./cell-ploc-batch.py -i sequences.fasta
Troubleshooting: If you encounter a "\r: No such file or directory" error on Linux, the file contains Windows line endings. Fix it by running: sed -i 's/\r$//' cell-ploc-batch.py
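For readers curious how the two modes might coexist in one script, here is a rough argparse sketch. Only the -i and -o flags come from this README; the long names `--input` and `--output`, the prompt text, the quote stripping, and the helper names are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Batch-submit FASTA sequences to Plant-mPLoc."
    )
    # Long option names are hypothetical; the README documents only -i and -o.
    parser.add_argument("-i", "--input", help="path to the input FASTA file")
    parser.add_argument("-o", "--output", help="path for the results file (optional)")
    return parser


def get_paths(argv=None, ask=input):
    """Return (input_path, output_path), prompting when -i is missing."""
    args = build_parser().parse_args(argv)
    if args.input:
        return args.input, args.output
    # Interactive fallback: dragged-in paths are often wrapped in quotes
    # or padded with spaces, so both are stripped here.
    typed = ask("FASTA file path: ")
    return typed.strip().strip("'\""), args.output
```

With no arguments the function falls back to a prompt, which is one plausible way to support the drag-and-drop workflow alongside scripted use.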
You can bundle this script into a standalone executable that runs on machines without a Python environment using PyInstaller.
The executable targets the operating system it is built on (building on Windows creates an .exe, while building on Linux creates an extensionless binary).
pip install pyinstaller
pyinstaller -F cell-ploc-batch.py
Once completed, find your standalone application inside the newly created dist/ directory:
- Windows: dist/cell-ploc-batch.exe
- Linux/macOS: dist/cell-ploc-batch
The output is exported as a standard tab-separated values (.txt / .tsv) file. It can be opened directly with Excel or any text editor:
Protein_ID Predicted_Location
Zm00001eb010460 Cell membrane
Zm00001eb018230 Cell membrane
Zm00001eb018250 Cell membrane
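Because the output is plain tab-separated text, it can also be loaded programmatically. The sketch below assumes the two column headers shown in the example above and a tab delimiter; `load_results` is a hypothetical helper, not part of the tool.

```python
import csv
from collections import Counter


def load_results(lines):
    """Return {protein_id: predicted_location} from tab-separated result lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Protein_ID"]: row["Predicted_Location"] for row in reader}


def count_locations(results):
    """Count how many proteins were assigned to each predicted location."""
    return Counter(results.values())
```

In practice `lines` would be an open file handle, for example `open("sequences_subcellular_results.txt", newline="")`, where the file name is only an example.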
- This tool serves strictly as an automation wrapper for submitting web forms. The core predictive algorithm and hosting services belong entirely to the CSBio Laboratory at Shanghai Jiao Tong University.
- If you use results obtained via this tool in an academic publication, please make sure to cite the official Plant-mPLoc paper:
  Kuo-Chen Chou and Hong-Bin Shen, "Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization", PLoS ONE, 2010, 5: e11335.
- Please use this tool responsibly. Do not remove the time delays or abuse the academic server with excessive concurrent requests.