A web API and client for aggregating genomic information from multiple data sources.
Imagine if every appliance you purchased came with its own unique power cord requiring custom wiring to fit your outlets. Instead, appliances are designed to plug into standardized outlets — no special knowledge required. This is what an API accomplishes for data.
Genomics data presents a particular challenge: extreme variability in both the data itself and the formats used to represent it. Gene-Genie addresses this by consolidating information from multiple API sources into a unified interface, allowing you to query and cross-reference genomic databases without navigating their underlying complexity.
Data Sources:
- Entrez — Access to PubMed, PMC, Gene, and Nuccore databases
- String-db.org — Protein annotations and network interactions
- UniProt — Resource ID tags and cross-references
- GenomeWeb — Genomics news and articles
Features:
- Search genes by name and retrieve functional annotations
- Cross-reference identifiers across multiple databases
- Fetch related articles from PubMed
- Display protein-protein interaction networks
- Tag cloud visualization of database cross-references
The project consists of two components:
- API Layer — Flask-based REST API that wraps external genomics data sources
- Web Client — Flask application with a search interface styled using Bulma
Note: A migration to FastAPI is planned. See Roadmap for details.
Requirements:
- Python 3.8+
- pip
# Clone the repository
git clone https://github.com/<your-username>/gene_genie.git
cd gene_genie
# Create and activate virtual environment
python3 -m venv venv
source ./venv/bin/activate
# Install dependencies
pip install -r requirements.txt

cd gene_genie
source ./venv/bin/activate
export FLASK_APP=gene_genie.py
export FLASK_ENV=development
flask run -h localhost -p 8000

The web client will be available at http://localhost:8000/.
The web client provides a search interface for querying genomic data:
Search Page
- Header — Navigation with link to home
- Search — Gene name entry form
- Results — Display area for query results
Results Page
- Annotations — Functional annotations from String-db.org
- Tags — Cross-reference IDs to external databases
- Related Articles — Publications from PubMed
- Interaction Graph — Protein network visualization (when available)
| HTTP Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1.0/stringdb/annotation/<term> | Retrieve annotation from String-db.org |
| POST | /api/v1.0/stringdb/annotations/<term> | Retrieve multiple annotations from String-db.org |
| POST | /api/v1.0/uniprot/<term> | Retrieve resource ID tags from UniProt |
| GET | /api/v1.0/pubmed/summary/<pmid> | Retrieve summary article from PubMed |
| GET | /api/v1.0/genomeweb/article/<gwid> | Retrieve single news item from GenomeWeb |
| GET | /api/v1.0/genomeweb/articles/ | Retrieve multiple news items from GenomeWeb |
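The endpoints in the table follow a regular pattern, so a client can assemble request URLs from a few path parts. The following is a minimal Python sketch, assuming the server runs at `http://localhost:8000` as in the run instructions. The helper name `endpoint_url` is hypothetical and not part of Gene-Genie.

```python
from urllib.parse import quote

# Assumption: the local development server from the run instructions.
BASE_URL = "http://localhost:8000"
API_PREFIX = "/api/v1.0"


def endpoint_url(*parts, base_url=BASE_URL, trailing_slash=False):
    """Join path parts under the API prefix, percent-encoding each part."""
    path = "/".join(quote(str(part), safe="") for part in parts)
    url = f"{base_url}{API_PREFIX}/{path}"
    return url + "/" if trailing_slash else url


if __name__ == "__main__":
    print(endpoint_url("uniprot", "BRCA1"))
    print(endpoint_url("pubmed", "summary", 31411802))
    print(endpoint_url("genomeweb", "articles", trailing_slash=True))
```

Percent-encoding each part keeps search terms that contain spaces or slashes from changing the route that gets matched.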
# String-db annotations
curl -i http://localhost:8000/api/v1.0/stringdb/annotation/BRCA1
curl -i http://localhost:8000/api/v1.0/stringdb/annotations/BRCA1
# UniProt cross-references
curl -i http://localhost:8000/api/v1.0/uniprot/BRCA1
# PubMed article summary
curl -i http://localhost:8000/api/v1.0/pubmed/summary/31411802
# GenomeWeb news
curl -i http://localhost:8000/api/v1.0/genomeweb/article/1
curl -i http://localhost:8000/api/v1.0/genomeweb/articles/

UniProt cross-references (/api/v1.0/uniprot/BRCA1):
{
  "EMBL": "AF355273",
  "GO": "GO:0016567",
  "ID": "Q90Z51_CHICK",
  "KEGG": "gga:373983",
  "KO": "K10605",
  "PROSITE": "PS50089",
  "Pfam": "PF00097",
  "Reactome": "R-GGA-351433",
  "RefSeq": "NP_989500.1",
  "SMART": "SM00184"
}

PubMed summary (/api/v1.0/pubmed/summary/<pmid>):
{
  "PMID": "31411802",
  "Title": "BRCA mutation frequency and clinical features of ovarian...",
  "Author": "Bu H",
  "Journal": "J Obstet Gynaecol Res",
  "Abstract": "Subjects with germline BRCA1/2 mutations (gBRCAm) have...",
  "Citation": "Bu H - J Obstet Gynaecol Res",
  "Url": "http://www.ncbi.nlm.nih.gov/pubmed/31411802"
}

GenomeWeb article (/api/v1.0/genomeweb/article/<gwid>):
{
  "feature_items": [
    {
      "date": "Aug 20, 2019",
      "link": "USPSTF Formalizes Slightly Expanded BRCA Screening Recommendations",
      "id": 1
    }
  ]
}

These scenarios illustrate how Gene-Genie serves different users in a genomics workflow.
A developer building a bioinformatics pipeline needs to map an Entrez GeneID to corresponding identifiers in UniProt, KEGG, and other databases. Rather than querying each resource separately, she calls the UniProt endpoint with a gene name and receives a unified set of cross-references.
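In a pipeline, picking out the identifiers of interest from the response is a dictionary lookup. The following is a minimal Python sketch that uses part of the sample UniProt response shown earlier as literal input. The helper `pick_cross_refs` is hypothetical, and the assumption that the endpoint returns a flat JSON object with one value per database rests on that sample alone.

```python
import json

# Subset of the sample response for /api/v1.0/uniprot/BRCA1 shown earlier.
SAMPLE_RESPONSE = """
{
  "EMBL": "AF355273",
  "GO": "GO:0016567",
  "ID": "Q90Z51_CHICK",
  "KEGG": "gga:373983",
  "KO": "K10605",
  "RefSeq": "NP_989500.1"
}
"""


def pick_cross_refs(payload, databases):
    """Return {database: identifier}; databases missing from the payload map to None."""
    refs = json.loads(payload)
    return {db: refs.get(db) for db in databases}


if __name__ == "__main__":
    print(pick_cross_refs(SAMPLE_RESPONSE, ["KEGG", "RefSeq", "PDB"]))
```

Returning `None` for an absent database, rather than raising, lets the caller decide whether a missing cross-reference is an error.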
curl -i http://localhost:8000/api/v1.0/uniprot/BRCA1

A graduate student researching a specific gene wants to find recently published articles. Using a PubMed ID obtained from previous results, he retrieves article metadata including title, authors, abstract, and a direct link to the publication.
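The returned metadata can be turned into a one-line reference for notes or a reading list. The following is a minimal Python sketch that uses fields from the sample PubMed summary shown earlier. The helper `format_reference` and its output layout are hypothetical choices, not something Gene-Genie provides.

```python
# Fields copied from the sample response for /api/v1.0/pubmed/summary/<pmid>.
SAMPLE_SUMMARY = {
    "PMID": "31411802",
    "Title": "BRCA mutation frequency and clinical features of ovarian...",
    "Author": "Bu H",
    "Journal": "J Obstet Gynaecol Res",
    "Url": "http://www.ncbi.nlm.nih.gov/pubmed/31411802",
}


def format_reference(summary):
    """Build a one-line reference; missing fields fall back to 'n/a'."""
    fields = ("Author", "Title", "Journal", "PMID", "Url")
    author, title, journal, pmid, url = (summary.get(f, "n/a") for f in fields)
    return f"{author}. {title} {journal}. PMID {pmid}. {url}"


if __name__ == "__main__":
    print(format_reference(SAMPLE_SUMMARY))
```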
curl -i http://localhost:8000/api/v1.0/pubmed/summary/31411802

The same student recalls that BRCA1 interacts with other genes as part of a network. The String-db endpoint returns functional annotations, and the web client displays the protein-protein interaction network when one is available, helping clarify the gene's role in broader biological pathways.
curl -i http://localhost:8000/api/v1.0/stringdb/annotations/BRCA1

A non-technical user wants to learn more about genomics in the news. The GenomeWeb endpoint provides access to curated articles, making current developments accessible without requiring domain expertise.
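A short script can reduce the news payload to a readable list of headlines. The following is a minimal Python sketch based on the sample GenomeWeb response shown earlier, where the `link` field holds the headline text. It assumes the multi-article endpoint uses the same `feature_items` shape as the single-article sample. The helper `headlines` is hypothetical.

```python
# Payload copied from the sample response for /api/v1.0/genomeweb/article/<gwid>.
SAMPLE_ARTICLES = {
    "feature_items": [
        {
            "date": "Aug 20, 2019",
            "link": "USPSTF Formalizes Slightly Expanded BRCA Screening Recommendations",
            "id": 1,
        }
    ]
}


def headlines(payload):
    """Return one 'date: headline' string per entry in feature_items."""
    return [
        f"{item['date']}: {item['link']}"
        for item in payload.get("feature_items", [])
    ]


if __name__ == "__main__":
    for line in headlines(SAMPLE_ARTICLES):
        print(line)
```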
curl -i http://localhost:8000/api/v1.0/genomeweb/articles/

# Install dependencies on EC2 instance
sudo apt-get install python3-venv
# Clone and set up the project
git clone https://github.com/<your-username>/gene_genie.git
cd gene_genie
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
# Run the application
./run.sh

The application will be accessible at http://<your-ec2-public-ip>:8000/.
Tip: For production deployments, consider using Gunicorn with Nginx as a reverse proxy.
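One way to act on this tip is a small Gunicorn configuration file, which is plain Python. The sketch below is an assumption about how such a setup might look and is not shipped with the repository. The file name `gunicorn.conf.py` and the chosen values are illustrative starting points.

```python
# gunicorn.conf.py (hypothetical file, not part of the repository)
import multiprocessing

# Listen on localhost only; Nginx forwards public traffic to this address.
bind = "127.0.0.1:8000"

# Starting point suggested in the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Upstream genomics services can respond slowly; allow more than the 30 s default.
timeout = 60
```

It could then be started with `gunicorn -c gunicorn.conf.py gene_genie:app`, assuming the Flask application object in `gene_genie.py` is named `app`.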
Individual API modules can be run standalone for testing and development:
source ./venv/bin/activate
# Entrez API
./app/api/entrez_api.py
# String-db API
./app/api/stringdb_api.py
# UniProt API
./app/api/uniprot_api.py
# PubMed API
./app/api/pubmed_api.py

GenomeWeb API (interactive):
from app.api import genome_web_api
gw_api = genome_web_api.GenomeWebAPI()
items = gw_api.fetch()
print(items)

Framework Migration:
- Migrate from Flask to FastAPI
- Add OpenAPI/Swagger documentation (automatic with FastAPI)
Async & Performance:
- Implement async support for concurrent API requests
- Add Celery task queue with Redis backend for request scheduling
- Implement Ajax for browser-based async calls
- Add response caching
Infrastructure:
- Containerize with Docker
- Add Docker Compose for local development
Testing & Quality:
- Add unit tests (expand Flask-Testing coverage)
- Add integration tests for external API endpoints
- Set up CI/CD pipeline
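A unit test for code that wraps an external service usually replaces the network call with a fake. The sketch below shows that pattern with the standard library's `unittest` and `unittest.mock` rather than Flask-Testing. The function `summarize` and the test data are hypothetical and exist only to demonstrate the approach.

```python
import unittest
from unittest import mock


def summarize(pmid, fetch):
    """Hypothetical helper: fetch a PubMed summary and return its title."""
    return fetch(pmid)["Title"]


class SummarizeTest(unittest.TestCase):
    def test_returns_title_without_network_access(self):
        # The fake stands in for the real PubMed call, so the test runs offline.
        fake_fetch = mock.Mock(return_value={"PMID": "31411802", "Title": "Example"})
        self.assertEqual(summarize("31411802", fake_fetch), "Example")
        fake_fetch.assert_called_once_with("31411802")
```

Saved in a test module, this can be run with `python -m unittest`. Passing the fetch function in as an argument is what makes the substitution possible without patching module internals.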
MIT License

