Agentic BTE 🧬🤖

AI-Powered Biomedical Knowledge Graph Queries
Intelligent agents for drug discovery, disease research, and biomedical question answering using BioThings Explorer

🎯 Overview

Agentic BTE is a biomedical research platform that combines:

🧠 Large Language Models (GPT-4, Claude) for intelligent query understanding
🕸️ Knowledge Graphs via BioThings Explorer for comprehensive biomedical data
🤖 AI Agents using LangGraph and MCP protocols for autonomous research workflows
⚡ Query Optimization with advanced decomposition and planning strategies

✨ Key Features

🔬 Biomedical NER: Extract and link biomedical entities using spaCy/SciSpaCy + LLMs
🧬 Smart Query Classification: Automatically categorize research questions for optimal processing
⚙️ Query Optimization: Decompose complex questions into optimized subquery strategies
🌐 Multi-Agent Architecture: LangGraph orchestration with specialized research agents
🔌 MCP Server: Model Context Protocol integration for seamless AI tool usage
📊 Entity Resolution: Map biomedical IDs to human-readable names
🎯 Drug Discovery Focus: Specialized workflows for therapeutic research

🚀 Quick Start

Installation

# Install from PyPI (when available)
pip install agentic-bte

# Or install from source
git clone https://github.com/mastorga589/agentic-bte.git
cd agentic-bte
pip install -e .

Setup Environment

# Copy environment template
cp .env.example .env

# Edit .env with your API keys, or export the variables in your shell
export AGENTIC_BTE_OPENAI_API_KEY="your-openai-key"
export AGENTIC_BTE_DEBUG_MODE=true

Install SpaCy Models (Optional but Recommended)

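# Install biomedical NLP models
# (SciSpaCy models are installed from their release URLs, not via `spacy download`)
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz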

📖 Usage Examples

🔬 Basic Entity Recognition

from agentic_bte.core.entities import extract_entities

# Extract biomedical entities from text
entities = extract_entities("What drugs can treat Alzheimer's disease?")
print(entities)
# Output: ['drugs', 'treat', "Alzheimer's disease"]

🤖 MCP Server Usage

Start the MCP server:

agentic-bte-mcp

Then use with any MCP-compatible client:

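# Example MCP client usage with the official `mcp` Python SDK.
# Assumes the server communicates over stdio, so the client launches it itself.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    server = StdioServerParameters(command="agentic-bte-mcp")
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Ask complex biomedical questions
            result = await session.call_tool(
                "plan_and_execute_query",
                {
                    "query": "Which drugs can treat Angina Pectoris by acting on vasodilation?",
                    "execute_after_plan": True,
                    "max_results": 10,
                },
            )

            # Tool output arrives as content blocks; print the text ones
            for block in result.content:
                if block.type == "text":
                    print(block.text)
            # Gets: AI-generated list of vasodilatory drugs for angina


asyncio.run(main())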

🧠 LangGraph Multi-Agent Workflow

from agentic_bte.agents import BiomedicalResearchAgent

# Initialize the multi-agent system
agent = BiomedicalResearchAgent()

# Run complex research workflow
result = agent.research(
    query="How does metformin treat diabetes?",
    include_mechanism=True,
    max_depth=3
)

print(result.summary)
# Gets: Comprehensive mechanism of action analysis

⚡ Direct API Usage

from agentic_bte.core.queries import QueryClassifier, QueryDecomposer
from agentic_bte.core.knowledge import BTEClient

# Classify query type
query = "What drugs treat diabetes?"
classifier = QueryClassifier()
query_type = classifier.classify(query)
print(query_type)  # e.g. QueryType.DISEASE_TREATMENT

# Decompose into subqueries, reusing the classifier's result
decomposer = QueryDecomposer()
plan = decomposer.decompose(
    query=query,
    query_type=query_type
)

# Execute against BTE knowledge graph
bte_client = BTEClient()
results = bte_client.execute_plan(plan)

🏗️ Architecture

🧱 Core Components

agentic_bte/
├── core/                      # Core biomedical processing
│   ├── entities/              # Entity recognition & linking  
│   ├── queries/               # Query classification & optimization
│   └── knowledge/             # Knowledge graph interactions
├── agents/                    # AI agent implementations
├── servers/                   # Server implementations (MCP, API)
└── utils/                     # Shared utilities

🔄 Processing Pipeline

🔤 Entity Recognition: Extract biomedical entities using spaCy/SciSpaCy + LLMs
🎯 Query Classification: Determine query type (drug mechanism, disease treatment, etc.)
⚙️ Query Decomposition: Break complex queries into optimized subqueries
🌐 Knowledge Graph Query: Execute TRAPI queries against BTE knowledge graph
🧠 Result Synthesis: Generate human-readable answers using LLMs
📊 Entity Resolution: Map IDs to readable names for final presentation
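
The stages above can be read as a chain of functions. The following is a minimal, self-contained sketch with every stage stubbed out. All function names, the toy lookup tables, and the return shapes are hypothetical illustrations, not the package's actual API. Name resolution is applied before synthesis here only so that the toy answer is readable.

```python
from dataclasses import dataclass

# Toy data standing in for real NER models, the BTE service, and a name resolver.
# The CURIEs are illustrative examples.
ENTITY_IDS = {"type 2 diabetes": "MONDO:0005148"}
TOY_EDGES = {("MONDO:0005148", "biolink:treats"): ["CHEBI:6801"]}
ID_NAMES = {"CHEBI:6801": "metformin"}


@dataclass
class Subquery:
    subject_category: str
    predicate: str
    object_id: str


def recognize_entities(question: str) -> dict[str, str]:
    """Entity recognition: find known mentions and link them to IDs."""
    lowered = question.lower()
    return {name: curie for name, curie in ENTITY_IDS.items() if name in lowered}


def classify(question: str) -> str:
    """Query classification: a keyword rule stands in for the real classifier."""
    return "disease_treatment" if "treat" in question.lower() else "unknown"


def decompose(query_type: str, entities: dict[str, str]) -> list[Subquery]:
    """Query decomposition: emit one single-hop subquery per linked entity."""
    if query_type != "disease_treatment":
        return []
    return [Subquery("biolink:SmallMolecule", "biolink:treats", curie)
            for curie in entities.values()]


def execute(subqueries: list[Subquery]) -> list[str]:
    """Knowledge graph query: a dict lookup stands in for a TRAPI call to BTE."""
    hits: list[str] = []
    for sq in subqueries:
        hits.extend(TOY_EDGES.get((sq.object_id, sq.predicate), []))
    return hits


def resolve_names(curies: list[str]) -> list[str]:
    """Entity resolution: map IDs to readable names, falling back to the raw ID."""
    return [ID_NAMES.get(c, c) for c in curies]


def synthesize(question: str, names: list[str]) -> str:
    """Result synthesis: string formatting stands in for an LLM."""
    return f"{question} -> {', '.join(names) or 'no results'}"


def answer(question: str) -> str:
    entities = recognize_entities(question)
    plan = decompose(classify(question), entities)
    return synthesize(question, resolve_names(execute(plan)))


print(answer("What drugs treat type 2 diabetes?"))
```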

🤖 Agent Architectures

MCP Server

graph LR
    A[MCP Client] --> B[MCP Server Wrapper]
    B --> C[Core Processing Pipeline]
    C --> D[Entity Recognition]
    C --> E[Query Planning]
    C --> F[BTE Execution]
    D --> G[Final Answer]
    E --> G
    F --> G
    G --> B
    B --> A

The MCP Server acts as a lightweight wrapper that exposes the core biomedical processing pipeline as MCP-compatible tools. It provides a standardized interface for AI assistants (Claude, ChatGPT, etc.) to access the same entity recognition, query planning, and knowledge graph execution capabilities used by the LangGraph agents. This allows seamless integration with any MCP-compatible client while maintaining a single unified codebase.
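
The wrapper idea can be sketched without any MCP dependency: a registry maps tool names to the same core functions other callers use, so the server layer only dispatches. The registry, decorator, and handler below are hypothetical; only the tool name `plan_and_execute_query` comes from the usage example in this README. A real server would rely on an MCP SDK for transport and schemas.

```python
from typing import Any, Callable

ToolFunc = Callable[..., dict[str, Any]]

# Registry mapping tool names to the core functions they expose.
TOOLS: dict[str, ToolFunc] = {}


def tool(name: str) -> Callable[[ToolFunc], ToolFunc]:
    """Register a core function under a tool name."""
    def register(func: ToolFunc) -> ToolFunc:
        TOOLS[name] = func
        return func
    return register


@tool("plan_and_execute_query")
def plan_and_execute_query(query: str, max_results: int = 10) -> dict[str, Any]:
    # Placeholder for the shared pipeline; it only echoes its inputs.
    return {"query": query, "max_results": max_results, "final_answer": "(stub)"}


def handle_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Dispatch a client request to the registered core function."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**arguments)


print(handle_call("plan_and_execute_query", {"query": "What drugs treat angina?"}))
```

Because the registered function is an ordinary Python callable, an agent in the same codebase can call it directly without going through the dispatch layer.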

LangGraph Multi-Agent

graph TB
    A[Query] --> B[Annotator Agent]
    B --> C[Planner Agent] 
    C --> D[BTE Search Agent]
    D --> C
    C --> E[Synthesis Agent]
    E --> F[Final Research Report]

The Annotator Agent first tags every biomedical entity in the query with its standardized ID. The Planner Agent then decomposes the query into single-hop subqueries, which the BTE Search Agent converts into TRAPI queries and executes against BioThings Explorer.
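
For illustration, a single-hop subquery such as "which small molecules treat this disease?" corresponds to a TRAPI query graph with two nodes and one edge. The helper name `single_hop_query` is hypothetical and the CURIE is only an example; the JSON layout follows the TRAPI `query_graph` structure.

```python
import json


def single_hop_query(subject_category: str, predicate: str,
                     object_id: str, object_category: str) -> dict:
    """Build a one-edge TRAPI message: (unknown subject) -predicate-> (known object)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is the unknown the search should fill in
                    "n0": {"categories": [subject_category]},
                    # n1 is pinned to the entity the Annotator identified
                    "n1": {"ids": [object_id], "categories": [object_category]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


query = single_hop_query("biolink:SmallMolecule", "biolink:treats",
                         "MONDO:0005148", "biolink:Disease")
print(json.dumps(query, indent=2))
```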

This is an iterative process where results accumulate in an RDF knowledge graph. After each search, the Planner evaluates whether sufficient information has been gathered to answer the original query. If more information is needed, it generates the next subquery based on existing results. Once complete, all accumulated results and the original query are passed to the Synthesis Agent, which generates the final comprehensive research report.
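
The plan, search, and evaluate loop described above can be sketched as follows. Everything here is a stand-in: the stubbed planner and search functions, the toy edge table, and a plain set of triples in place of the RDF graph are assumptions made for illustration, not the project's LangGraph implementation.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Toy edges standing in for BTE results: (subject, predicate) -> objects.
TOY_KG = {
    ("metformin", "affects"): ["AMPK"],
    ("AMPK", "regulates"): ["glucose metabolism"],
}


def plan_next(start: str, graph: set[Triple]) -> Optional[tuple[str, str]]:
    """Planner stub: return the next unexplored (node, predicate), or None when done."""
    explored = {s for s, _, _ in graph}
    frontier = [start] + [o for _, _, o in sorted(graph)]
    for node in frontier:
        if node in explored:
            continue
        for subject, predicate in TOY_KG:
            if subject == node:
                return subject, predicate
    return None


def search(subject: str, predicate: str) -> list[Triple]:
    """Search stub: a dict lookup stands in for a TRAPI query."""
    return [(subject, predicate, obj) for obj in TOY_KG.get((subject, predicate), [])]


def research(start: str, max_steps: int = 5) -> set[Triple]:
    graph: set[Triple] = set()
    for _ in range(max_steps):            # hard cap so the loop always terminates
        subquery = plan_next(start, graph)
        if subquery is None:              # planner judges the evidence sufficient
            break
        graph.update(search(*subquery))   # accumulate results for the next round
    return graph


for triple in sorted(research("metformin")):
    print(triple)
```

The accumulated set would then be handed, together with the original question, to a synthesis step. The `max_steps` cap is a design choice of this sketch to guarantee termination.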

🔧 Configuration

Environment Variables

Variable	Description	Default
`AGENTIC_BTE_OPENAI_API_KEY`	OpenAI API key	Required
`AGENTIC_BTE_OPENAI_MODEL`	OpenAI model	`gpt-4o`
`AGENTIC_BTE_MAX_SUBQUERIES`	Max subqueries per decomposition	`10`
`AGENTIC_BTE_CONFIDENCE_THRESHOLD`	Minimum result confidence	`0.7`
`AGENTIC_BTE_DEBUG_MODE`	Enable debug logging	`False`
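
As a rough sketch of how the variables in this table could be read with their documented defaults, the snippet below uses only the standard library. The `Settings` dataclass and `load_settings` function are hypothetical; the package's own `agentic_bte.config` module may work differently.

```python
import os
from dataclasses import dataclass

PREFIX = "AGENTIC_BTE_"


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str = "gpt-4o"
    max_subqueries: int = 10
    confidence_threshold: float = 0.7
    debug_mode: bool = False


def load_settings(env=None) -> Settings:
    """Read the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get(PREFIX + "OPENAI_API_KEY")
    if not key:
        raise RuntimeError(PREFIX + "OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        openai_model=env.get(PREFIX + "OPENAI_MODEL", "gpt-4o"),
        max_subqueries=int(env.get(PREFIX + "MAX_SUBQUERIES", "10")),
        confidence_threshold=float(env.get(PREFIX + "CONFIDENCE_THRESHOLD", "0.7")),
        # Accept common truthy spellings for the debug flag
        debug_mode=env.get(PREFIX + "DEBUG_MODE", "false").lower() in {"1", "true", "yes"},
    )


settings = load_settings({
    "AGENTIC_BTE_OPENAI_API_KEY": "your-openai-key",
    "AGENTIC_BTE_DEBUG_MODE": "true",
})
print(settings.openai_model, settings.max_subqueries, settings.debug_mode)
```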

Advanced Configuration

from agentic_bte.config import settings

# Customize processing parameters
settings.max_subqueries = 15
settings.confidence_threshold = 0.8
settings.enable_semantic_classification = True

# Exclude noisy predicates
settings.excluded_predicates = [
    "biolink:related_to",
    "biolink:associated_with"
]
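
To make the effect of these two settings concrete, here is a small sketch of how an excluded-predicate list and a confidence threshold might be applied to result edges. The `Edge` shape, its `confidence` field, and `filter_edges` are hypothetical; how the package applies these settings internally is not documented here.

```python
from typing import TypedDict


class Edge(TypedDict):
    subject: str
    predicate: str
    object: str
    confidence: float


def filter_edges(edges: list[Edge], excluded_predicates: list[str],
                 confidence_threshold: float) -> list[Edge]:
    """Keep edges whose predicate is allowed and whose score meets the threshold."""
    excluded = set(excluded_predicates)
    return [
        e for e in edges
        if e["predicate"] not in excluded and e["confidence"] >= confidence_threshold
    ]


edges: list[Edge] = [
    {"subject": "drug A", "predicate": "biolink:treats", "object": "disease X", "confidence": 0.92},
    {"subject": "drug B", "predicate": "biolink:related_to", "object": "disease X", "confidence": 0.95},
    {"subject": "drug C", "predicate": "biolink:treats", "object": "disease X", "confidence": 0.55},
]

kept = filter_edges(edges, ["biolink:related_to", "biolink:associated_with"], 0.8)
print([e["subject"] for e in kept])
```

In this toy data, drug B is dropped for its generic predicate and drug C for its low score, which is the kind of noise reduction the settings are meant to provide.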

📊 Supported Query Types

Query Type	Description	Example	Complexity
Drug Mechanism	How drugs work	"How does aspirin prevent heart attacks?"	⭐⭐⭐⭐
Disease Treatment	What treats diseases	"What drugs treat diabetes?"	⭐⭐⭐
Gene Function	What genes do	"What does the BRCA1 gene do?"	⭐⭐⭐
Drug Target	Drug-protein interactions	"What proteins does ibuprofen target?"	⭐⭐
Disease Gene	Genes causing diseases	"What genes cause Alzheimer's?"	⭐⭐⭐
Pathway Analysis	Biological pathways	"What pathways regulate apoptosis?"	⭐⭐⭐⭐

🧪 Examples & Notebooks

Explore comprehensive examples in the examples/ directory:

📓 Drug Discovery Demo: End-to-end drug discovery workflow
🧬 Entity Resolution Demo: Biomedical entity processing
⚡ Query Optimization Demo: Advanced query strategies
📈 Benchmarking Studies: Performance comparisons

🧪 Testing

# Run all tests
pytest

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests
pytest -m external      # Tests requiring external services

# Run with coverage
pytest --cov=agentic_bte --cov-report=html

🤝 Contributing

We welcome contributions! Please see our Contributing Guide (CONTRIBUTING.md).

Development Setup

# Clone repository
git clone https://github.com/mastorga589/agentic-bte.git
cd agentic-bte

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run code formatting
black agentic_bte/
isort agentic_bte/

# Run type checking
mypy agentic_bte/

📚 Documentation

📖 User Guide: Comprehensive usage documentation
🏗️ Architecture Guide: System design and components
🔧 API Reference: Complete API documentation
🚀 Deployment Guide: Production deployment instructions

🔬 Research & Publications

Builds on work in:

Biomedical Knowledge Graphs: BioThings Explorer, Translator ecosystem
Large Language Models: GPT-4, Claude for biomedical reasoning
Multi-Agent Systems: LangGraph orchestration patterns
Query Optimization: TRAPI query decomposition strategies

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

BioThings Explorer team for the amazing knowledge graph infrastructure
LangChain/LangGraph for the multi-agent framework
spaCy/SciSpaCy for biomedical NLP models
OpenAI for GPT-4 API access
NCATS Translator program for biomedical data standards

🚀 What's Next?

🔍 Vector Search: Semantic similarity search over biomedical literature
📱 Web Interface: Interactive query builder and result visualization
🧬 Multi-Modal: Integration with biomedical images and molecular structures
🌐 Federation: Multi-knowledge graph federation beyond BTE
📊 Analytics: Query performance and result quality analytics

Happy Researching! 🧬✨

For questions, issues, or collaboration opportunities, please open an issue or reach out to our team.
