AstraZeneca/boat

BOAT

A toolkit for Bayesian Optimization of Antibody Traits (BOAT). The methods naturally extend to other modalities based on sequences of amino acids.

It combines:

  • Sequence encodings (PLMs, bio-specific embeddings)
  • Bayesian & genetic optimization loops
  • Liability filtering
  • An interface to wrap models for sequence scoring
  • Modular acquisition & model abstractions for rapid experimentation

The goal: enable fast design-iteration cycles with pluggable scoring functions and flexible optimization strategies.

Key Features (overview)

  • Bayesian optimization (single & multi-objective) with BoTorch / GPyTorch
  • Genetic algorithm framework for sequence-level search
  • Encodings: one-hot, physicochemical, PLM-based, etc.
  • Liability and developability scoring utilities
  • Pluggable scoring interfaces (fake, PLM, Oasis, liabilities)
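As an illustration of the simplest encoding listed above, here is a minimal one-hot sketch in plain Python. The names `AMINO_ACIDS`, `AA_INDEX`, and `one_hot_encode` are hypothetical and are not taken from the BOAT codebase; the encoders in `boat/bayesopt/encodings/` may be organized quite differently.

```python
# Illustrative only: these names are not part of the BOAT API.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[float]:
    """Return a flat vector of length 20 * len(sequence)."""
    width = len(AMINO_ACIDS)
    vector = [0.0] * (width * len(sequence))
    for position, residue in enumerate(sequence):
        # Raises KeyError for non-standard residues such as "X".
        vector[position * width + AA_INDEX[residue]] = 1.0
    return vector


features = one_hot_encode("CAR")
print(len(features), sum(features))
```

Each position contributes exactly one non-zero entry, so the vector length grows linearly with sequence length.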

Installation

Requires: Python 3.10 or 3.11 (see pyproject.toml). Poetry is used for dependency management.

  1. Install Poetry (if needed).

  2. Standard install (core only): poetry install

  3. Install with selected extras, e.g.:

   poetry install --extras "plms"

  4. Activate the virtual environment:

   eval $(poetry env activate)

Alternatively, run commands without activating the environment:

   poetry run python ...

Optional Extras (summary)

  • boat: Bayesian optimization stack (ablang2, blosum, botorch, gpytorch, scikit-learn)

Install any combination via:

poetry install --extras "<space separated extras>"

Quick Start

Example (pseudo) usage sketch:

from boat.bayesopt.mo_loop import MOBayesOptOnSequences
from boat.scoring_function.fake import FakeScoringFunction

loop = MOBayesOptOnSequences(
    scoring_functions=[FakeScoringFunction()],
    n_init=8,
    n_iter=5,
)
loop.run()

Replace FakeScoringFunction with a real scoring interface (PLM, humanness, etc.) as needed.
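To give a sense of what a pluggable scoring function might look like, the sketch below counts a few well-known sequence liability motifs. Everything here is an assumption for illustration: the class name `MotifCountScoringFunction`, the callable interface (a list of sequences in, a list of floats out), and the small motif table are not taken from BOAT, whose actual base class in `boat/scoring_function/` may require different methods.

```python
import re

# Hypothetical names throughout; BOAT's real scoring interface may differ.
LIABILITY_MOTIFS = {
    "n_glycosylation": re.compile(r"N[^P][ST]"),  # N-X-S/T sequon, X != P
    "deamidation": re.compile(r"NG"),
    "isomerization": re.compile(r"DG"),
}


class MotifCountScoringFunction:
    """Scores each sequence by its number of liability motif matches."""

    def __call__(self, sequences: list[str]) -> list[float]:
        return [float(self.count(seq)) for seq in sequences]

    @staticmethod
    def count(sequence: str) -> int:
        # findall returns non-overlapping matches for each pattern.
        return sum(
            len(pattern.findall(sequence))
            for pattern in LIABILITY_MOTIFS.values()
        )


scorer = MotifCountScoringFunction()
print(scorer(["AAAA", "NGS", "DGNPS"]))
```

A lower score means fewer flagged motifs. A single residue can be counted by more than one pattern, which is acceptable for a rough penalty but worth revisiting for a real filter.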

Project Organization

├── .github/workflows        # CI workflows (lint, test, build, publish, docs)
├── Makefile                 # Common developer shortcuts
├── Dockerfile               # Base container recipe
├── README.md
├── data/                    # Example data
├── docs/                    # MkDocs documentation project
├── pyproject.toml           # Poetry configuration & extras
└── boat/
    ├── data_utils.py        # Generic data helpers
    ├── bayesopt/            # Bayesian optimization components
    │   ├── mo_loop.py       # Multi-objective loop orchestration
    │   ├── acquisition/     # Acquisition strategies & utilities
    │   ├── encodings/       # Feature encodings for sequences
    │   ├── loop/            # Core loop utilities
    │   └── models/          # GP models, kernels, wrappers
    ├── biologics/           # Domain-specific sequence & liability helpers
    ├── genetic_algorithm/   # GA operators, optimizers, vocabularies
    ├── scoring_function/    # Unified scoring interfaces (fake, PLM, Oasis, liabilities)
    └── __init__.py

Subpackage Highlights

  • bayesopt: Acquisition functions, GP kernels, loops for sequential / multi-objective optimization.
  • genetic_algorithm: Mutation / crossover / population management for sequence search.
  • scoring_function: Abstraction layer to plug different scoring backends uniformly.
  • biologics: Sequence manipulation, liabilities and developability heuristics.
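The mutation and crossover operators mentioned for genetic_algorithm can be sketched generically as follows. This is a textbook illustration under stated assumptions, not BOAT's implementation: `point_mutate` and `single_point_crossover` are hypothetical names, and the crossover assumes equal-length parents of at least two residues.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutate(sequence: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a different residue."""
    position = rng.randrange(len(sequence))
    choices = [aa for aa in AMINO_ACIDS if aa != sequence[position]]
    return sequence[:position] + rng.choice(choices) + sequence[position + 1:]


def single_point_crossover(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """Join a prefix of parent_a to the matching suffix of parent_b."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    # The cut point lies strictly inside the sequence, so both parents contribute.
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


rng = random.Random(0)  # seeded for reproducibility
child = single_point_crossover("AAAA", "CCCC", rng)
print(child, point_mutate(child, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible and avoids relying on global random state.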

Troubleshooting

  • Missing optional features: confirm that you installed the correct extras (see Optional Extras).
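One quick way to check whether the optional dependencies are importable is shown below. The module list is an assumption derived from the packages named for the boat extra (scikit-learn is imported as `sklearn`; the other import names are assumed to match their package names), and `missing_modules` is a hypothetical helper rather than something BOAT provides.

```python
from importlib.util import find_spec

# Import names assumed from the packages listed for the "boat" extra.
BOAT_EXTRA_MODULES = ["ablang2", "blosum", "botorch", "gpytorch", "sklearn"]


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(BOAT_EXTRA_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside the project environment, for example via `poetry run python`, so that the check reflects the packages Poetry installed.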

About

Multi-objective Bayesian optimization for antibody lead optimization with multiple property predictors
