---
title: "MetabolismGraph: Learning Metabolism Dynamics with Graph Neural Networks"
---

## Overview

**MetabolismGraph** is a framework for learning the structure of metabolic networks from concentration dynamics using Graph Neural Networks (GNNs). Given time-series measurements of metabolite concentrations, the model recovers:

1. **Rate constants** $k_j$ — the intrinsic speed of each reaction
2. **Functional forms** $f_{\text{sub}}(c, s)$, $f_{\text{node}}(c)$ — how substrates drive reactions and how metabolites self-regulate

See the [current results](results.qmd) for rate constant recovery on an oscillatory regime with 100 metabolites and 256 reactions.

```{mermaid}
%%| fig-width: 9
flowchart LR
    subgraph met["Metabolites"]
        direction TB
        c1((c₁))
        c2((c₂))
        c3((c₃))
        c4((c₄))
    end

    subgraph rxn["Reactions"]
        direction TB
        R1["R₁ · k₁"]
        R2["R₂ · k₂"]
        R3["R₃ · k₃"]
    end

    c1 -->|"−1"| R1
    c2 -->|"−1"| R1
    R1 -->|"+1"| c3
    R1 -->|"+1"| c4
    c3 -->|"−1"| R2
    R2 -->|"+1"| c1
    c2 -->|"−1"| R3
    c4 -->|"−1"| R3
    R3 -->|"+1"| c2

    style c1 fill:#e1f5fe,stroke:#0277bd
    style c2 fill:#e1f5fe,stroke:#0277bd
    style c3 fill:#e1f5fe,stroke:#0277bd
    style c4 fill:#e1f5fe,stroke:#0277bd
    style R1 fill:#fff3e0,stroke:#ef6c00
    style R2 fill:#fff3e0,stroke:#ef6c00
    style R3 fill:#fff3e0,stroke:#ef6c00
    style met fill:none,stroke:#0277bd,stroke-dasharray: 5 5
    style rxn fill:none,stroke:#ef6c00,stroke-dasharray: 5 5
```

Metabolites (blue circles) and reactions (orange boxes) form a **bipartite graph**: a graph with two node types in which edges only connect nodes of different types. Each edge carries a stoichiometric coefficient $S_{ij}$. A standard unipartite graph (metabolite $\leftrightarrow$ metabolite) cannot represent this system, because each reaction involves *multiple* substrates and products simultaneously. A single edge between two metabolites would lose the information that they participate in the *same* reaction with a specific rate constant $k_j$.
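
To make the bipartite structure concrete, the sketch below stores the diagram's edges as (metabolite, reaction, coefficient) triples and groups them by reaction. The `EDGES` list and the `group_by_reaction` helper are illustrative names, not part of the MetabolismGraph code base.

```python
# Edges copied from the diagram: (metabolite, reaction, coefficient).
# Negative coefficients mark substrates (consumed), positive ones mark products.
EDGES = [
    ("c1", "R1", -1), ("c2", "R1", -1), ("c3", "R1", +1), ("c4", "R1", +1),
    ("c3", "R2", -1), ("c1", "R2", +1),
    ("c2", "R3", -1), ("c4", "R3", -1), ("c2", "R3", +1),
]


def group_by_reaction(edges):
    """Return {reaction: {"substrates": {name: |S|}, "products": {name: |S|}}}."""
    reactions = {}
    for metabolite, reaction, coeff in edges:
        entry = reactions.setdefault(reaction, {"substrates": {}, "products": {}})
        side = "substrates" if coeff < 0 else "products"
        entry[side][metabolite] = abs(coeff)
    return reactions


reactions = group_by_reaction(EDGES)
for name, entry in sorted(reactions.items()):
    print(name, sorted(entry["substrates"]), "->", sorted(entry["products"]))
# R1 ['c1', 'c2'] -> ['c3', 'c4']
# R2 ['c3'] -> ['c1']
# R3 ['c2', 'c4'] -> ['c2']
```

Grouping by reaction node keeps together all metabolites that share one rate constant, which a metabolite-to-metabolite edge list would not do.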

## The Full Model

The complete metabolic dynamics combine a homeostasis term with the reaction terms:

$$
\frac{dc_i}{dt} = \underbrace{-\lambda_i \cdot (c_i - c_i^{\text{baseline}})}_{\text{homeostasis}} + \underbrace{\sum_{j=1}^{m} S_{ij} \cdot v_j}_{\text{reaction dynamics}}
$$

where the reaction rate $v_j$ depends on the aggregation type:

| Aggregation | Rate $v_j$ |
|-------------|------------|
| Additive | $v_j = k_j \cdot \sum_{k \in \text{sub}(j)} c_k^{|S_{kj}|}$ |
| Multiplicative | $v_j = k_j \cdot \prod_{k \in \text{sub}(j)} c_k^{|S_{kj}|}$ |

See [Model](model.qmd) for detailed equations, diagrams, and model configurations.
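
The equations above can be evaluated directly for a small case. The following is a minimal sketch for a single hypothetical reaction $2A + B \to C$; the function names, concentrations, and parameter values are illustrative assumptions rather than values from the project.

```python
import math


def reaction_rate(k_j, conc, exponents, aggregation="multiplicative"):
    """Rate v_j of one reaction.

    conc holds the substrate concentrations c_k and exponents the matching |S_kj|.
    """
    terms = [c ** s for c, s in zip(conc, exponents)]
    if aggregation == "additive":
        return k_j * sum(terms)
    if aggregation == "multiplicative":
        return k_j * math.prod(terms)
    raise ValueError(f"unknown aggregation: {aggregation}")


def dcdt(c, baseline, lam, S, v):
    """Right-hand side: homeostasis plus stoichiometry-weighted reaction rates."""
    return [
        -lam[i] * (c[i] - baseline[i]) + sum(S[i][j] * v[j] for j in range(len(v)))
        for i in range(len(c))
    ]


# Hypothetical reaction 2A + B -> C with c_A = 2, c_B = 3, c_C = 1 and k = 0.5.
v_mul = reaction_rate(0.5, [2.0, 3.0], [2, 1], "multiplicative")  # 0.5 * 2^2 * 3 = 6.0
v_add = reaction_rate(0.5, [2.0, 3.0], [2, 1], "additive")        # 0.5 * (2^2 + 3) = 3.5

S = [[-2], [-1], [1]]  # rows: A, B, C; one column for the single reaction
rhs = dcdt([2.0, 3.0, 1.0], [1.0, 1.0, 1.0], [0.1, 0.1, 0.1], S, [v_mul])
print(v_mul, v_add, rhs)  # rhs is approximately [-12.1, -6.2, 6.0]
```

The two aggregation types differ only in how the per-substrate terms $c_k^{|S_{kj}|}$ are combined; the stoichiometric weighting in `dcdt` is the same in both cases.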

## The Inverse Problem

The forward model describes how concentrations evolve given all parameters. In practice, the parameters themselves are unknown. The **inverse problem** is to recover them from observed dynamics.

**Given:**

- Concentration trajectories $\{c_i(t)\}_{i=1}^{n}$ measured over time
- Stoichiometric matrix $\mathbf{S}$ (known from biochemistry)

**To learn:**

- **Substrate function** $\text{MLP}_{\text{sub}}(c_k, |S_{kj}|)$: discovers the mass-action power law $c_k^{|S_{kj}|}$
- **Homeostasis function** $\text{MLP}_{\text{node}}(c_i, a_i)$: discovers per-metabolite regulation $-\lambda_i(c_i - c_i^{\text{baseline}})$, where $a_i$ is a learnable per-metabolite embedding
- **Rate constants** $k_j$: per-reaction speed scalars

This is challenging because the system is high-dimensional ($n$ metabolites, $m$ reactions), the mapping from parameters to dynamics is nonlinear, and multiple parameter combinations can produce similar trajectories (an identifiability problem). Classical optimization approaches struggle with this landscape.
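
One concrete instance of the identifiability issue is a scale ambiguity between the rate constant and the substrate function: if a learned substrate function returns $\alpha \, c_k^{|S_{kj}|}$ instead of $c_k^{|S_{kj}|}$, dividing $k_j$ by $\alpha^{p_j}$ (with $p_j$ the number of substrates of reaction $j$) leaves the multiplicative rate unchanged. The sketch below checks this numerically; the function names and all numeric values are hypothetical.

```python
import math


def rate(k_j, substrate_fn, conc, exponents):
    """Multiplicative rate k_j * prod_k substrate_fn(c_k, |S_kj|)."""
    return k_j * math.prod(substrate_fn(c, s) for c, s in zip(conc, exponents))


def mass_action(c, s):
    """Reference substrate function c^s."""
    return c ** s


ALPHA = 2.0  # arbitrary rescaling factor


def scaled_mass_action(c, s):
    """Same power law, multiplied by a constant factor."""
    return ALPHA * c ** s


conc, exponents = [0.7, 1.3], [1, 2]
k_true = 0.9
k_alt = k_true / ALPHA ** len(conc)  # compensate one factor of ALPHA per substrate

v_true = rate(k_true, mass_action, conc, exponents)
v_alt = rate(k_alt, scaled_mass_action, conc, exponents)
print(math.isclose(v_true, v_alt))  # True
```

Because both parameter sets produce the same rate for every concentration, the data alone cannot distinguish them, which is one reason some constraint or regularizer on the learned functions may be needed.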

We address this by casting the inverse problem as a **Graph Neural Network** learning task. The metabolic network is naturally a bipartite graph (metabolites $\leftrightarrow$ reactions), and we replace the unknown functions with learnable MLPs that operate on this graph structure. The GNN is trained end-to-end by minimizing the prediction error on $dc/dt$, recovering the rate constants and homeostatic functions simultaneously. An LLM-driven closed-loop exploration engine systematically searches the hyperparameter space — see [GNN-LLM-Memory](gnn-llm-memory.qmd) for the training scheme, regularization terms, and exploration loop.
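
As a much-reduced illustration of fitting by minimizing the error on $dc/dt$, the sketch below assumes the mass-action form is already known and fits only the rate constants with plain gradient descent. The toy network, the values in `K_TRUE`, the sampling range, and the learning rate are all hypothetical; the homeostasis term is omitted and no MLPs are involved, so this is not the project's training scheme.

```python
import random

# Hypothetical toy network:  R1: c1 + c2 -> c3   and   R2: c3 -> c1
S = [[-1, 1],
     [-1, 0],
     [1, -1]]                # S[i][j]: coefficient of metabolite i in reaction j
SUBSTRATES = [[0, 1], [2]]   # substrate indices of each reaction
K_TRUE = [0.8, 1.5]          # rate constants used to generate the synthetic data


def features(c):
    """Mass-action term prod_k c_k^{|S_kj|} for each reaction j."""
    out = []
    for j, subs in enumerate(SUBSTRATES):
        g = 1.0
        for m in subs:
            g *= c[m] ** abs(S[m][j])
        out.append(g)
    return out


def dcdt(k, g):
    """Predicted dc_i/dt = sum_j S_ij * k_j * g_j (no homeostasis term)."""
    return [sum(S[i][j] * k[j] * g[j] for j in range(len(k))) for i in range(len(S))]


def fit(G, Y, steps=3000, lr=0.05):
    """Gradient descent on the mean squared error between predicted and observed dc/dt."""
    k = [1.0, 1.0]
    n = len(G)
    for _ in range(steps):
        grad = [0.0] * len(k)
        for g, y in zip(G, Y):
            pred = dcdt(k, g)
            for i in range(len(S)):
                residual = pred[i] - y[i]
                for j in range(len(k)):
                    grad[j] += 2.0 * residual * S[i][j] * g[j] / n
        k = [kj - lr * gj for kj, gj in zip(k, grad)]
    return k


rng = random.Random(0)
samples = [[rng.uniform(0.5, 1.5) for _ in range(3)] for _ in range(16)]
G = [features(c) for c in samples]   # per-sample mass-action terms
Y = [dcdt(K_TRUE, g) for g in G]     # noise-free "observed" derivatives

k_hat = fit(G, Y)
print(k_hat)  # close to [0.8, 1.5]
```

With the functional form fixed, the prediction is linear in $k_j$ and the loss is a convex quadratic, which is why simple gradient descent suffices here. The full problem, in which the functional forms are also learned, does not have this property.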

### GNN Parameterization

$$
\frac{dc_i}{dt} = \underbrace{\text{MLP}_{\text{node}}(c_i, a_i)}_{\text{learns } -\lambda_i(c_i - c_i^{\text{baseline}})} + \sum_{j=1}^{m} S_{ij} \cdot \underbrace{k_j \cdot \prod_{k \in \text{sub}(j)} \text{MLP}_{\text{sub}}(c_k, |S_{kj}|)}_{\text{learns } k_j \text{ and } c_k^{|S_{kj}|}}
$$

where:

- $a_i \in \mathbb{R}^d$ is a **learnable embedding** for metabolite $i$
- $k_j$ are **learnable rate constants**
- $\prod$ denotes multiplicative aggregation (mass-action kinetics)
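
The parameterization above can be read as two message-passing steps: reaction nodes multiply the substrate messages, and metabolite nodes add their own update to the stoichiometry-weighted rates. The dependency-free sketch below follows that reading. The one-hidden-layer `make_mlp`, the embedding dimension, the toy network, and the random initialization are assumptions made for illustration, not the architecture used in the repository.

```python
import math
import random


def make_mlp(n_in, n_hidden, rng):
    """One-hidden-layer tanh MLP with scalar output, returned as a closure."""
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden))

    return forward


def gnn_rhs(c, a, k, S, substrates, mlp_node, mlp_sub):
    """dc_i/dt = MLP_node(c_i, a_i) + sum_j S_ij * k_j * prod_k MLP_sub(c_k, |S_kj|)."""
    # Reaction nodes: aggregate substrate messages multiplicatively.
    v = [k[j] * math.prod(mlp_sub([c[m], abs(S[m][j])]) for m in subs)
         for j, subs in enumerate(substrates)]
    # Metabolite nodes: node update plus stoichiometry-weighted rates.
    return [mlp_node([c[i]] + a[i]) + sum(S[i][j] * v[j] for j in range(len(v)))
            for i in range(len(c))]


# Hypothetical toy network:  R1: c1 + c2 -> c3   and   R2: c3 -> c1
S = [[-1, 1], [-1, 0], [1, -1]]
SUBSTRATES = [[0, 1], [2]]
c = [1.0, 2.0, 3.0]

rng = random.Random(0)
d = 2                                                        # assumed embedding size
a = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]  # embeddings a_i
out = gnn_rhs(c, a, [1.0, 1.0], S, SUBSTRATES,
              make_mlp(1 + d, 8, rng), make_mlp(2, 8, rng))

# With analytic stand-ins the same function reduces to plain mass-action kinetics.
check = gnn_rhs(c, a, [0.5, 2.0], S, SUBSTRATES,
                lambda x: 0.0, lambda x: x[0] ** x[1])
print(check)  # [5.0, -1.0, -5.0]
```

Passing the two functions as arguments makes explicit which parts are learned: swapping the untrained MLPs for the analytic forms recovers the forward model with no homeostasis.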

### Learnable Parameters

| Parameter | Type | Purpose |
|-----------|------|---------|
| $a_i$ | Embedding vectors | Per-metabolite identity |
| $k_j$ | Scalars | Per-reaction rate constants |
| $\text{MLP}_{\text{node}}$ | Neural network | Learns $-\lambda_i(c_i - c_i^{\text{baseline}})$ |
| $\text{MLP}_{\text{sub}}$ | Neural network | Learns $c_k^{|S_{kj}|}$ |

## Citation

If you use MetabolismGraph in your research, please cite:

```bibtex
@software{metabolismgraph2025,
  author = {Allier, Cédric},
  title = {MetabolismGraph: Learning Metabolism Dynamics with GNNs},
  year = {2026},
  url = {https://github.com/allierc/MetabolismGraph}
}
```