---
title: "MetabolismGraph: Learning Metabolism Dynamics with Graph Neural Networks"
---
## Overview
**MetabolismGraph** is a framework for learning the structure of metabolic networks from concentration dynamics using Graph Neural Networks (GNNs). Given time-series measurements of metabolite concentrations, the model recovers:
1. **Rate constants** $k_j$: the intrinsic speed of each reaction
2. **Functional forms** $f_{\text{sub}}(c, s)$ and $f_{\text{node}}(c)$: how substrates drive reactions, given concentration $c$ and stoichiometric coefficient $s$, and how metabolites self-regulate
See the [current results](results.qmd) for rate constant recovery in an oscillatory regime with 100 metabolites and 256 reactions.
```{mermaid}
%%| fig-width: 9
flowchart LR
subgraph met["Metabolites"]
direction TB
c1((c₁))
c2((c₂))
c3((c₃))
c4((c₄))
end
subgraph rxn["Reactions"]
direction TB
R1["R₁ · k₁"]
R2["R₂ · k₂"]
R3["R₃ · k₃"]
end
c1 -->|"−1"| R1
c2 -->|"−1"| R1
R1 -->|"+1"| c3
R1 -->|"+1"| c4
c3 -->|"−1"| R2
R2 -->|"+1"| c1
c2 -->|"−1"| R3
c4 -->|"−1"| R3
R3 -->|"+1"| c2
style c1 fill:#e1f5fe,stroke:#0277bd
style c2 fill:#e1f5fe,stroke:#0277bd
style c3 fill:#e1f5fe,stroke:#0277bd
style c4 fill:#e1f5fe,stroke:#0277bd
style R1 fill:#fff3e0,stroke:#ef6c00
style R2 fill:#fff3e0,stroke:#ef6c00
style R3 fill:#fff3e0,stroke:#ef6c00
style met fill:none,stroke:#0277bd,stroke-dasharray: 5 5
style rxn fill:none,stroke:#ef6c00,stroke-dasharray: 5 5
```
Metabolites (blue circles) and reactions (orange boxes) form a **bipartite graph**: a graph with two distinct node types in which edges only connect nodes of different types. Each edge carries a stoichiometric coefficient $S_{ij}$, negative on substrate edges (metabolite to reaction) and positive on product edges (reaction to metabolite). A standard unipartite graph (metabolite $\leftrightarrow$ metabolite) cannot represent this system because each reaction involves *multiple* substrates and products simultaneously. A single edge between two metabolites would lose the information that they participate in the *same* reaction with a specific rate constant $k_j$.
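
To make the information loss concrete, here is a minimal sketch in plain Python that encodes the diagram above as a bipartite edge list and then projects it onto metabolite-to-metabolite pairs. The helper names (`substrates`, `products`, `project`) and the reactions `Ra` to `Rd` are hypothetical, introduced only for this illustration; they are not part of MetabolismGraph.

```python
# Edges of the example diagram: (metabolite, reaction, stoichiometric coefficient).
# Negative coefficients mark substrates, positive coefficients mark products.
EDGES = [
    ("c1", "R1", -1), ("c2", "R1", -1), ("c3", "R1", +1), ("c4", "R1", +1),
    ("c3", "R2", -1), ("c1", "R2", +1),
    ("c2", "R3", -1), ("c4", "R3", -1), ("c2", "R3", +1),
]

def substrates(reaction, edges=EDGES):
    """Metabolites consumed by a reaction."""
    return sorted(m for m, r, s in edges if r == reaction and s < 0)

def products(reaction, edges=EDGES):
    """Metabolites produced by a reaction."""
    return sorted(m for m, r, s in edges if r == reaction and s > 0)

def project(edges):
    """Collapse to substrate -> product pairs, discarding reaction identity."""
    reactions = {r for _, r, _ in edges}
    return {(s, p) for r in reactions
            for s in substrates(r, edges) for p in products(r, edges)}

for rxn in ("R1", "R2", "R3"):
    print(rxn, substrates(rxn), "->", products(rxn))

# R1 alone versus four hypothetical one-substrate, one-product reactions.
r1_only = [e for e in EDGES if e[1] == "R1"]
four_separate = [
    ("c1", "Ra", -1), ("c3", "Ra", +1),
    ("c1", "Rb", -1), ("c4", "Rb", +1),
    ("c2", "Rc", -1), ("c3", "Rc", +1),
    ("c2", "Rd", -1), ("c4", "Rd", +1),
]
same_projection = project(r1_only) == project(four_separate)
print(same_projection)  # True: the projection cannot tell the two systems apart
```

The two systems have different kinetics (one reaction with two co-substrates versus four independent reactions), yet their metabolite-only projections coincide. The edge list also keeps both roles of c₂ in R₃, where it appears as a substrate and as a product.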
## The Full Model
The complete metabolic dynamics for metabolite $i$ are:
$$
\frac{dc_i}{dt} = \underbrace{-\lambda_i \cdot (c_i - c_i^{\text{baseline}})}_{\text{homeostasis}} + \underbrace{\sum_{j=1}^{m} S_{ij} \cdot v_j}_{\text{reaction dynamics}}
$$
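where $\lambda_i$ sets how quickly metabolite $i$ relaxes toward its baseline concentration $c_i^{\text{baseline}}$, $\text{sub}(j)$ denotes the substrates of reaction $j$, and the reaction rate $v_j$ depends on the aggregation type:

| Aggregation | Rate $v_j$ |
|-------------|------------|
| Additive | $v_j = k_j \cdot \sum_{k \in \text{sub}(j)} c_k^{\lvert S_{kj} \rvert}$ |
| Multiplicative | $v_j = k_j \cdot \prod_{k \in \text{sub}(j)} c_k^{\lvert S_{kj} \rvert}$ |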
See [Model](model.qmd) for detailed equations, diagrams, and model configurations.
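
As a rough illustration of the equations in this section, the following plain-Python sketch evaluates both rate laws and the resulting $dc_i/dt$ for a toy reaction A + 2B → C. It assumes a dense matrix `S[i][j]` in which negative entries mark substrates; the function names and all numerical values are made up for the example and are not taken from the MetabolismGraph code.

```python
import math

def reaction_rate(k_j, substrate_terms, c, aggregation="multiplicative"):
    """Rate of one reaction; substrate_terms is a list of (metabolite index, |S_kj|)."""
    powers = [c[i] ** order for i, order in substrate_terms]
    if aggregation == "additive":
        return k_j * sum(powers)
    if aggregation == "multiplicative":
        return k_j * math.prod(powers)
    raise ValueError(f"unknown aggregation: {aggregation}")

def dcdt(c, S, k, lam, baseline, aggregation="multiplicative"):
    """Right-hand side of the full model: homeostasis plus reaction dynamics."""
    n, m = len(S), len(S[0])
    rates = []
    for j in range(m):
        # Assumption: substrates of reaction j are the metabolites with S[i][j] < 0.
        subs = [(i, abs(S[i][j])) for i in range(n) if S[i][j] < 0]
        rates.append(reaction_rate(k[j], subs, c, aggregation))
    return [
        -lam[i] * (c[i] - baseline[i]) + sum(S[i][j] * rates[j] for j in range(m))
        for i in range(n)
    ]

# Toy system: one reaction A + 2B -> C (rows: A, B, C; one column).
S = [[-1], [-2], [1]]
c = [2.0, 3.0, 1.0]
k = [0.5]
lam = [0.5, 0.5, 0.5]
baseline = [1.0, 1.0, 1.0]

mult = dcdt(c, S, k, lam, baseline, "multiplicative")
add = dcdt(c, S, k, lam, baseline, "additive")
print(mult)  # [-9.5, -19.0, 9.0]   since v = 0.5 * 2 * 3**2 = 9.0
print(add)   # [-6.0, -12.0, 5.5]   since v = 0.5 * (2 + 3**2) = 5.5
```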
## The Inverse Problem
The forward model describes how concentrations evolve given all parameters. In practice, the parameters themselves are unknown. The **inverse problem** is to recover them from observed dynamics.
**Given:**
- Concentration trajectories $\{c_i(t)\}_{i=1}^{n}$ measured over time
- Stoichiometric matrix $\mathbf{S}$ (known from biochemistry)
**To learn:**
- **Substrate function** $\text{MLP}_{\text{sub}}(c_k, |S_{kj}|)$: trained to recover the mass-action power law $c_k^{|S_{kj}|}$
- **Homeostasis function** $\text{MLP}_{\text{node}}(c_i, a_i)$, with $a_i$ a learnable per-metabolite embedding: trained to recover the per-metabolite regulation $-\lambda_i(c_i - c_i^{\text{baseline}})$
- **Rate constants** $k_j$: per-reaction speed scalars
This is challenging because the system is high-dimensional ($n$ metabolites, $m$ reactions), the mapping from parameters to dynamics is nonlinear, and different parameter combinations can produce nearly identical trajectories (an identifiability problem). Classical optimization approaches struggle with the resulting non-convex landscape.
We address this by casting the inverse problem as a **Graph Neural Network** learning task. The metabolic network is naturally a bipartite graph (metabolites $\leftrightarrow$ reactions), and we replace the unknown functions with learnable MLPs that operate on this graph structure. The GNN is trained end-to-end by minimizing the prediction error on $dc/dt$, recovering the rate constants, substrate functions, and homeostatic functions simultaneously. An LLM-driven closed-loop exploration engine systematically searches the hyperparameter space; see [GNN-LLM-Memory](gnn-llm-memory.qmd) for the training scheme, regularization terms, and exploration loop.
### GNN Parameterization
$$
\frac{dc_i}{dt} = \underbrace{\text{MLP}_{\text{node}}(c_i, a_i)}_{\text{learns } -\lambda_i(c_i - c_i^{\text{baseline}})} + \sum_{j=1}^{m} S_{ij} \cdot \underbrace{k_j \cdot \prod_{k \in \text{sub}(j)} \text{MLP}_{\text{sub}}(c_k, |S_{kj}|)}_{\text{learns } k_j \text{ and } c_k^{|S_{kj}|}}
$$
where:
- $a_i \in \mathbb{R}^d$ is a **learnable embedding** for metabolite $i$
- $k_j$ are **learnable rate constants**
- $\prod$ denotes multiplicative aggregation (mass-action kinetics)
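
The parameterization above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's implementation: `mlp_node` and `mlp_sub` are ordinary callables standing in for trained networks, the "embedding" $a_i$ is a hypothetical pair `(lambda_i, baseline_i)` chosen so that the stand-in can reproduce the homeostasis term exactly, and the mean squared error is one plausible choice for the prediction error on $dc/dt$.

```python
import math

def predict_dcdt(c, a, S, k, mlp_node, mlp_sub):
    """dc_i/dt = mlp_node(c_i, a_i) + sum_j S_ij * k_j * prod_k mlp_sub(c_k, |S_kj|)."""
    n, m = len(S), len(S[0])
    rates = []
    for j in range(m):
        # Assumption: substrates of reaction j are the metabolites with S[i][j] < 0.
        factors = [mlp_sub(c[i], abs(S[i][j])) for i in range(n) if S[i][j] < 0]
        rates.append(k[j] * math.prod(factors))
    return [
        mlp_node(c[i], a[i]) + sum(S[i][j] * rates[j] for j in range(m))
        for i in range(n)
    ]

def mse(pred, target):
    """Mean squared error between predicted and observed derivatives."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Stand-ins for perfectly trained networks (hypothetical, for illustration only).
def ideal_node(c_i, a_i):
    lam_i, baseline_i = a_i
    return -lam_i * (c_i - baseline_i)

def ideal_sub(c_k, order):
    return c_k ** order

# Toy system: one reaction A + 2B -> C with true rate constant 0.5.
S = [[-1], [-2], [1]]
c = [2.0, 3.0, 1.0]
a = [(0.5, 1.0)] * 3
observed = predict_dcdt(c, a, S, [0.5], ideal_node, ideal_sub)

loss_true = mse(predict_dcdt(c, a, S, [0.5], ideal_node, ideal_sub), observed)
loss_wrong = mse(predict_dcdt(c, a, S, [1.0], ideal_node, ideal_sub), observed)
print(observed)    # [-9.5, -19.0, 9.0]
print(loss_true)   # 0.0
print(loss_wrong)  # 162.0
```

With the functional forms held fixed, the loss vanishes at the true rate constant and grows as $k_j$ moves away from it. An actual training run would replace the stand-in callables with neural networks and optimize all parameters jointly by gradient descent.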
### Learnable Parameters
| Parameter | Type | Purpose |
|-----------|------|---------|
| $a_i$ | Embedding vectors | Per-metabolite identity |
| $k_j$ | Scalars | Per-reaction rate constants |
| $\text{MLP}_{\text{node}}$ | Neural network | Learns $-\lambda_i(c_i - c_i^{\text{baseline}})$ |
| $\text{MLP}_{\text{sub}}$ | Neural network | Learns $c_k^{\lvert S_{kj} \rvert}$ |
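
Because $k_j$ and $\text{MLP}_{\text{sub}}$ enter each reaction rate only through a product, one concrete source of the identifiability issue is a scale degeneracy: multiplying the substrate function by a constant $\alpha$ and dividing $k_j$ by $\alpha$ raised to the number of substrates leaves the rate unchanged. The sketch below checks this numerically. It follows directly from the product form of the parameterization and makes no claim about how the framework handles the degeneracy; all names and values are illustrative.

```python
import math

def rate(k_j, f_sub, substrate_terms, c):
    """k_j * product over substrates of f_sub(c_k, |S_kj|)."""
    return k_j * math.prod(f_sub(c[i], order) for i, order in substrate_terms)

def mass_action(c_k, order):
    """Ground-truth substrate function: the power law c_k ** |S_kj|."""
    return c_k ** order

alpha = 2.0

def rescaled(c_k, order):
    """Same shape as mass_action, but scaled by an arbitrary constant alpha."""
    return alpha * mass_action(c_k, order)

substrate_terms = [(0, 1), (1, 2)]   # A + 2B as (metabolite index, |S_kj|)
c = [2.0, 3.0, 1.0]

k_true = 0.5
k_alt = k_true / alpha ** len(substrate_terms)   # 0.125 compensates the rescaling

v_true = rate(k_true, mass_action, substrate_terms, c)
v_alt = rate(k_alt, rescaled, substrate_terms, c)
print(v_true, v_alt)  # 9.0 9.0
```

Both parameter sets fit the observed dynamics equally well, so the prediction error on $dc/dt$ alone cannot distinguish $k_j = 0.5$ from $k_j = 0.125$ without some additional constraint on the scale of the substrate function.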
## Citation
If you use MetabolismGraph in your research, please cite:
```bibtex
@software{metabolismgraph2025,
author = {Allier, Cédric},
title = {MetabolismGraph: Learning Metabolism Dynamics with GNNs},
year = {2026},
url = {https://github.com/allierc/MetabolismGraph}
}
```