Commit c36c7e2

feat: add financial governance evaluators (spend limits + transaction policy)
Implements the financial governance evaluator proposed in #129, following the technical guidance from @lan17:

1. Decoupled from data source: SpendStore protocol with pluggable backends (InMemorySpendStore included, PostgreSQL/Redis via custom implementation)
2. No new tables in core agent control: self-contained contrib package
3. Context-aware limits: channel/agent/session overrides via evaluate metadata
4. Python SDK compatible: standard Evaluator interface, works with both server and SDK evaluation engine

Two evaluators:

- financial_governance.spend_limit: Cumulative spend tracking with per-transaction caps and rolling period budgets
- financial_governance.transaction_policy: Static policy enforcement (currency allowlists, recipient blocklists, amount bounds)

53 tests passing.

Closes #129

Signed-off-by: up2itnow0822 <up2itnow0822@users.noreply.github.com>
Signed-off-by: up2itnow0822 <up2itnow0822@gmail.com>
1 parent da05f98 commit c36c7e2

13 files changed

Lines changed: 2064 additions & 0 deletions

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
# Financial Governance Evaluators for Agent Control

Evaluators that enforce financial spend limits and transaction policies for autonomous AI agents.

As agents transact autonomously via protocols like [x402](https://github.com/coinbase/x402) and payment layers like [agentpay-mcp](https://github.com/AI-Agent-Economy/agentpay-mcp), enterprises need governance over what agents spend. These evaluators bring financial policy enforcement into the Agent Control framework.

## Evaluators

### `financial_governance.spend_limit`

Tracks cumulative agent spend and enforces rolling budget limits. The evaluator is stateful: it records approved transactions and checks new ones against accumulated spend.

- **Per-transaction cap**: reject any single payment above a threshold
- **Rolling period budget**: reject payments that would exceed a time-windowed budget
- **Context-aware overrides**: different limits per channel, agent, or session via evaluate metadata
- **Pluggable storage**: abstract `SpendStore` protocol with built-in `InMemorySpendStore`; bring your own PostgreSQL, Redis, etc.

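The cap and rolling-budget checks listed above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `RollingLedger` and `check_spend` are hypothetical names, and the only behavior taken from the package is that a limit of `0.0` disables the corresponding check and that only approved transactions are recorded.

```python
from dataclasses import dataclass, field


@dataclass
class RollingLedger:
    """Hypothetical in-memory ledger of (timestamp, amount) pairs."""

    entries: list = field(default_factory=list)

    def spent_since(self, since: float) -> float:
        # Sum every recorded amount whose timestamp falls inside the window.
        return sum(amount for ts, amount in self.entries if ts >= since)


def check_spend(ledger, amount, now, max_per_transaction, max_per_period, period_seconds):
    """Return (allowed, reason); a limit of 0.0 is treated as disabled."""
    if max_per_transaction > 0 and amount > max_per_transaction:
        return False, "per-transaction cap exceeded"
    if max_per_period > 0:
        window_spend = ledger.spent_since(now - period_seconds)
        if window_spend + amount > max_per_period:
            return False, "period budget exceeded"
    # Only approved transactions count toward future budgets.
    ledger.entries.append((now, amount))
    return True, "ok"
```

Passing `now` explicitly keeps the sketch deterministic; a real evaluator would presumably read the clock itself.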
### `financial_governance.transaction_policy`

Static policy checks with no state tracking. Enforces structural rules on individual transactions.

- **Currency allowlist**: only permit specific currencies (e.g., `["USDC", "USDT"]`)
- **Recipient blocklist/allowlist**: control which addresses an agent can pay
- **Amount bounds**: minimum and maximum per-transaction limits

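As a rough sketch of what such stateless checks look like, the function below applies each rule to a single transaction dict. `check_policy` and its return shape are hypothetical, and the order in which rules are evaluated is an assumption; only the rule names mirror the evaluator's config keys.

```python
def check_policy(
    tx,
    *,
    allowed_currencies=None,
    blocked_recipients=(),
    allowed_recipients=None,
    min_amount=None,
    max_amount=None,
):
    """Return the list of violated rules; an empty list means the transaction passes."""
    violations = []
    if allowed_currencies is not None and tx["currency"].upper() not in allowed_currencies:
        violations.append("currency not allowed")
    if tx["recipient"] in blocked_recipients:
        violations.append("recipient blocked")
    if allowed_recipients is not None and tx["recipient"] not in allowed_recipients:
        violations.append("recipient not in allowlist")
    if min_amount is not None and tx["amount"] < min_amount:
        violations.append("amount below minimum")
    if max_amount is not None and tx["amount"] > max_amount:
        violations.append("amount above maximum")
    return violations
```

Because no state is consulted, the same transaction always yields the same result, which is what distinguishes this evaluator from `spend_limit`.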
## Installation

```bash
# From the repo root (development)
cd evaluators/contrib/financial-governance
pip install -e ".[dev]"
```

## Configuration

### Spend Limit

```yaml
controls:
  - name: spend-limit
    evaluator:
      type: financial_governance.spend_limit
      config:
        max_per_transaction: 100.0   # Max USDC per single payment
        max_per_period: 1000.0       # Rolling 24h budget
        period_seconds: 86400        # Budget window (default: 24 hours)
        currency: USDC               # Currency to govern
    selector:
      path: input                    # Extract step.input (transaction dict)
    action: deny
```

### Transaction Policy

```yaml
controls:
  - name: transaction-policy
    evaluator:
      type: financial_governance.transaction_policy
      config:
        allowed_currencies: [USDC, USDT]
        blocked_recipients: ["0xDEAD..."]
        allowed_recipients: ["0xALICE...", "0xBOB..."]
        min_amount: 0.01
        max_amount: 5000.0
    selector:
      path: input
    action: deny
```

## Selector Paths

Both evaluators support two selector configurations:

- **`selector.path: "input"`** (recommended): the evaluator receives `step.input` directly, which should be the transaction dict.
- **`selector.path: "*"`**: the evaluator receives the full Step object. It automatically extracts `step.input` for transaction fields and `step.context` for channel/agent/session metadata.

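The difference between the two selector shapes can be pictured with a small normalization helper. This is only a sketch: `FakeStep` is a stand-in for the real Step object, `to_transaction` is a hypothetical name, and letting `input` keys win over `context` keys on collision is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeStep:
    """Stand-in for the real Step object; only the two fields used here."""

    input: dict
    context: dict = field(default_factory=dict)


def to_transaction(data: Any) -> dict:
    """Flatten either selector shape into one transaction dict."""
    if isinstance(data, FakeStep):
        # selector.path "*": merge context metadata with the transaction fields.
        # Assumption: keys in step.input take precedence over step.context.
        return {**data.context, **data.input}
    # selector.path "input": the data already is the transaction dict.
    return dict(data)
```

With `selector.path: "input"` the helper has no access to `step.context`, which is why context fields then have to travel inside the transaction dict itself.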
## Input Data Schema

The transaction dict (from `step.input`) should contain:

```python
# step.input: transaction payload
{
    "amount": 50.0,           # required: transaction amount
    "currency": "USDC",       # required: payment currency
    "recipient": "0xABC...",  # required: payment recipient
}
```

## Context-Aware Limits

Context fields (`channel`, `agent_id`, `session_id`) and per-context limit overrides can be provided in two ways:

**Option A: Via `step.context`** (recommended for engine integration)

```python
step = Step(
    type="tool",
    name="payment",
    input={"amount": 75.0, "currency": "USDC", "recipient": "0xABC"},
    context={
        "channel": "experimental",
        "agent_id": "agent-42",
        "channel_max_per_transaction": 50.0,
        "channel_max_per_period": 200.0,
    },
)
```

When using `selector.path: "*"`, the evaluator merges `step.context` fields into the transaction data automatically. When using `selector.path: "input"`, context fields must be included directly in `step.input`.

**Option B: Inline in the transaction dict** (simpler, for direct SDK use)

```python
result = await evaluator.evaluate({
    "amount": 75.0,
    "currency": "USDC",
    "recipient": "0xABC",
    "channel": "experimental",
    "channel_max_per_transaction": 50.0,
    "channel_max_per_period": 200.0,
})
```

Spend budgets are **scoped by context**: spend in channel A does not count against channel B's budget. When no context fields are present, budgets are global.

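One way to picture context scoping is to derive a budget key from whichever context fields are present and keep a separate running total per key. The sketch below is illustrative only: `scope_key`, `record`, and the key format are hypothetical and not taken from the package.

```python
from collections import defaultdict

CONTEXT_FIELDS = ("channel", "agent_id", "session_id")


def scope_key(tx: dict) -> str:
    """Build a budget scope from the context fields present in the transaction."""
    parts = [f"{name}={tx[name]}" for name in CONTEXT_FIELDS if name in tx]
    # Assumption: with no context fields, everything shares one global budget.
    return "|".join(parts) or "global"


spend_by_scope = defaultdict(float)


def record(tx: dict) -> None:
    """Add the transaction amount to the running total of its scope."""
    spend_by_scope[scope_key(tx)] += tx["amount"]


record({"amount": 75.0, "currency": "USDC", "channel": "experimental"})
record({"amount": 20.0, "currency": "USDC", "channel": "production"})
record({"amount": 5.0, "currency": "USDC"})
```

After these three calls each scope holds only its own spend, so a limit applied to the `experimental` channel never sees the `production` or global totals.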
## Custom SpendStore

The `SpendStore` protocol requires two methods. Implement them for your backend:

```python
import json

# Assumption: psycopg 3, whose Connection.execute() matches the calls below.
import psycopg

from agent_control_evaluator_financial_governance.spend_limit import (
    SpendStore,
    SpendLimitConfig,
    SpendLimitEvaluator,
)

class PostgresSpendStore:
    """Example: PostgreSQL-backed spend tracking."""

    def __init__(self, connection_string: str):
        # autocommit so that each recorded spend is persisted immediately
        self._conn = psycopg.connect(connection_string, autocommit=True)

    def record_spend(self, amount: float, currency: str, metadata: dict | None = None) -> None:
        self._conn.execute(
            "INSERT INTO agent_spend (amount, currency, metadata, recorded_at) VALUES (%s, %s, %s, NOW())",
            (amount, currency, json.dumps(metadata)),
        )

    def get_spend(self, currency: str, since_timestamp: float) -> float:
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM agent_spend WHERE currency = %s AND recorded_at >= to_timestamp(%s)",
            (currency, since_timestamp),
        ).fetchone()
        return float(row[0])

# Use it:
config = SpendLimitConfig(max_per_period=1000.0, currency="USDC")
store = PostgresSpendStore("postgresql://...")
evaluator = SpendLimitEvaluator(config, store=store)
```

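For experimenting without a database, the same two-method shape can be satisfied by a plain list. The sketch below is a hypothetical stand-in, not the package's `InMemorySpendStore`: `SpendStoreLike`, `ListSpendStore`, and the injectable `clock` parameter are all assumptions made for illustration, and only the two method signatures follow the example above.

```python
import time
from typing import Callable, Optional, Protocol


class SpendStoreLike(Protocol):
    """Structural type mirroring the two methods described above."""

    def record_spend(self, amount: float, currency: str, metadata: Optional[dict] = None) -> None: ...

    def get_spend(self, currency: str, since_timestamp: float) -> float: ...


class ListSpendStore:
    """Hypothetical list-backed store; not thread-safe and not persistent."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._rows = []

    def record_spend(self, amount: float, currency: str, metadata: Optional[dict] = None) -> None:
        # Store the timestamp alongside the amount so windows can be computed later.
        self._rows.append((self._clock(), amount, currency.upper(), metadata or {}))

    def get_spend(self, currency: str, since_timestamp: float) -> float:
        wanted = currency.upper()
        return float(
            sum(amount for ts, amount, cur, _ in self._rows if cur == wanted and ts >= since_timestamp)
        )
```

Injecting the clock is a testing convenience: a fixed or stepped clock makes rolling-window behavior reproducible.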
## Running Tests

```bash
cd evaluators/contrib/financial-governance
pip install -e ".[dev]"
pytest tests/ -v
```

## Design Decisions

1. **Decoupled from data source**: the `SpendStore` protocol means no new tables in core Agent Control. Bring your own persistence.
2. **Context-aware limits**: override keys in the evaluate data dict allow per-channel, per-agent, or per-session limits without multiple evaluator instances.
3. **Python SDK compatible**: uses the standard evaluator interface; works with both the server and the Python SDK evaluation engine.
4. **Fail-open on errors**: missing or malformed data returns `matched=False` with an `error` field, following Agent Control conventions.

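The fail-open behavior in the last item can be sketched as follows. `Result` and `evaluate_amount_cap` are hypothetical names used only for illustration; the real evaluators return Agent Control's own result type, whose exact fields beyond `matched` and `error` are not shown here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical result carrying only the two fields named above."""

    matched: bool
    error: Optional[str] = None


def evaluate_amount_cap(data: Any, cap: float) -> Result:
    """Match (and so trigger the control) only when a valid amount exceeds the cap."""
    try:
        amount = float(data["amount"])
    except (TypeError, KeyError, ValueError) as exc:
        # Fail open: malformed input does not match, but the error is surfaced.
        return Result(matched=False, error=f"invalid transaction data: {exc!r}")
    return Result(matched=amount > cap)
```

Failing open means a malformed payload is not blocked by this control, so callers that need stricter handling would have to inspect the `error` field themselves.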
## Related Projects

- [x402](https://github.com/coinbase/x402): HTTP 402 payment protocol
- [agentpay-mcp](https://github.com/up2itnow0822/agentpay-mcp): MCP server for non-custodial agent payments

## License

Apache-2.0. See [LICENSE](../../../LICENSE).
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
[project]
name = "agent-control-evaluator-financial-governance"
version = "0.1.0"
description = "Financial governance evaluators for agent-control: spend limits and transaction policy enforcement"
readme = "README.md"
requires-python = ">=3.12"
license = { text = "Apache-2.0" }
authors = [{ name = "agent-control contributors" }]
keywords = ["agent-control", "evaluator", "financial", "spend-limit", "x402", "agentpay"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.12",
    "Topic :: Software Development :: Libraries",
]
dependencies = [
    "agent-control-evaluators>=3.0.0",
    "agent-control-models>=3.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "pytest-cov>=4.0.0",
    "ruff>=0.1.0",
    "mypy>=1.8.0",
]

[project.entry-points."agent_control.evaluators"]
"financial_governance.spend_limit" = "agent_control_evaluator_financial_governance.spend_limit:SpendLimitEvaluator"
"financial_governance.transaction_policy" = "agent_control_evaluator_financial_governance.transaction_policy:TransactionPolicyEvaluator"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/agent_control_evaluator_financial_governance"]

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.ruff.lint]
select = ["E", "F", "I"]

[tool.pytest.ini_options]
asyncio_mode = "auto"

[tool.uv.sources]
agent-control-evaluators = { path = "../../builtin", editable = true }
agent-control-models = { path = "../../../models", editable = true }
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
"""Financial governance evaluators for agent-control.

Provides two evaluators for enforcing financial policy on AI agent transactions:

- ``financial_governance.spend_limit``: Tracks cumulative spend against rolling
  period budgets and per-transaction caps.
- ``financial_governance.transaction_policy``: Static policy checks, covering
  allowlists, blocklists, amount bounds, and permitted currencies.

Both evaluators are registered automatically when this package is installed and
the ``agent_control.evaluators`` entry point group is discovered.

Example usage in an agent-control control config::

    {
        "condition": {
            "selector": {"path": "*"},
            "evaluator": {
                "name": "financial_governance.spend_limit",
                "config": {
                    "max_per_transaction": 100.0,
                    "max_per_period": 1000.0,
                    "period_seconds": 86400,
                    "currency": "USDC"
                }
            }
        },
        "action": {"decision": "deny"}
    }
"""

from agent_control_evaluator_financial_governance.spend_limit import (
    SpendLimitConfig,
    SpendLimitEvaluator,
)
from agent_control_evaluator_financial_governance.transaction_policy import (
    TransactionPolicyConfig,
    TransactionPolicyEvaluator,
)

__all__ = [
    "SpendLimitEvaluator",
    "SpendLimitConfig",
    "TransactionPolicyEvaluator",
    "TransactionPolicyConfig",
]
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
"""Spend-limit evaluator package."""

from .config import SpendLimitConfig
from .evaluator import SpendLimitEvaluator
from .store import InMemorySpendStore, SpendStore

__all__ = [
    "SpendLimitEvaluator",
    "SpendLimitConfig",
    "SpendStore",
    "InMemorySpendStore",
]
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
"""Configuration model for the spend-limit evaluator."""

from __future__ import annotations

from pydantic import Field, field_validator

from agent_control_evaluators import EvaluatorConfig


class SpendLimitConfig(EvaluatorConfig):
    """Configuration for :class:`~.evaluator.SpendLimitEvaluator`.

    All monetary fields are expressed in the units of *currency*.

    Attributes:
        max_per_transaction: Hard cap on any single transaction amount. A
            transaction whose ``amount`` exceeds this value is blocked
            regardless of accumulated period spend. Set to ``0.0`` to disable.
        max_per_period: Maximum total spend allowed within the rolling
            *period_seconds* window. Set to ``0.0`` to disable.
        period_seconds: Length of the rolling budget window in seconds.
            Defaults to ``86400`` (24 hours).
        currency: Currency symbol this policy applies to (e.g. ``"USDC"``).
            Transactions whose currency does not match are passed through as
            *not matched* (i.e. allowed).

    Example config dict::

        {
            "max_per_transaction": 500.0,
            "max_per_period": 5000.0,
            "period_seconds": 86400,
            "currency": "USDC"
        }
    """

    max_per_transaction: float = Field(
        default=0.0,
        ge=0.0,
        description=(
            "Per-transaction spend cap in *currency* units. "
            "0.0 means no per-transaction limit."
        ),
    )
    max_per_period: float = Field(
        default=0.0,
        ge=0.0,
        description=(
            "Maximum cumulative spend allowed in the rolling period window. "
            "0.0 means no period limit."
        ),
    )
    period_seconds: int = Field(
        default=86_400,
        ge=1,
        description="Rolling budget window length in seconds (default: 86400 = 24 h).",
    )
    currency: str = Field(
        ...,
        min_length=1,
        description="Currency symbol this policy applies to (e.g. 'USDC', 'ETH').",
    )

    @field_validator("currency")
    @classmethod
    def normalize_currency(cls, v: str) -> str:
        """Normalize currency symbol to upper-case for consistent comparison."""
        return v.upper()
