Please read this first
Background
Issue #2848 raised real production pain: an agent loop triggered thousands of unexpected API calls before anyone
noticed. Multiple users in that thread (@VamsiSudhakaran1, @pandego) explicitly asked for hard budget guardrails, not
dashboard alerts, as a pre-deployment gate.
The two answers offered there were:
- "Use result.usage after the run": too late; the runaway has already happened.
- "Set an OpenAI org-level monthly budget": too coarse; one bad run can still exhaust the cap, and it doesn't protect
per-customer / per-tenant flows.
The thread closed stale without a code-level mitigation. This proposal fills that gap.
Describe the feature
A small, opt-in BudgetGuard extension that enforces per-run limits using the existing RunHooks lifecycle, with no
changes to Runner and no behavior change for users who don't opt in.
from agents import Runner
from agents.extensions import BudgetGuard, Budget
guard = BudgetGuard(Budget(
    max_total_tokens=200_000,
    max_requests=50,
    max_cost_usd=1.50,
))
result = await Runner.run(agent, input, hooks=guard)
# Raises BudgetExceeded(dimension="max_total_tokens", limit=200_000, actual=...)
# between turns if any limit trips.
Optional graceful degradation (downgrade model at a threshold instead of failing):
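guard = BudgetGuard(
    Budget(max_total_tokens=200_000),
    downgrade_to="gpt-4o-mini",  # model to switch to once the threshold is reached
    downgrade_at=0.8,            # fraction of the budget that triggers the downgrade
    pricing={"gpt-4o": (2.5e-6, 10e-6)},  # user-supplied (input, output) price per token in USD
)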
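To make the pricing and downgrade_at parameters concrete, here is a minimal, self-contained sketch of the arithmetic they imply. It assumes each pricing entry is a (USD per input token, USD per output token) pair and that downgrade_at is a fraction of the limit; the helper names estimate_cost_usd and should_downgrade are hypothetical and not part of the SDK or the prototype.

```python
from typing import Dict, Tuple

# Assumed shape of the user-supplied pricing dict:
# model name -> (USD per input token, USD per output token).
Pricing = Dict[str, Tuple[float, float]]


def estimate_cost_usd(
    pricing: Pricing, model: str, input_tokens: int, output_tokens: int
) -> float:
    """Estimate spend for one model from token counts.

    Raises KeyError for models missing from the table, so an unpriced
    model cannot silently count as free.
    """
    input_price, output_price = pricing[model]
    return input_tokens * input_price + output_tokens * output_price


def should_downgrade(used: float, limit: float, downgrade_at: float) -> bool:
    """Return True once usage has reached the given fraction of the limit."""
    return used >= limit * downgrade_at


if __name__ == "__main__":
    pricing: Pricing = {"gpt-4o": (2.5e-6, 10e-6)}
    cost = estimate_cost_usd(pricing, "gpt-4o", 100_000, 10_000)
    print(f"estimated cost: ${cost:.2f}")
    print("downgrade?", should_downgrade(170_000, 200_000, 0.8))
```

Failing loudly on an unknown model is one possible design choice; a guard could equally skip the cost dimension for unpriced models and enforce only the token and request limits.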
Why this fits the SDK cleanly
Every piece needed already exists; BudgetGuard just composes them:
- RunContextWrapper.usage (src/agents/run_context.py:54) is already accumulated across the run.
- RunHooksBase.on_llm_end (src/agents/lifecycle.py:28) fires after every LLM call with the latest ModelResponse —
natural enforcement point.
- MaxTurnsExceeded (src/agents/exceptions.py) is the established pattern for a typed limit-exceeded exception.
- Agent.model is mutable, so model-swap from a hook is supported (same mechanism handoffs already use).
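The enforcement logic those pieces would support can be sketched without the SDK. The block below is an illustration only: UsageSnapshot is a stand-in for the accumulated usage on the run context (not the real Usage class), the on_llm_end signature is simplified, and BudgetGuardSketch is a hypothetical name rather than the prototype's implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Per-run limits; None means the dimension is not enforced."""
    max_total_tokens: Optional[int] = None
    max_requests: Optional[int] = None


@dataclass
class UsageSnapshot:
    """Stand-in for the usage accumulated so far in a run."""
    total_tokens: int = 0
    requests: int = 0


class BudgetExceeded(Exception):
    """Typed limit error, in the spirit of MaxTurnsExceeded."""

    def __init__(self, dimension: str, limit: float, actual: float) -> None:
        super().__init__(f"{dimension} exceeded: {actual} > {limit}")
        self.dimension = dimension
        self.limit = limit
        self.actual = actual


class BudgetGuardSketch:
    def __init__(self, budget: Budget) -> None:
        self.budget = budget

    def check(self, usage: UsageSnapshot) -> None:
        """Raise BudgetExceeded for the first limit that has been passed."""
        pairs = (
            ("max_total_tokens", self.budget.max_total_tokens, usage.total_tokens),
            ("max_requests", self.budget.max_requests, usage.requests),
        )
        for dimension, limit, actual in pairs:
            if limit is not None and actual > limit:
                raise BudgetExceeded(dimension, limit, actual)

    async def on_llm_end(self, usage: UsageSnapshot) -> None:
        # Simplified hook: a real hook would read usage from the run context.
        self.check(usage)


if __name__ == "__main__":
    guard = BudgetGuardSketch(Budget(max_total_tokens=200_000, max_requests=50))
    try:
        asyncio.run(guard.on_llm_end(UsageSnapshot(total_tokens=250_000, requests=12)))
    except BudgetExceeded as exc:
        print(exc.dimension, exc.limit, exc.actual)
```

Because the check runs after each LLM call, the call that crosses a limit still completes; the guard only prevents the next one, which is consistent with mid-stream abort being listed as a non-goal.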
So the surface area is:
- New module: src/agents/extensions/budget_guard.py (~150 LOC)
- New re-exports: Budget, BudgetExceeded, BudgetGuard, CompositeRunHooks
- No changes to Runner, Agent, Usage, or any existing public API.
How it compares to max_turns
max_turns is a coarse proxy — a single turn with a long context can still cost $5+. BudgetGuard enforces the dimension
users actually care about (tokens / cost / request count) and can downgrade before failing, which max_turns cannot.
Composability
Users who already pass hooks=MyHooks() aren't blocked: a small CompositeRunHooks(*hooks) helper lets multiple hook
implementations stack. This also closes a generic gap (Runner.run(hooks=...) currently takes a single object).
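One way such a helper could work is a simple fan-out, sketched below without the SDK. The class name CompositeHooksSketch, the event list, and the duck-typed dispatch are assumptions for illustration; an actual implementation would more likely subclass the SDK's hook base class and forward each lifecycle method explicitly.

```python
import asyncio
from typing import Any, Awaitable, Callable


class CompositeHooksSketch:
    """Forwards each lifecycle event to every wrapped hook object, in order."""

    # Assumed subset of lifecycle method names, listed for illustration.
    _EVENTS = ("on_agent_start", "on_agent_end", "on_llm_start", "on_llm_end")

    def __init__(self, *hooks: Any) -> None:
        self._hooks = hooks

    def __getattr__(self, name: str) -> Callable[..., Awaitable[None]]:
        # Only reached when normal attribute lookup fails.
        if name not in self._EVENTS:
            raise AttributeError(name)

        async def dispatch(*args: Any, **kwargs: Any) -> None:
            for hook in self._hooks:
                handler = getattr(hook, name, None)
                if handler is not None:
                    # An exception from one hook propagates and skips the rest.
                    await handler(*args, **kwargs)

        return dispatch


class LoggingHooks:
    async def on_llm_end(self, payload: str) -> None:
        print("log:", payload)


class CountingHooks:
    def __init__(self) -> None:
        self.calls = 0

    async def on_llm_end(self, payload: str) -> None:
        self.calls += 1


if __name__ == "__main__":
    counter = CountingHooks()
    composite = CompositeHooksSketch(LoggingHooks(), counter)
    asyncio.run(composite.on_llm_end("response"))
    print("calls:", counter.calls)
```

Ordering matters in this design: placing the budget guard first means an exceeded limit stops the remaining hooks for that event, while placing it last lets logging hooks observe the call that tripped the limit.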
Out of scope (explicit non-goals)
- Mid-stream abort (would require deeper Runner changes; happy to discuss as a follow-up).
- Cross-run cumulative budgets (per-run only; users compose via shared context).
- Provider-specific pricing tables (the user supplies a pricing dict, which keeps the SDK provider-agnostic).
Ask
Before I open a PR (and to avoid wasting maintainer review time on an unsolicited feature): would the maintainers accept
a PR along these lines?
Happy to adjust the API shape, scope, or location (extensions/ vs. core) based on your preference. I have a working
prototype + tests ready, but won't push until there's a "yes, send it" or "we'd prefer X instead."
cc @seratch (you replied on #2848): does this match what you'd want to see, or would you rather this stay user-side?