Proposal: per-run BudgetGuard for token / request / cost limits (follow-up to #2848) #3353

@Quratulain-bilal

Background

Issue #2848 raised real production pain: an agent loop triggered thousands of unexpected API calls before anyone
noticed. Multiple users in that thread (@VamsiSudhakaran1, @pandego) explicitly asked for hard budget guardrails, not
dashboard alerts, as a pre-deployment gate.

The two answers offered there were:

  1. "Use result.usage after the run": too late; the runaway already happened.
  2. "Set an OpenAI org-level monthly budget": too coarse; one bad run can still exhaust the cap, and it doesn't protect
    per-customer / per-tenant flows.

The thread closed stale without a code-level mitigation. This proposal fills that gap.

Describe the feature

A small, opt-in BudgetGuard extension that enforces per-run limits using the existing RunHooks lifecycle: no
changes to Runner, and no behavior change for users who don't opt in.

from agents import Runner
from agents.extensions import BudgetGuard, Budget

guard = BudgetGuard(Budget(
    max_total_tokens=200_000,
    max_requests=50,
    max_cost_usd=1.50,
))

result = await Runner.run(agent, input, hooks=guard)
# Raises BudgetExceeded(dimension="max_total_tokens", limit=200_000, actual=...)
# between turns if any limit trips.

Optional graceful degradation (downgrade model at a threshold instead of failing):

guard = BudgetGuard(
    Budget(max_total_tokens=200_000),
    downgrade_to="gpt-4o-mini",
    downgrade_at=0.8,
    pricing={"gpt-4o": (2.5e-6, 10e-6)},
)
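
To make the downgrade threshold concrete, here is a minimal sketch of the per-dimension decision, assuming downgrade_at is a fraction of the limit (so 0.8 means "at 80% of the budget"). The helper name budget_action and its string return values are hypothetical and exist only for illustration; the actual prototype may structure this differently.

```python
from typing import Optional


def budget_action(actual: float, limit: float, downgrade_at: Optional[float] = None) -> str:
    """Decide what to do for one budget dimension.

    Returns "exceeded" once usage is past the limit, "downgrade" once usage
    reaches the downgrade_at fraction of the limit, and "ok" otherwise.
    """
    if actual > limit:
        return "exceeded"
    if downgrade_at is not None and actual >= downgrade_at * limit:
        return "downgrade"
    return "ok"


# With a 200_000-token budget and downgrade_at=0.8:
early = budget_action(100_000, 200_000, 0.8)  # well under the threshold
late = budget_action(180_000, 200_000, 0.8)   # past 80% of the limit
over = budget_action(200_001, 200_000, 0.8)   # past the hard limit
```

In this sketch the guard would swap the model on "downgrade" and raise BudgetExceeded on "exceeded".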

Why this fits the SDK cleanly

Every piece needed already exists; BudgetGuard just composes them:

- RunContextWrapper.usage (src/agents/run_context.py:54) is already accumulated across the run.
- RunHooksBase.on_llm_end (src/agents/lifecycle.py:28) fires after every LLM call with the latest ModelResponse, which makes it a natural enforcement point.
- MaxTurnsExceeded (src/agents/exceptions.py) is the established pattern for a typed limit-exceeded exception.
- Agent.model is mutable, so model-swap from a hook is supported (same mechanism handoffs already use).

So the surface area is:
- New module: src/agents/extensions/budget_guard.py (~150 LOC)
- New re-exports: Budget, BudgetExceeded, BudgetGuard, CompositeRunHooks
- No changes to Runner, Agent, Usage, or any existing public API.
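
For illustration, the enforcement path described in this section could look roughly like the sketch below. It is not the prototype itself: Usage and Ctx are local stand-ins so the snippet runs without the SDK installed, and the on_llm_end(context, agent, response) signature and the requests / total_tokens field names are assumed to match the SDK. Cost tracking and model downgrade are left out to keep it short.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Usage:
    """Stand-in for the SDK's accumulated usage (field names assumed)."""
    requests: int = 0
    total_tokens: int = 0


@dataclass
class Ctx:
    """Stand-in for RunContextWrapper; only .usage is needed here."""
    usage: Usage


@dataclass(frozen=True)
class Budget:
    max_total_tokens: Optional[int] = None
    max_requests: Optional[int] = None


class BudgetExceeded(Exception):
    """Typed limit error, modeled on the MaxTurnsExceeded pattern."""

    def __init__(self, dimension: str, limit: float, actual: float) -> None:
        super().__init__(f"{dimension} exceeded: {actual} > {limit}")
        self.dimension = dimension
        self.limit = limit
        self.actual = actual


class BudgetGuard:
    """In the real SDK this would subclass RunHooks (assumption)."""

    def __init__(self, budget: Budget) -> None:
        self.budget = budget

    async def on_llm_end(self, context: Ctx, agent: Any, response: Any) -> None:
        # Compare each configured dimension with the run's accumulated usage.
        checks = (
            ("max_total_tokens", self.budget.max_total_tokens, context.usage.total_tokens),
            ("max_requests", self.budget.max_requests, context.usage.requests),
        )
        for dimension, limit, actual in checks:
            if limit is not None and actual > limit:
                raise BudgetExceeded(dimension, limit, actual)


async def demo() -> str:
    """Simulate a loop of 300-token LLM calls against a 1_000-token budget."""
    guard = BudgetGuard(Budget(max_total_tokens=1_000, max_requests=5))
    ctx = Ctx(Usage())
    try:
        for _ in range(10):
            ctx.usage.requests += 1
            ctx.usage.total_tokens += 300
            await guard.on_llm_end(ctx, agent=None, response=None)
    except BudgetExceeded as exc:
        return exc.dimension
    return "ok"
```

Because the check runs after each LLM call, the run stops at the first call that crosses a limit rather than after the whole loop finishes.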

How it compares to max_turns

max_turns is a coarse proxy: a single turn with a long context can still cost $5+. BudgetGuard enforces the dimensions
users actually care about (tokens / cost / request count) and can downgrade before failing, which max_turns cannot.
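
A quick worked example of why turn count and cost diverge. It assumes the pricing tuple is (input price, output price) in USD per token, which the proposal does not spell out, and the token counts are made up for illustration.

```python
from typing import Tuple


def turn_cost_usd(input_tokens: int, output_tokens: int, price: Tuple[float, float]) -> float:
    """Cost of one LLM call given (input, output) USD-per-token prices."""
    input_price, output_price = price
    return input_tokens * input_price + output_tokens * output_price


# Assumed tuple order: (input USD/token, output USD/token).
PRICING = {"gpt-4o": (2.5e-6, 10e-6)}

# Both of these count as exactly one turn under max_turns.
short_turn = turn_cost_usd(2_000, 500, PRICING["gpt-4o"])      # about $0.01
long_turn = turn_cost_usd(100_000, 4_000, PRICING["gpt-4o"])   # about $0.29
```

Under these assumptions the long-context turn costs roughly 29 times as much as the short one, yet max_turns treats them identically.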

Composability

Users who already pass hooks=MyHooks() aren't blocked: a small CompositeRunHooks(*hooks) helper lets multiple hook
implementations stack. This also solves a generic gap (Runner.run(hooks=...) currently takes a single object).
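
A rough sketch of what such a helper might look like, forwarding only on_llm_end for brevity. The Recorder class is a hypothetical stand-in for user hooks, and the hook signature is assumed to match the SDK; a full version would forward every lifecycle method the same way.

```python
import asyncio
from typing import Any, List


class CompositeRunHooks:
    """Fan each lifecycle event out to several hook objects, in order."""

    def __init__(self, *hooks: Any) -> None:
        self._hooks = hooks

    async def on_llm_end(self, context: Any, agent: Any, response: Any) -> None:
        for hook in self._hooks:
            # Skip hook objects that do not implement this event.
            handler = getattr(hook, "on_llm_end", None)
            if handler is not None:
                await handler(context, agent, response)


class Recorder:
    """Hypothetical user hook that just records that it was called."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    async def on_llm_end(self, context: Any, agent: Any, response: Any) -> None:
        self.log.append(self.name)


def demo() -> List[str]:
    log: List[str] = []
    hooks = CompositeRunHooks(Recorder("metrics", log), Recorder("budget", log))
    asyncio.run(hooks.on_llm_end(None, None, None))
    return log
```

Calling hooks sequentially means an exception from one hook (such as a budget guard) prevents later hooks from running for that event, so ordering is a design choice worth settling in review.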

Out of scope (explicit non-goals)

- Mid-stream abort (would require deeper Runner changes; happy to discuss as a follow-up).
- Cross-run cumulative budgets (per-run only; users compose via shared context).
- Provider-specific pricing tables (user supplies a pricing dict, which keeps the SDK provider-agnostic).

Ask

Before I open a PR (and to avoid wasting maintainer review time on an unsolicited feature): would the maintainers
accept a PR along these lines?

Happy to adjust the API shape, scope, or location (extensions/ vs. core) based on your preference. I have a working
prototype + tests ready, but won't push until there's a "yes, send it" or "we'd prefer X instead."

cc @seratch (you replied on #2848): does this match what you'd want to see, or would you rather this stay user-side?
