Proposal: per-run BudgetGuard for token / request / cost limits (follow-up to #2848) #3353

### Please read this first

  - [x] Have you read the docs? Yes: `RunHooks`, `Usage`, `max_turns`.
  - [x] Have you searched for related issues? Yes: #2848 (closed stale, no resolution).

  ### Background

  Issue #2848 raised real production pain: an agent loop triggered **thousands of unexpected API calls** before anyone
  noticed. Multiple users in that thread (@VamsiSudhakaran1, @pandego) explicitly asked for **hard budget guardrails, not
  dashboard alerts**, as a pre-deployment gate.

  The two answers offered there were:
  1. "Use `result.usage` **after** the run": too late; the runaway already happened.
  2. "Set an OpenAI org-level monthly budget": too coarse; one bad run can still exhaust the cap, and it doesn't protect
  per-customer / per-tenant flows.

  The thread closed stale without a code-level mitigation. This proposal fills that gap.

  ### Describe the feature

  A small, **opt-in** `BudgetGuard` extension that enforces per-run limits using the existing `RunHooks` lifecycle: no
  changes to `Runner`, and no behavior change for users who don't opt in.

  ```python
  from agents import Runner
  from agents.extensions import BudgetGuard, Budget

  guard = BudgetGuard(Budget(
      max_total_tokens=200_000,
      max_requests=50,
      max_cost_usd=1.50,
  ))

  result = await Runner.run(agent, input, hooks=guard)
  # Raises BudgetExceeded(dimension="max_total_tokens", limit=200_000, actual=...)
  # between turns if any limit trips.
  ```

  Optional graceful degradation (downgrade model at a threshold instead of failing):

  ```python
  guard = BudgetGuard(
      Budget(max_total_tokens=200_000),
      downgrade_to="gpt-4o-mini",
      downgrade_at=0.8,
      pricing={"gpt-4o": (2.5e-6, 10e-6)},
  )
  ```
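
One possible reading of `downgrade_at=0.8` is "switch to `downgrade_to` once 80% of the token budget is used". The sketch below illustrates only that decision, under that assumption; `pick_model` is a hypothetical helper name and is not part of the SDK or of the prototype described here.

```python
def pick_model(
    current_model: str,
    used_tokens: int,
    max_total_tokens: int,
    downgrade_to: str,
    downgrade_at: float = 0.8,
) -> str:
    """Return the model to use for the next LLM call (illustrative only)."""
    if max_total_tokens <= 0:
        raise ValueError("max_total_tokens must be positive")
    # Fraction of the token budget consumed so far.
    fraction_used = used_tokens / max_total_tokens
    # At or past the threshold, fall back to the cheaper model.
    if fraction_used >= downgrade_at:
        return downgrade_to
    return current_model
```

With a 200,000-token budget, this keeps `gpt-4o` at 100,000 tokens used and returns `gpt-4o-mini` at 170,000.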

  ### Why this fits the SDK cleanly

  Every piece needed already exists; `BudgetGuard` just composes them:

  - `RunContextWrapper.usage` (`src/agents/run_context.py:54`) is already accumulated across the run.
  - `RunHooksBase.on_llm_end` (`src/agents/lifecycle.py:28`) fires after every LLM call with the latest `ModelResponse`,
  which makes it a natural enforcement point.
  - `MaxTurnsExceeded` (`src/agents/exceptions.py`) is the established pattern for a typed limit-exceeded exception.
  - `Agent.model` is mutable, so model-swap from a hook is supported (same mechanism handoffs already use).
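
To show how these pieces could fit together, here is a minimal, self-contained sketch of the limit check itself. It assumes the check would be called from `on_llm_end` with the run's accumulated usage. `UsageSnapshot` is a stand-in for the SDK's usage object (the `requests` and `total_tokens` field names are assumptions), and `check_budget` is a hypothetical helper, not the prototype's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Per-run limits; None means the dimension is not enforced."""
    max_total_tokens: Optional[int] = None
    max_requests: Optional[int] = None


class BudgetExceeded(Exception):
    """Typed limit error, in the spirit of MaxTurnsExceeded."""

    def __init__(self, dimension: str, limit: float, actual: float) -> None:
        super().__init__(f"{dimension} exceeded: {actual} > {limit}")
        self.dimension = dimension
        self.limit = limit
        self.actual = actual


@dataclass
class UsageSnapshot:
    """Stand-in for the accumulated run usage (field names assumed)."""
    requests: int = 0
    total_tokens: int = 0


def check_budget(budget: Budget, usage: UsageSnapshot) -> None:
    """Raise BudgetExceeded for the first limit that usage has passed."""
    checks = [
        ("max_total_tokens", budget.max_total_tokens, usage.total_tokens),
        ("max_requests", budget.max_requests, usage.requests),
    ]
    for dimension, limit, actual in checks:
        if limit is not None and actual > limit:
            raise BudgetExceeded(dimension, limit, actual)
```

Because the check runs after a call completes, the run can overshoot a limit by at most one LLM call, which is consistent with the "between turns" behavior described above.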

  So the surface area is:
  - New module: `src/agents/extensions/budget_guard.py` (~150 LOC)
  - New re-exports: `Budget`, `BudgetExceeded`, `BudgetGuard`, `CompositeRunHooks`
  - No changes to `Runner`, `Agent`, `Usage`, or any existing public API.

  ### How it compares to `max_turns`

  `max_turns` is a coarse proxy: a single turn with a long context can still cost $5+. `BudgetGuard` enforces the dimension
  users actually care about (tokens / cost / request count) and can downgrade before failing, which `max_turns` cannot.
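
To make the difference concrete: a turn always counts as 1 against a turn limit, while its cost scales with token counts. The sketch below assumes the `pricing` tuple means (USD per input token, USD per output token); that interpretation, the `PRICING` table, and the `estimate_cost_usd` helper are illustrative assumptions, not confirmed API.

```python
from typing import Dict, Tuple

# Assumed shape: model name -> (USD per input token, USD per output token).
PRICING: Dict[str, Tuple[float, float]] = {"gpt-4o": (2.5e-6, 10e-6)}


def estimate_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    pricing: Dict[str, Tuple[float, float]] = PRICING,
) -> float:
    """Estimate the cost of one call from its token counts."""
    input_price, output_price = pricing[model]
    return input_tokens * input_price + output_tokens * output_price
```

Under these example prices, one call with 100,000 input tokens and 2,000 output tokens comes to roughly $0.27, and a small call with 1,000 input tokens costs a fraction of a cent. Both count as a single turn.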

  ### Composability

  Users who already pass `hooks=MyHooks()` aren't blocked: a small `CompositeRunHooks(*hooks)` helper lets multiple hook
  implementations stack. This also solves a generic gap (`Runner.run(hooks=...)` currently takes a single object).
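
A rough sketch of the fan-out idea, forwarding a single lifecycle method to each wrapped hook in order. It is deliberately duck-typed and self-contained so it runs without the SDK. A real helper would presumably subclass the SDK's hooks base class and forward every lifecycle method. `RecordingHooks` exists only for demonstration.

```python
import asyncio
from typing import Any, List


class CompositeRunHooks:
    """Fan a lifecycle call out to several hook objects, in order."""

    def __init__(self, *hooks: Any) -> None:
        self._hooks = hooks

    async def on_llm_end(self, context: Any, agent: Any, response: Any) -> None:
        # An exception from one hook (e.g. a budget error) propagates
        # immediately, so later hooks are not called for this event.
        for hook in self._hooks:
            await hook.on_llm_end(context, agent, response)


class RecordingHooks:
    """Demo hook that records each response it is shown."""

    def __init__(self) -> None:
        self.seen: List[Any] = []

    async def on_llm_end(self, context: Any, agent: Any, response: Any) -> None:
        self.seen.append(response)


if __name__ == "__main__":
    first, second = RecordingHooks(), RecordingHooks()
    combined = CompositeRunHooks(first, second)
    asyncio.run(combined.on_llm_end(None, None, "response-1"))
    print(first.seen, second.seen)
```

Ordering matters in this design: placing the budget hook first means logging or tracing hooks after it would not see the call that tripped the limit, so users may prefer to put it last.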

  ### Out of scope (explicit non-goals)

  - Mid-stream abort (would require deeper `Runner` changes; happy to discuss as a follow-up).
  - Cross-run cumulative budgets (per-run only; users compose via shared context).
  - Provider-specific pricing tables (the user supplies a pricing dict, which keeps the SDK provider-agnostic).

  ### Ask

  Before I open a PR (and to avoid wasting maintainer review time on an unsolicited feature): would the maintainers accept
   a PR along these lines?

  Happy to adjust the API shape, scope, or location (extensions/ vs. core) based on your preference. I have a working
  prototype + tests ready, but won't push until there's a "yes, send it" or "we'd prefer X instead."

  cc @seratch (you replied on #2848): does this match what you'd want to see, or would you rather this stay user-side?

  
