
Feature Request: Support defer_loading and tool_search for dynamic tool discovery (parity with Anthropic & OpenAI) #2185

@leeclouddragon

Description

Summary

Both Anthropic (Claude Sonnet 4.0+) and OpenAI (GPT-5.4+) now support deferred tool loading and tool search at the API level. The Gemini API currently lacks this capability.

Same feature request as googleapis/js-genai#1416 — filing here for visibility across both SDKs since this is an API-level feature.

Problem

When building AI agents with many tools (30+):

  1. Context bloat: All tool schemas consume context tokens upfront. A typical multi-server MCP setup can consume 50k+ tokens in tool definitions alone.
  2. Tool selection accuracy: The model's ability to pick the right tool degrades beyond roughly 30-50 tools.
  3. Client-side workarounds break caching: Pre-filtering tools before each API call changes the tools parameter, invalidating the prefix cache.
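To make point 3 concrete, here is a minimal sketch of the client-side pre-filtering workaround (the helper and tool names are hypothetical, not from any SDK). Because the filter output depends on the user message, the `tools` parameter differs between consecutive requests, so any cache keyed on the request prefix misses:

```python
# Hypothetical tool registry; names and descriptions are illustrative only.
ALL_TOOLS = [
    {"name": "search_flights", "description": "Find flights between airports"},
    {"name": "book_hotel", "description": "Reserve a hotel room"},
    {"name": "get_weather", "description": "Current weather for a city"},
]

def prefilter_tools(user_message: str) -> list[dict]:
    """Naive client-side workaround: keep only tools whose description
    shares a word with the user message."""
    words = set(user_message.lower().split())
    return [t for t in ALL_TOOLS if words & set(t["description"].lower().split())]

# Two consecutive turns produce different `tools` lists, so the request
# prefix changes and the provider-side prefix cache is invalidated.
turn_1 = prefilter_tools("find flights to Tokyo")
turn_2 = prefilter_tools("what is the weather there")
```

Deferred loading avoids this by keeping the full `tools` list stable across calls while only expanding matched schemas into context.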

What competitors have

Anthropic

  • defer_loading: true on tools — schema sent but not rendered into model context
  • tool_search_tool_regex/bm25 — model searches deferred tools; matched schemas are expanded within the same API call
  • Zero extra round-trips, cache preserved
  • Up to 10,000 tools
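For reference, a Messages API request using this mechanism looks roughly like the sketch below. Field names follow Anthropic's announced beta; the versioned tool type identifier is an assumption and should be checked against current docs:

```python
# Sketch of an Anthropic Messages API request body with one deferred tool
# and the built-in regex tool-search tool. Tool type string is assumed from
# the beta announcement and may differ in current docs.
anthropic_request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [
        # Built-in search tool: lets the model look up deferred tools by regex.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        # Deferred tool: schema travels with the request but is not rendered
        # into model context until the search tool surfaces it.
        {
            "name": "get_weather",
            "description": "Current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
}
```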

OpenAI

  • Same defer_loading: true mechanism
  • Namespaces to group tools — model only sees group name + description
  • Discovered schemas injected at end of context window to preserve cache
  • Hosted and client-executed search modes
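An illustrative payload shape for the OpenAI side, assembled purely from the description above — the `namespace` and `defer_loading` field names here are assumptions taken from this issue, not verified against OpenAI's published API reference:

```python
# Illustrative only: deferred tools grouped under a namespace, so the model
# initially sees just the group name + description. Field names are assumed
# from this issue's description, not from OpenAI's docs.
openai_request = {
    "model": "gpt-5.4",
    "tools": [
        {
            "type": "function",
            "namespace": "travel",  # model sees only this group upfront
            "function": {
                "name": "book_hotel",
                "description": "Reserve a hotel room",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                },
            },
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "Book me a hotel in Tokyo"}],
}
```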

Proposed for Gemini API

  1. defer_loading: true on functionDeclarations
  2. Built-in tool_search tool for on-demand discovery
  3. Automatic schema expansion within generateContent call
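The three proposed pieces could combine into a generateContent request shaped like this sketch. Everything here is hypothetical by construction — `deferLoading` and the built-in `tool_search` tool are the fields this issue proposes, and none of them exist in the Gemini API today:

```python
# Hypothetical Gemini generateContent request body under this proposal.
# `tool_search` and `deferLoading` do not exist yet; they are the ask.
gemini_request = {
    "contents": [{"role": "user", "parts": [{"text": "Weather in Tokyo?"}]}],
    "tools": [
        # (2) Proposed built-in discovery tool.
        {"tool_search": {}},
        {
            "functionDeclarations": [
                {
                    "name": "get_weather",
                    "description": "Current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                    },
                    # (1) Proposed flag: schema accompanies the request but is
                    # only expanded into context when tool_search matches it (3).
                    "deferLoading": True,
                }
            ]
        },
    ],
}
```

As with the Anthropic design, keeping the full `tools` list stable across calls would preserve prefix caching while only matched schemas consume context tokens.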
