Memory leak: httpx Response bodies retained indefinitely in async generate_content with IMAGE responses #2369

@ctkhanhly

Description

This is a client library issue: httpx Response objects are retained in memory after generate_content returns, causing unbounded memory growth in long-running async services.

Environment details

  • Programming language: Python
  • OS: Linux (x86_64)
  • Language runtime version: Python 3.12
  • Package version: google-genai 1.12.1, httpx 0.28.1

Steps to reproduce

  1. Create an async Client with Vertex AI credentials
  2. Call client.aio.models.generate_content() repeatedly with response_modalities=["IMAGE"] (responses are ~3.5 MB base64-encoded PNGs)
  3. Monitor RSS memory or use tracemalloc to snapshot before/after

Reproduction script

import asyncio
import tracemalloc
from google.genai import Client, types

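async def main():
    # Keep up to 25 frames per allocation so the retention path is visible.
    tracemalloc.start(25)

    # my_credentials is a placeholder for Vertex AI credentials;
    # it is not defined in this script.
    client = Client(
        project="my-project",
        location="global",
        vertexai=True,
        credentials=my_credentials,
    )

    # make_1024x1024_png is a placeholder helper, not defined in this script.
    test_image = make_1024x1024_png()  # ~3MB PNG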

    # Warmup
    await client.aio.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=types.Content(role="user", parts=[
            types.Part.from_text(text="Edit this image"),
            types.Part.from_bytes(data=test_image, mime_type="image/png"),
        ]),
        config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
    )

    snapshot_baseline = tracemalloc.take_snapshot()

    for i in range(10):
        await client.aio.models.generate_content(
            model="gemini-2.0-flash-preview-image-generation",
            contents=types.Content(role="user", parts=[
                types.Part.from_text(text="Edit this image"),
                types.Part.from_bytes(data=test_image, mime_type="image/png"),
            ]),
            config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
        )
        await asyncio.sleep(5)

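    # Compare against the post-warmup baseline, grouped by source line,
    # and print the five largest differences.
    snapshot_final = tracemalloc.take_snapshot()
    top_diffs = snapshot_final.compare_to(snapshot_baseline, "lineno")
    for stat in top_diffs[:5]:
        print(stat)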

asyncio.run(main())

Observed behavior

After 10 requests, tracemalloc shows +28 MB of growth concentrated in two lines inside httpx/_models.py:

#1: httpx/_models.py:979: +14 MB (count: +4)
    self._content = b"".join([part async for part in self.aiter_bytes()])

#2: httpx/_models.py:649: +14 MB (count: +4)
    self._text = "".join([decoder.decode(self.content), decoder.flush()])

The full traceback from tracemalloc shows the retention path:

google/genai/_api_client.py:1484  → self._async_httpx_client.request(...)
httpx/_client.py:1637             → await response.aread()
httpx/_models.py:979              → self._content = b"".join([...])  ← retained

RSS growth: +3.4 MB/request on average, growing linearly with number of requests. Memory is never reclaimed.
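Because RSS also reflects allocator behavior, it can help to read per-request growth from tracemalloc as well. The following is a minimal sketch and not part of the original measurements: measure_growth and leaky_call are hypothetical names, and leaky_call only stands in for a request whose body is retained.

```python
import asyncio
import tracemalloc

_retained = []  # hypothetical stand-in for whatever keeps response bodies alive


async def leaky_call():
    """Simulate one request whose ~1 MB body is never released."""
    _retained.append(bytes(1_000_000))
    await asyncio.sleep(0)


async def measure_growth(make_call, iterations):
    """Return traced-memory growth (bytes) relative to the start, after each call."""
    tracemalloc.start()
    try:
        baseline, _peak = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(iterations):
            await make_call()
            current, _peak = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
        return growth
    finally:
        tracemalloc.stop()
```

With a real client, make_call would wrap the generate_content call from the reproduction script. Roughly constant increments per request would match the linear trend reported here.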

Expected behavior

Response bodies should be eligible for garbage collection after generate_content returns the parsed GenerateContentResponse. The raw httpx Response._content and ._text buffers should not be retained indefinitely.
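One way to check this expectation directly is to count how many response objects are still alive after a batch of calls. This is a sketch under assumptions: count_live_instances and FakeResponse are hypothetical names, and FakeResponse only stands in for httpx.Response so the example runs without network access.

```python
import gc


class FakeResponse:
    """Hypothetical stand-in for httpx.Response, holding a body buffer."""

    def __init__(self, size):
        self._content = bytes(size)


def count_live_instances(cls):
    """Count GC-tracked objects whose exact type is cls, after a full collection."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)
```

Against the real library, one would pass httpx.Response and call this before and after a batch of requests. A count that rises by roughly one per request would support the retention claim; a flat count would point elsewhere.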

Root cause hypothesis

The httpx.Response objects (or references to them) are held alive after the SDK finishes parsing. Likely causes:

  1. The internal tenacity retry machinery in _api_client.py retains references to the last Future result via retry_state
  2. The _api_client stores the response object beyond the scope of parsing
  3. Circular references between the response and the client may be preventing GC
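To narrow down which of these hypotheses applies, the direct holders of a retained object can be listed with gc.get_referrers. The sketch below is illustrative only: find_holders is a hypothetical helper, and nothing here has been run against google-genai or httpx.

```python
import gc
import types


def find_holders(obj):
    """Return the objects that directly reference obj, skipping stack frames."""
    return [r for r in gc.get_referrers(obj) if not isinstance(r, types.FrameType)]
```

Applied to a retained httpx.Response found via gc.get_objects(), the types of the returned holders (for example a retry-state object, a client attribute, or another response) would indicate which hypothesis is most plausible.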

With image generation responses being ~3.5 MB each (base64-encoded PNG), even a small number of retained responses causes significant memory growth in long-running services.

Impact

For services making continuous generate_content calls with image outputs, this causes unbounded memory growth of ~3-4 MB per request, eventually leading to OOM in production. Explicit gc.collect() and del response after each call do not reclaim the memory.
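If del and gc.collect() do not help, one reading is that some other live object still holds a strong reference, since CPython's cycle collector reclaims unreachable reference cycles. The sketch below illustrates that distinction with a weak reference. Body and is_collected_after_del are hypothetical names, and the local cache list only stands in for whatever might be holding the response.

```python
import gc
import weakref


class Body:
    """Hypothetical stand-in for an object owning a response buffer."""

    def __init__(self, size):
        self.data = bytes(size)


def is_collected_after_del(keep_extra_reference):
    """Report whether a Body is freed once the caller's name is deleted."""
    cache = []  # stands in for a long-lived holder
    body = Body(1024)
    probe = weakref.ref(body)  # does not keep body alive
    if keep_extra_reference:
        cache.append(body)
    del body
    gc.collect()
    return probe() is None
```

Here is_collected_after_del(False) returns True, while is_collected_after_del(True) returns False because the list still references the object. This matches the symptom described, though it does not by itself identify the holder in the SDK.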

Metadata

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
