Memory leak: httpx Response bodies retained indefinitely in async generate_content with IMAGE responses

This is a **client library issue**: httpx `Response` objects are retained in memory after `generate_content` returns, causing unbounded memory growth in long-running async services.

#### Environment details

  - Programming language: Python
  - OS: Linux (x86_64)
  - Language runtime version: Python 3.12
  - Package version: google-genai 1.12.1, httpx 0.28.1

#### Steps to reproduce

  1. Create an async `Client` with Vertex AI credentials
  2. Call `client.aio.models.generate_content()` repeatedly with `response_modalities=["IMAGE"]` (responses are ~3.5 MB base64-encoded PNGs)
  3. Monitor RSS memory or use `tracemalloc` to snapshot before/after

#### Reproduction script

```python
import asyncio
import tracemalloc
from google.genai import Client, types

async def main():
    tracemalloc.start(25)

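    client = Client(
        project="my-project",
        location="global",
        vertexai=True,
        credentials=my_credentials,  # placeholder: google.auth credentials for the project
    )

    test_image = make_1024x1024_png()  # placeholder helper returning the bytes of a ~3 MB PNG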

    # Warmup
    await client.aio.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=types.Content(role="user", parts=[
            types.Part.from_text(text="Edit this image"),
            types.Part.from_bytes(data=test_image, mime_type="image/png"),
        ]),
        config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
    )

    snapshot_baseline = tracemalloc.take_snapshot()

    for i in range(10):
        await client.aio.models.generate_content(
            model="gemini-2.0-flash-preview-image-generation",
            contents=types.Content(role="user", parts=[
                types.Part.from_text(text="Edit this image"),
                types.Part.from_bytes(data=test_image, mime_type="image/png"),
            ]),
            config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
        )
        await asyncio.sleep(5)

    snapshot_final = tracemalloc.take_snapshot()
    top_diffs = snapshot_final.compare_to(snapshot_baseline, "lineno")
    for stat in top_diffs[:5]:
        print(stat)

asyncio.run(main())
```

#### Observed behavior

After 10 requests, tracemalloc shows **+28 MB** of growth concentrated in two lines inside `httpx/_models.py`:

```
#1: httpx/_models.py:979: +14 MB (count: +4)
    self._content = b"".join([part async for part in self.aiter_bytes()])

#2: httpx/_models.py:649: +14 MB (count: +4)
    self._text = "".join([decoder.decode(self.content), decoder.flush()])
```

The full traceback from tracemalloc shows the retention path:

```
google/genai/_api_client.py:1484  → self._async_httpx_client.request(...)
httpx/_client.py:1637             → await response.aread()
httpx/_models.py:979              → self._content = b"".join([...])  ← retained
```

RSS grows by **+3.4 MB/request** on average, linearly with the number of requests, and the memory is never reclaimed.
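A per-request RSS figure like this can be sampled with a few lines of Linux-only code. The sketch below is illustrative rather than the exact measurement harness used here: `rss_mb` and `rss_deltas` are hypothetical helper names, and the `hoard` list simulates retained bodies instead of calling the API.

```python
import os


def rss_mb():
    """Current resident set size in MB, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6


def rss_deltas(samples):
    """Growth between consecutive RSS samples, in MB."""
    return [round(b - a, 2) for a, b in zip(samples, samples[1:])]


samples = [rss_mb()]
hoard = []  # simulates buffers that are never released
for _ in range(3):
    # Random bytes force the pages to be written, so they count toward RSS.
    hoard.append(bytearray(os.urandom(4_000_000)))
    samples.append(rss_mb())

print(rss_deltas(samples))
```

In the real service, the sample would be taken after each `generate_content` call instead of after each simulated allocation.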

#### Expected behavior

Response bodies should be eligible for garbage collection after `generate_content` returns the parsed `GenerateContentResponse`. The raw httpx `Response._content` and `._text` buffers should not be retained indefinitely.

#### Root cause hypothesis

The `httpx.Response` objects (or references to them) are held alive after the SDK finishes parsing. Likely causes:

1. The internal tenacity retry machinery in `_api_client.py` retains a reference to the last `Future` result via `retry_state`
2. `_api_client` stores the response object beyond the scope of parsing
3. Circular references between the response and the client prevent GC

With image generation responses being ~3.5 MB each (base64-encoded PNG), even a small number of retained responses causes significant memory growth in long-running services.
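One way to narrow down these hypotheses is to ask the garbage collector which objects still reference live responses. The sketch below uses a hypothetical `FakeResponse` class in place of `httpx.Response`, and a plain dict named `retry_state` to simulate the first hypothesis. It is a diagnostic aid under those assumptions, not a confirmed description of the SDK's internals.

```python
import gc
import types


class FakeResponse:
    """Hypothetical stand-in for httpx.Response holding a large body."""

    def __init__(self, size):
        self._content = bytes(size)


def count_live(cls):
    """Number of instances of cls still alive after a full collection."""
    gc.collect()  # clears objects kept alive only by reference cycles
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def referrer_types(cls):
    """Type names of objects that directly reference live instances of cls."""
    objs = gc.get_objects()
    names = set()
    for obj in objs:
        if type(obj) is cls:
            for ref in gc.get_referrers(obj):
                # Skip our own bookkeeping list and any stack frames.
                if ref is not objs and not isinstance(ref, types.FrameType):
                    names.add(type(ref).__name__)
    return names


retry_state = {"outcome": FakeResponse(1_000_000)}  # simulated retention
print(count_live(FakeResponse), referrer_types(FakeResponse))

retry_state.clear()  # dropping the reference makes the body collectable
print(count_live(FakeResponse))
```

Run against the real process with `httpx.Response` as `cls`, a nonzero count after `gc.collect()` would rule out pure reference cycles, and the referrer types would point toward whichever container is holding the responses.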

#### Impact

For services making continuous `generate_content` calls with image outputs, this causes unbounded memory growth of ~3-4 MB per request, eventually leading to OOM in production. Explicit `gc.collect()` and `del response` after each call do not reclaim the memory.

