fix(parsing): prevent Pydantic schema validator leak in parse_response by xodn348 · Pull Request #3235 · openai/openai-python

xodn348 · 2026-05-13T07:30:55Z

Summary

Replace ParsedResponseOutputText[TextFormatT], ParsedResponseOutputMessage[TextFormatT], and ParsedResponse[TextFormatT] with their unparameterised forms in openai/lib/_parsing/_responses.py.
Add cast(...) wrappers to preserve the static type signatures for callers.
Add two regression tests: one for correctness, one that asserts SchemaValidator objects do not grow after the initial warm-up call.

Issue

parse_response constructs ParsedResponse[TextFormatT] (and the two inner types) using a free TypeVar as the type argument. Pydantic v2's model_rebuild cannot resolve a free TypeVar, so it returns False and never populates MockCoreSchema._built_memo. This causes a fresh SchemaValidator and SchemaSerializer (heavy Rust objects) to be allocated on every single responses.parse() call. In a long-running server the process RSS grows linearly with request count.

Root cause:

# Before — free TypeVar causes pydantic model_rebuild to return False every time
construct_type_unchecked(type_=ParsedResponse[TextFormatT], value={...})

Fix — use the unparameterised class so Pydantic builds and caches the schema once:

# After
cast("ParsedResponse[TextFormatT]", construct_type_unchecked(type_=ParsedResponse, value={...}))

All three parameterised Generic fields are either guarded by if TYPE_CHECKING: (runtime-inert) or set explicitly from the dict passed to construct_type_unchecked, so the unparameterised form is runtime-equivalent.
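
As a rough illustration of why the cast is safe, the sketch below uses only the standard typing module. Parsed and build are hypothetical stand-ins, not the SDK's ParsedResponse or parse_response. typing.cast returns its argument unchanged at runtime, so constructing the unparameterised class and casting afterwards keeps the static signature without touching the object.

```python
from typing import Generic, Optional, TypeVar, cast

T = TypeVar("T")


class Parsed(Generic[T]):
    """Hypothetical stand-in for a generic result container."""

    def __init__(self, parsed: Optional[T]) -> None:
        self.parsed = parsed


def build(value: T) -> "Parsed[T]":
    # Construct the unparameterised class at runtime...
    obj = Parsed(value)
    # ...and restore the generic type for static checkers only.
    # cast() performs no conversion or validation; it returns obj itself.
    return cast("Parsed[T]", obj)


result = build(3)
same_object = cast("Parsed[int]", result) is result
```

Here same_object is True because cast never creates a new object; the string form of the type is not evaluated at runtime.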

Local verification

=== LOCAL_TEST_PASSED ===
$ cd /tmp/openai-python && python3 -m pytest tests/lib/responses/ tests/test_models.py tests/test_utils/ -v --override-ini="addopts=" -q

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0
rootdir: /tmp/openai-python
configfile: pyproject.toml
plugins: respx-0.23.1, inline-snapshot-0.33.0, xdist-3.8.0, asyncio-1.3.0, anyio-4.46.0
asyncio: mode=Mode.AUTO

collected 186 items

tests/lib/responses/test_responses.py .......                            [  3%]
tests/test_models.py ................................................    [ 29%]
tests/test_utils/test_datetime_parse.py ...................................
...............................                                          [ 63%]
tests/test_utils/test_json.py .........                                  [ 68%]
tests/test_utils/test_logging.py .....                                   [ 70%]
tests/test_utils/test_path.py ..............................................
.....                                                                    [ 96%]
tests/test_utils/test_proxy.py ..                                        [ 97%]
tests/test_utils/test_typing.py .....                                    [100%]

186 passed in 0.86s

Schema-leak verification (warm-up + 50 calls, delta must be 0):

gc.collect(); before = count_schema_validators()
for _ in range(50): parse_response(text_format=CalendarEvent, ...)
gc.collect(); after = count_schema_validators()
# delta = 0  ✓
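
The count_schema_validators helper is not shown above. One plausible shape for it is to walk gc.get_objects() and match on the class name; the sketch below is an assumption along those lines, not the PR's test code. FakeValidator, leaky_call, cached_call, and growth_after_warmup are hypothetical stand-ins so the snippet runs without Pydantic or the SDK.

```python
import gc
from typing import Callable, Dict, List


def count_live_instances(type_name: str) -> int:
    """Count GC-tracked objects whose class is named `type_name`."""
    gc.collect()  # drop unreachable objects before counting
    return sum(1 for obj in gc.get_objects() if type(obj).__name__ == type_name)


class FakeValidator:
    """Hypothetical stand-in for a heavy per-schema object."""


_leaked: List[FakeValidator] = []
_cache: Dict[str, FakeValidator] = {}


def leaky_call() -> None:
    # Allocates and retains a new object on every call (the bug pattern).
    _leaked.append(FakeValidator())


def cached_call() -> None:
    # Allocates once, then reuses the cached object (the fixed pattern).
    if "validator" not in _cache:
        _cache["validator"] = FakeValidator()


def growth_after_warmup(call: Callable[[], None], calls: int = 50) -> int:
    """Return how many new FakeValidator objects survive `calls` invocations."""
    call()  # warm-up, so one-time allocations are excluded from the delta
    before = count_live_instances("FakeValidator")
    for _ in range(calls):
        call()
    return count_live_instances("FakeValidator") - before
```

Against the real library one would presumably pass "SchemaValidator" as the name; this assumes those objects are tracked by the garbage collector and therefore visible to gc.get_objects().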

Risk

Correctness: no change to parsed results; the type arguments only affect type-checker annotations. The parsed field value is already embedded in the dict passed to construct_type_unchecked.
Type safety: explicit cast preserves all existing type signatures; no downstream callers need updating.
Pydantic v1 / v2: both use construct_type_unchecked the same way; the caching fix only matters for Pydantic v2's model_rebuild path, and the unparameterised form is valid for both.

…Pydantic schema leaks

When parse_response constructs ParsedResponseOutputText[TextFormatT], ParsedResponseOutputMessage[TextFormatT], and ParsedResponse[TextFormatT] with an unresolved free TypeVar, Pydantic v2 calls model_rebuild on every invocation and never caches the result because the TypeVar cannot be resolved. Each call therefore allocates fresh SchemaValidator and SchemaSerializer objects (heavy Rust structs) that accumulate without bound in long-running servers.

Use the unparameterised base classes instead. All three guard their Generic-annotated fields behind `if TYPE_CHECKING:` so the type argument has no runtime effect on ParsedResponseOutputMessage and ParsedResponse; ParsedResponseOutputText stores the actual parsed value via the dict passed to construct_type_unchecked, so the schema type of the `parsed` field (Optional[Any] vs Optional[TextFormatT]) does not matter at runtime. Cast the results to preserve static type information.

Adds two regression tests:
- correctness: parsed attribute contains the expected Pydantic model
- no-leak: SchemaValidator count does not grow after the first call

Fixes openai#3084

xodn348 requested a review from a team as a code owner May 13, 2026 07:30

nomiveritas approved these changes May 13, 2026

xodn348 wants to merge 1 commit into openai:main from xodn348:fix/responses-parse-pydantic-schema-rebuild
