
[7/8] Add Python SDK app-server integration harness#22014

Merged
aibrahim-oai merged 15 commits into main from codex/python-sdk-mock-integration-tests on May 11, 2026

Conversation

@aibrahim-oai
Collaborator

@aibrahim-oai aibrahim-oai commented May 10, 2026

Why

The SDK's existing behavioral tests replaced SDK client internals with test doubles. Those tests could catch wrapper mistakes, but they did not prove that the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together.

This PR adds deterministic integration coverage that starts the pinned codex app-server process and mocks only the upstream Responses HTTP boundary.

What

  • Add AppServerHarness and MockResponsesServer helpers for isolated CODEX_HOME, mock-provider config, queued SSE responses, and captured /v1/responses requests.
  • Add shared helpers for SSE construction, stream assertions, approval-policy inspection, and image fixtures.
  • Split integration coverage into focused modules for run behavior, inputs, streaming, turn controls, approvals, and thread lifecycle.
  • Cover sync and async Thread.run, TurnHandle.stream, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping.
  • Replace public-wrapper tests that duplicated integration-test behavior with lower-level client tests only where direct client behavior is the thing under test.
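
To make the mocked boundary concrete, here is a minimal sketch of a local server that queues SSE bodies and records `/v1/responses` requests. It is not the PR's implementation: the class body, `sse_event`, `enqueue`, `captured`, and `post_responses` are illustrative names, and the event payload is a placeholder rather than a full Responses event.

```python
import json
import threading
import urllib.request
from collections import deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def sse_event(event_type: str, payload: dict) -> str:
    """Format one server-sent event frame (hypothetical helper)."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


class MockResponsesServer:
    """Illustrative stand-in: serves queued SSE bodies and records requests."""

    def __init__(self) -> None:
        self.queued: deque[str] = deque()
        self.captured: list[dict] = []
        owner = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self) -> None:
                length = int(self.headers.get("Content-Length", "0"))
                body = json.loads(self.rfile.read(length) or b"{}")
                # Anything other than a queued /v1/responses call is rejected.
                if self.path != "/v1/responses" or not owner.queued:
                    self.send_error(404)
                    return
                owner.captured.append(body)
                data = owner.queued.popleft().encode()
                self.send_response(200)
                self.send_header("Content-Type", "text/event-stream")
                self.send_header("Content-Length", str(len(data)))
                self.end_headers()
                self.wfile.write(data)

            def log_message(self, *args) -> None:
                pass  # keep test output quiet

        # Port 0 asks the OS for a free port, so parallel tests do not collide.
        self._server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)

    @property
    def base_url(self) -> str:
        host, port = self._server.server_address[:2]
        return f"http://{host}:{port}"

    def enqueue(self, *events: str) -> None:
        """Queue one response body made of one or more SSE frames."""
        self.queued.append("".join(events))

    def __enter__(self) -> "MockResponsesServer":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._server.shutdown()
        self._server.server_close()
        self._thread.join()


def post_responses(base_url: str, body: dict) -> str:
    """Send one request the way an upstream client might (proxies bypassed)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        f"{base_url}/v1/responses",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(request) as response:
        return response.read().decode()
```

In a real test, the app-server's mock-provider config would presumably point at `base_url`, and assertions would then inspect `captured` after a turn completes.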

Stack

  1. #21891 [1/8] Pin Python SDK runtime dependency
  2. #21893 [2/8] Generate Python SDK types from pinned runtime
  3. #21895 [3/8] Run Python SDK tests in CI
  4. #21896 [4/8] Define Python SDK public API surface
  5. #21905 [5/8] Rename Python SDK package to openai-codex
  6. #21910 [6/8] Add high-level Python SDK approval mode
  7. This PR [7/8] Add Python SDK app-server integration harness
  8. #22021 [8/8] Add Python SDK Ruff formatting

Verification

  • Added pinned app-server integration tests under sdk/python/tests/test_app_server_*.py and test_real_app_server_integration.py.

@aibrahim-oai
Collaborator Author

aibrahim-oai commented May 11, 2026

This change is part of the following stack:

Change managed by git-spice.

aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

The Python SDK depends on the app-server runtime package for the bundled
`codex` binary and schema source of truth. That relationship should be
explicit in package metadata instead of inferred from matching version
numbers, so installers, lockfiles, and reviewers can see exactly which
runtime the SDK expects.

## What

- Declare `openai-codex-cli-bin==0.131.0a4` as a Python SDK dependency.
- Update runtime setup helpers to resolve the runtime version from the
declared dependency pin.
- Refresh the SDK lockfile for the pinned runtime wheel.
- Update package/runtime tests and docs that describe where the runtime
version comes from.
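
As an illustration of resolving the runtime version from the declared pin, the sketch below extracts the `==` version from a list of requirement strings. The function name and the regex-based parsing are assumptions made for this example; the actual setup helpers may read the metadata differently.

```python
import re

RUNTIME_DISTRIBUTION = "openai-codex-cli-bin"


def pinned_runtime_version(dependencies: list[str]) -> str:
    """Return the exact version pinned for the runtime distribution.

    Hypothetical helper: assumes the name is written exactly as above and
    that the pin uses '==', as in 'openai-codex-cli-bin==0.131.0a4'.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(RUNTIME_DISTRIBUTION)}\s*==\s*([^\s;,]+)"
    )
    for requirement in dependencies:
        match = pattern.match(requirement)
        if match:
            return match.group(1)
    raise ValueError(f"{RUNTIME_DISTRIBUTION} is not pinned with '=='")
```

Failing loudly when the pin is missing or uses a range keeps the runtime version from silently falling back to a guess.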

## Stack

1. This PR `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added coverage for the SDK runtime dependency pin and runtime
distribution naming.

---------

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from 07c195f to 0521108 Compare May 11, 2026 21:49
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 81dccad to 175fc0a Compare May 11, 2026 21:49
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from 0521108 to 0e727fd Compare May 11, 2026 21:51
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 175fc0a to 9a169a2 Compare May 11, 2026 21:51
aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

Once the SDK declares its runtime package, generated Python artifacts
should come from that pinned runtime rather than whatever app-server
schema happens to be in the current checkout. That keeps the generated
API and model surface aligned with the runtime users install.

## What

- Teach `scripts/update_sdk_artifacts.py generate-types` to invoke the
pinned runtime package for schema generation.
- Regenerate `v2_all.py`, `notification_registry.py`, and generated
public wrapper methods from that schema.
- Add freshness coverage so regenerating from the pinned runtime must
leave checked-in artifacts unchanged.
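
The freshness idea can be sketched as follows: regenerate into a scratch directory and report any file that differs from the checked-in copy. `find_generation_drift` and the `generate` callable are hypothetical; in the real suite the generator would be the pinned-runtime `generate-types` step.

```python
import tempfile
from pathlib import Path
from typing import Callable


def find_generation_drift(
    checked_in: Path, generate: Callable[[Path], object]
) -> list[str]:
    """Regenerate into a scratch directory and list file names that differ."""
    with tempfile.TemporaryDirectory() as scratch:
        fresh = Path(scratch)
        generate(fresh)  # the generator writes its outputs into `fresh`
        names = {p.name for p in fresh.iterdir()}
        names |= {p.name for p in checked_in.iterdir()}
        drift = []
        for name in sorted(names):
            old, new = checked_in / name, fresh / name
            # A file missing on either side counts as drift too.
            if not (old.is_file() and new.is_file()):
                drift.append(name)
            elif old.read_bytes() != new.read_bytes():
                drift.append(name)
        return drift
```

A test along the lines of `test_generated_files_are_up_to_date` could then assert that the returned list is empty.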

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. This PR `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added `test_generated_files_are_up_to_date` for pinned-runtime
generation drift.
- Added generator-structure tests for schema annotation and notification
metadata generation.

---------

Co-authored-by: Codex <noreply@openai.com>
aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

The Python SDK stack now depends on packaging metadata, pinned runtime
wheels, generated artifacts, async behavior, and stream interleaving.
Those checks need to run in CI so future changes cannot bypass the SDK
test suite.

## What

- Add a dedicated `python-sdk` job to `.github/workflows/sdk.yml`.
- Run the job in `python:3.12-alpine` so dependency resolution exercises
the pinned musl runtime wheel.
- Keep the Python SDK test job parallel to the existing SDK job instead
of serializing the full workflow.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. This PR `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- The added workflow job installs the SDK with `uv sync --extra dev
--frozen` and runs the Python SDK pytest suite.

---------

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from 0e727fd to b218a24 Compare May 11, 2026 21:55
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 9a169a2 to 46d29b5 Compare May 11, 2026 21:55
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from b218a24 to 9a3d11b Compare May 11, 2026 21:56
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 46d29b5 to 5ee8852 Compare May 11, 2026 21:56
aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

The SDK package root should be the ergonomic public client API, not a
dump of every generated app-server schema type. Generated models still
need a supported import path, but callers should be able to tell which
names are high-level SDK entrypoints and which names are protocol value
models.

## What

- Define a curated root `__all__` for clients, handles, input helpers,
retry helpers, config, and public errors.
- Add a `types` module as the supported home for generated app-server
response, event, enum, and helper models.
- Update docs and examples to import protocol/value models from the type
module.
- Add tests that lock root exports, type-module exports, star-import
behavior, and example import hygiene.
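
One way to lock a curated export list is to compare a module's `__all__` against an expected set, as in this rough sketch. `export_drift` is a hypothetical helper, and any names passed to it here are placeholders rather than the SDK's actual root exports.

```python
import types


def export_drift(module: types.ModuleType, expected: set[str]) -> dict[str, set[str]]:
    """Compare a module's declared exports against a locked expectation."""
    declared = set(getattr(module, "__all__", ()))
    return {
        # Names exported but not in the locked list (accidental API growth).
        "unexpected": declared - expected,
        # Names in the locked list that are no longer exported.
        "missing": expected - declared,
        # Names listed in __all__ that do not exist, which breaks star imports.
        "undefined": {name for name in declared if not hasattr(module, name)},
    }
```

A test would assert that all three sets are empty for both the package root and the `types` module.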

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. This PR `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added public API signature tests for root exports, `types` exports,
and example imports.

---------

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from 9a3d11b to b1bd6ed Compare May 11, 2026 21:58
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 5ee8852 to 810898d Compare May 11, 2026 21:58
aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

The SDK should publish under the reserved public distribution name
`openai-codex`, and its import module should match that name in the
Python style. Since package names can contain hyphens but import modules
cannot, the public import path becomes `openai_codex`.

Keeping the rename separate from the public API surface change makes the
naming change easy to review and avoids mixing it with API curation.

## What

- Rename the SDK distribution from `openai-codex-app-server-sdk` to
`openai-codex`.
- Rename the import package from `codex_app_server` to `openai_codex`.
- Keep the runtime wheel as the separate `openai-codex-cli-bin`
dependency.
- Update docs, examples, notebooks, artifact scripts, lockfile metadata,
and tests for the new distribution/module names.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. This PR `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Updated package metadata and public API tests to assert the
distribution and import names.

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-approval-never branch from b1bd6ed to 24939ab Compare May 11, 2026 22:00
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 810898d to 27366d0 Compare May 11, 2026 22:00
aibrahim-oai added a commit that referenced this pull request May 11, 2026
## Why

The high-level SDK should expose the approval behavior it actually
supports instead of leaking generated app-server routing fields. New
work should have two clear choices: default auto review, or explicitly
deny escalated permission requests. Existing threads and subsequent
turns should preserve their current approval behavior unless the caller
passes an override.

## What

- Add the public `ApprovalMode` enum with `auto_review` and `deny_all`.
- Default new thread creation to `ApprovalMode.auto_review`.
- Preserve existing approval settings by default for resume, fork, run,
and turn helpers.
- Remove raw `approval_policy` / `approvals_reviewer` kwargs from
high-level SDK wrappers.
- Update generated wrapper output, docs, examples, notebooks, and tests
for the high-level approval mode API.
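
The default-versus-preserve rule described above can be sketched like this. The enum member names come from this change, but the member values and both resolver functions are assumptions for illustration; the mapping onto raw app-server fields is deliberately left out.

```python
from enum import Enum
from typing import Optional


class ApprovalMode(Enum):
    # Member values are assumed; only the names are taken from the change.
    auto_review = "auto_review"
    deny_all = "deny_all"


def mode_for_new_thread(override: Optional[ApprovalMode] = None) -> ApprovalMode:
    """New threads default to auto review unless the caller chooses otherwise."""
    return ApprovalMode.auto_review if override is None else override


def mode_for_existing_thread(
    current: ApprovalMode, override: Optional[ApprovalMode] = None
) -> ApprovalMode:
    """Resume, fork, run, and turn helpers keep the current mode by default."""
    return current if override is None else override
```

Treating `None` as "no override" keeps the preserve-by-default behavior distinct from an explicit request for `auto_review`.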

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. This PR `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added approval-mode mapping/default tests for new threads, existing
threads, forks, resumes, and subsequent turns.

---------

Co-authored-by: Codex <noreply@openai.com>
Base automatically changed from codex/python-sdk-approval-never to main May 11, 2026 22:02
aibrahim-oai and others added 15 commits May 12, 2026 01:06
Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful.

Co-authored-by: Codex <noreply@openai.com>
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists.

Co-authored-by: Codex <noreply@openai.com>
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing.

Co-authored-by: Codex <noreply@openai.com>
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals.

Co-authored-by: Codex <noreply@openai.com>
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path.

Co-authored-by: Codex <noreply@openai.com>
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server.

Co-authored-by: Codex <noreply@openai.com>
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers.

Co-authored-by: Codex <noreply@openai.com>
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload.

Co-authored-by: Codex <noreply@openai.com>
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary.

Co-authored-by: Codex <noreply@openai.com>
Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs.

Co-authored-by: Codex <noreply@openai.com>
Rename the split Python SDK app-server integration files and helper module to concise group names.

Co-authored-by: Codex <noreply@openai.com>
Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness.

Co-authored-by: Codex <noreply@openai.com>
Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness.

Co-authored-by: Codex <noreply@openai.com>
Remove the skill-input assertion from the app-server integration suite because the current runtime path does not expose that structured input at the model boundary or in read history.

Co-authored-by: Codex <noreply@openai.com>
Create a repo skill inside the app-server harness workspace and assert that SkillInput resolves to an injected skill block at the model request boundary.

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai force-pushed the codex/python-sdk-mock-integration-tests branch from 27366d0 to 70cc226 Compare May 11, 2026 22:06
@aibrahim-oai aibrahim-oai merged commit 3e10e09 into main May 11, 2026
14 of 16 checks passed
@aibrahim-oai aibrahim-oai deleted the codex/python-sdk-mock-integration-tests branch May 11, 2026 22:06
@github-actions github-actions Bot locked and limited conversation to collaborators May 11, 2026