[7/8] Add Python SDK app-server integration harness#22014
Merged
Conversation
This was referenced May 10, 2026
f3e16de to
07c195f
Compare
5c7b278 to
81dccad
Compare
Collaborator
Author
|
This change is part of the following stack: Change managed by git-spice. |
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why The Python SDK depends on the app-server runtime package for the bundled `codex` binary and schema source of truth. That relationship should be explicit in package metadata instead of inferred from matching version numbers, so installers, lockfiles, and reviewers can see exactly which runtime the SDK expects. ## What - Declare `openai-codex-cli-bin==0.131.0a4` as a Python SDK dependency. - Update runtime setup helpers to resolve the runtime version from the declared dependency pin. - Refresh the SDK lockfile for the pinned runtime wheel. - Update package/runtime tests and docs that describe where the runtime version comes from. ## Stack 1. This PR `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added coverage for the SDK runtime dependency pin and runtime distribution naming. --------- Co-authored-by: Codex <noreply@openai.com>
07c195f to
0521108
Compare
81dccad to
175fc0a
Compare
0521108 to
0e727fd
Compare
175fc0a to
9a169a2
Compare
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why Once the SDK declares its runtime package, generated Python artifacts should come from that pinned runtime rather than whatever app-server schema happens to be in the current checkout. That keeps the generated API and model surface aligned with the runtime users install. ## What - Teach `scripts/update_sdk_artifacts.py generate-types` to invoke the pinned runtime package for schema generation. - Regenerate `v2_all.py`, `notification_registry.py`, and generated public wrapper methods from that schema. - Add freshness coverage so regenerating from the pinned runtime must leave checked-in artifacts unchanged. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. This PR `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added `test_generated_files_are_up_to_date` for pinned-runtime generation drift. - Added generator-structure tests for schema annotation and notification metadata generation. --------- Co-authored-by: Codex <noreply@openai.com>
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why The Python SDK stack now depends on packaging metadata, pinned runtime wheels, generated artifacts, async behavior, and stream interleaving. Those checks need to run in CI so future changes cannot bypass the SDK test suite. ## What - Add a dedicated `python-sdk` job to `.github/workflows/sdk.yml`. - Run the job in `python:3.12-alpine` so dependency resolution exercises the pinned musl runtime wheel. - Keep the Python SDK test job parallel to the existing SDK job instead of serializing the full workflow. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. This PR `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - The added workflow job installs the SDK with `uv sync --extra dev --frozen` and runs the Python SDK pytest suite. --------- Co-authored-by: Codex <noreply@openai.com>
0e727fd to
b218a24
Compare
9a169a2 to
46d29b5
Compare
b218a24 to
9a3d11b
Compare
46d29b5 to
5ee8852
Compare
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why The SDK package root should be the ergonomic public client API, not a dump of every generated app-server schema type. Generated models still need a supported import path, but callers should be able to tell which names are high-level SDK entrypoints and which names are protocol value models. ## What - Define a curated root `__all__` for clients, handles, input helpers, retry helpers, config, and public errors. - Add a `types` module as the supported home for generated app-server response, event, enum, and helper models. - Update docs and examples to import protocol/value models from the type module. - Add tests that lock root exports, type-module exports, star-import behavior, and example import hygiene. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. This PR `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added public API signature tests for root exports, `types` exports, and example imports. --------- Co-authored-by: Codex <noreply@openai.com>
9a3d11b to
b1bd6ed
Compare
5ee8852 to
810898d
Compare
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why The SDK should publish under the reserved public distribution name `openai-codex`, and its import module should match that name in the Python style. Since package names can contain hyphens but import modules cannot, the public import path becomes `openai_codex`. Keeping the rename separate from the public API surface change makes the naming change easy to review and avoids mixing it with API curation. ## What - Rename the SDK distribution from `openai-codex-app-server-sdk` to `openai-codex`. - Rename the import package from `codex_app_server` to `openai_codex`. - Keep the runtime wheel as the separate `openai-codex-cli-bin` dependency. - Update docs, examples, notebooks, artifact scripts, lockfile metadata, and tests for the new distribution/module names. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. This PR `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Updated package metadata and public API tests to assert the distribution and import names. Co-authored-by: Codex <noreply@openai.com>
b1bd6ed to
24939ab
Compare
810898d to
27366d0
Compare
aibrahim-oai
added a commit
that referenced
this pull request
May 11, 2026
## Why The high-level SDK should expose the approval behavior it actually supports instead of leaking generated app-server routing fields. New work should have two clear choices: default auto review, or explicitly deny escalated permission requests. Existing threads and subsequent turns should preserve their current approval behavior unless the caller passes an override. ## What - Add the public `ApprovalMode` enum with `auto_review` and `deny_all`. - Default new thread creation to `ApprovalMode.auto_review`. - Preserve existing approval settings by default for resume, fork, run, and turn helpers. - Remove raw `approval_policy` / `approvals_reviewer` kwargs from high-level SDK wrappers. - Update generated wrapper output, docs, examples, notebooks, and tests for the high-level approval mode API. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. This PR `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added approval-mode mapping/default tests for new threads, existing threads, forks, resumes, and subsequent turns. --------- Co-authored-by: Codex <noreply@openai.com>
Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful. Co-authored-by: Codex <noreply@openai.com>
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists. Co-authored-by: Codex <noreply@openai.com>
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing. Co-authored-by: Codex <noreply@openai.com>
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals. Co-authored-by: Codex <noreply@openai.com>
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path. Co-authored-by: Codex <noreply@openai.com>
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server. Co-authored-by: Codex <noreply@openai.com>
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers. Co-authored-by: Codex <noreply@openai.com>
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload. Co-authored-by: Codex <noreply@openai.com>
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary. Co-authored-by: Codex <noreply@openai.com>
Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs. Co-authored-by: Codex <noreply@openai.com>
Rename the split Python SDK app-server integration files and helper module to concise group names. Co-authored-by: Codex <noreply@openai.com>
Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness. Co-authored-by: Codex <noreply@openai.com>
Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness. Co-authored-by: Codex <noreply@openai.com>
Remove the skill-input assertion from the app-server integration suite because the current runtime path does not expose that structured input at the model boundary or in read history. Co-authored-by: Codex <noreply@openai.com>
Create a repo skill inside the app-server harness workspace and assert that SkillInput resolves to an injected skill block at the model request boundary. Co-authored-by: Codex <noreply@openai.com>
27366d0 to
70cc226
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The SDK had behavioral tests that replaced SDK client internals. Those tests could catch wrapper mistakes, but they did not prove the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together.
This PR adds deterministic integration coverage that starts the pinned
codex app-serverprocess and mocks only the upstream Responses HTTP boundary.What
AppServerHarnessandMockResponsesServerhelpers for isolatedCODEX_HOME, mock-provider config, queued SSE responses, and captured/v1/responsesrequests.Thread.run,TurnHandle.stream, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping.Stack
[1/8]Pin Python SDK runtime dependency[2/8]Generate Python SDK types from pinned runtime[3/8]Run Python SDK tests in CI[4/8]Define Python SDK public API surface[5/8]Rename Python SDK package toopenai-codex[6/8]Add high-level Python SDK approval mode[7/8]Add Python SDK app-server integration harness[8/8]Add Python SDK Ruff formattingVerification
sdk/python/tests/test_app_server_*.pyandtest_real_app_server_integration.py.