
[codex] Harden overflow auto-compaction recovery #22141

Merged
aibrahim-oai merged 13 commits into main from dev/auto-compact-context-overflow on May 11, 2026

Conversation

@aibrahim-oai
Collaborator

@aibrahim-oai aibrahim-oai commented May 11, 2026

Why

Dogfooder feedback exposed two correctness gaps in normal-loop overflow recovery:

  1. a sampling request that hit ContextWindowExceeded could keep re-entering auto-compaction indefinitely if the compacted retry still did not fit, and
  2. compacted user-history rebuilds flattened user messages down to text, so an overflowing multimodal turn such as [image, "what is this?"] could be retried without the image after compaction.

That means recovery could either fail to terminate cleanly or continue with a materially weakened version of the user request.

What changed

  • Move normal-loop ContextWindowExceeded handling into the sampling retry loop, so successful rescue compaction consumes the provider retry budget instead of creating an unbounded outer-turn loop.
  • Keep compacted user history structured: collect_user_messages now returns structured UserInput content rather than flattened strings, and build_compacted_history reconstructs full user messages from that structured representation.
  • Preserve image inputs through compacted-history rebuilds while retaining the existing text-budget truncation behavior for user text.
  • Reuse the shared user_message_text projection where text-only analysis is still needed, such as connector mention extraction.
  • Preserve existing compaction-task failure handling and client-session reset behavior while bounding repeated overflow retries.
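The first bullet above — spending the provider retry budget on rescue compactions so overflow recovery always terminates — can be sketched as follows. All names here (`SampleError`, `run_sampling_with_recovery`, `compact_ratio`) are illustrative stand-ins for this sketch, not the actual codex-rs API:

```rust
// Minimal sketch of the bounded overflow-recovery loop, modeled with a
// token counter instead of a real provider call.

#[derive(Debug, PartialEq)]
enum SampleError {
    ContextWindowExceeded,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    Failed(SampleError),
}

/// Retry sampling, spending the shared retry budget on rescue
/// compactions so overflow recovery cannot loop forever.
fn run_sampling_with_recovery(
    mut context_tokens: u32,
    window: u32,
    max_retries: u32,
    compact_ratio: f32,
) -> Outcome {
    let mut retries = 0;
    loop {
        // A request that fits the model window succeeds.
        if context_tokens <= window {
            return Outcome::Completed;
        }
        // Overflow: only attempt a rescue compaction while the
        // provider retry budget has room left.
        if retries >= max_retries {
            return Outcome::Failed(SampleError::ContextWindowExceeded);
        }
        retries += 1;
        // Rescue compaction shrinks history; if the result still does
        // not fit, the next iteration consumes another retry instead of
        // re-entering an unbounded outer-turn loop.
        context_tokens = (context_tokens as f32 * compact_ratio) as u32;
    }
}

fn main() {
    // One compaction pass that fits succeeds.
    assert_eq!(run_sampling_with_recovery(10_000, 8_000, 3, 0.5), Outcome::Completed);
    // A compaction that never shrinks enough stops at the budget
    // instead of looping forever.
    assert_eq!(
        run_sampling_with_recovery(100_000, 10, 3, 0.99),
        Outcome::Failed(SampleError::ContextWindowExceeded)
    );
}
```

The key property is that compaction success alone never resets the counter; only a request that actually fits exits the loop.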

Coverage

  • context_window_error_sets_total_tokens_to_model_window_before_auto_compact_recovery
  • normal_loop_context_window_error_auto_compacts_and_resumes_turn
  • normal_loop_context_window_error_auto_remote_compacts_and_resumes_turn
  • normal_loop_context_window_error_auto_remote_v2_compacts_and_preserves_user_turn
  • normal_loop_context_window_error_stops_after_sample_retry_budget
  • normal_loop_context_window_error_stops_after_remote_compaction_failure
  • collect_user_messages_preserves_user_content

Verification

  • Not run locally; relying on PR CI for this update.

Recover regular turns that overflow the model context by reusing the existing mid-turn auto-compaction flow and retrying the turn.

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai marked this pull request as ready for review May 11, 2026 12:09
@aibrahim-oai aibrahim-oai requested a review from a team as a code owner May 11, 2026 12:09
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a175208c81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/core/src/session/turn.rs Outdated
client_session.reset_websocket_session();
}
can_drain_pending_input = false;
continue;
Contributor


P1 Badge Guard context-window recovery retries

This branch retries unconditionally after every successful auto-compaction. If the compacted history still exceeds the model window (for example an oversized current prompt/user input or a remote compact result that is still too large), the next sampling request raises ContextWindowExceeded again and the loop re-enters compaction forever instead of surfacing the error. Add a per-turn retry/fit guard.


Collaborator


I agree

Collaborator

@jif-oai jif-oai left a comment


lgtm after those 2

Comment thread codex-rs/core/src/session/turn.rs Outdated
client_session.reset_websocket_session();
}
can_drain_pending_input = false;
continue;
Collaborator


I agree

Comment thread codex-rs/core/src/session/turn.rs Outdated
// Aborted turn is reported via a different event.
break;
}
Err(CodexErr::ContextWindowExceeded) => {
Collaborator


I think this can mutate the current turn before the model has ever seen it. On the local path, compaction rebuilds user history from flattened text only and can truncate it, so an overflowing turn with a long prompt or an image can be retried as a different prompt. We probably need to preserve/re-append the exact in-flight input here, not recover it from compacted history.

@aibrahim-oai aibrahim-oai changed the title [codex] Recover sampling overflow with auto-compaction [codex] Harden overflow auto-compaction recovery May 11, 2026
aibrahim-oai and others added 4 commits May 11, 2026 13:07
Count overflow rescue compactions against the existing sampling retry budget and preserve the exact current user turn across compaction-backed retries.

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai
Collaborator Author

@codex review this

Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

let mut new_history = build_compacted_history(Vec::new(), &user_messages, &summary_text);

P2 Badge Do not rely on compacted text for the current turn

The recovery path installs build_compacted_history(...) directly after overflow. That helper applies the 20k-token compacted-history cap to all user messages, including the just-submitted overflowing turn, so a large current prompt can be truncated before the retry. The commit says the exact current turn is preserved, but this code never exempts it from compaction.
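The fix this comment asks for — exempting the just-submitted turn from the compacted-history cap — can be sketched like this. `Message`, `approx_tokens`, and the budget value are illustrative simplifications, not the real codex-rs helpers or the actual 20k-token cap:

```rust
// Sketch: keep the newest prior messages that fit the budget, but
// always carry the final (current) message through whole, even when it
// alone exceeds the budget.

#[derive(Clone, Debug, PartialEq)]
struct Message {
    text: String,
}

// Rough token estimate used only for this sketch.
fn approx_tokens(m: &Message) -> usize {
    m.text.len() / 4
}

fn select_for_compaction(history: &[Message], budget: usize) -> Vec<Message> {
    let Some((current, rest)) = history.split_last() else {
        return Vec::new();
    };
    let mut remaining = budget;
    let mut kept: Vec<Message> = Vec::new();
    // Reverse scan over everything *before* the current turn.
    for m in rest.iter().rev() {
        let cost = approx_tokens(m);
        if cost <= remaining {
            remaining -= cost;
            kept.push(m.clone());
        }
    }
    kept.reverse();
    kept.push(current.clone()); // current turn bypasses the cap
    kept
}

fn main() {
    let history = vec![
        Message { text: "a".repeat(40) },  // ~10 tokens
        Message { text: "b".repeat(40) },  // ~10 tokens
        Message { text: "c".repeat(400) }, // ~100 tokens: current turn
    ];
    let kept = select_for_compaction(&history, 15);
    // Only one prior message fits, but the oversized current turn is
    // preserved unconditionally.
    assert_eq!(kept.len(), 2);
    assert_eq!(kept[1], history[2]);
}
```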


if reset_client_session {
client_session.reset_websocket_session();
}
continue;
Contributor


P1 Badge Restore the overflowing user turn before retrying

After this continue, the retry is built from compacted history, but the original input was already consumed and is never reinstalled. Local/remote compaction rebuilds history from text summaries, so an overflowing turn with images or text beyond the compaction limit can be retried with content dropped or truncated before the model ever sees it.
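The shape of the fix — save the in-flight input before compaction and splice it back into the retry prompt — can be sketched as follows. The `Item` enum and the helper are hypothetical stand-ins, not codex-rs's actual restore_current_turn_for_sampling_retry:

```rust
// Sketch: replace the trailing compaction-derived user turn with the
// saved original, or re-append the original if compaction dropped it
// entirely, so the retry carries the exact in-flight input.

#[derive(Clone, Debug, PartialEq)]
enum Item {
    Summary(String),
    User(String),
}

fn restore_current_turn(mut prompt: Vec<Item>, saved: &Item) -> Vec<Item> {
    match prompt.last_mut() {
        // Trailing user turn came from compacted history: overwrite it.
        Some(last @ Item::User(_)) => *last = saved.clone(),
        // Compaction dropped the turn: append the saved original.
        _ => prompt.push(saved.clone()),
    }
    prompt
}

fn main() {
    let saved = Item::User("[image] what is this?".to_string());
    // The compacted prompt ends with a truncated copy of the turn.
    let compacted = vec![
        Item::Summary("earlier conversation".to_string()),
        Item::User("what is".to_string()),
    ];
    let restored = restore_current_turn(compacted, &saved);
    assert_eq!(restored.last(), Some(&saved));
    // If compaction dropped the turn, it is re-appended.
    let restored = restore_current_turn(
        vec![Item::Summary("earlier conversation".to_string())],
        &saved,
    );
    assert_eq!(restored.last(), Some(&saved));
    assert_eq!(restored.len(), 2);
}
```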


Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3a51fb7c2


Comment thread codex-rs/core/src/session/turn.rs Outdated
 .await
-.for_prompt(&turn_context.model_info.input_modalities)
+.for_prompt(&turn_context.model_info.input_modalities);
+restore_current_turn_for_sampling_retry(prompt_input, current_turn_input.as_ref())
Contributor


P1 Badge Preserve the restored turn for follow-up sampling

The restored user input is only spliced into this one retry prompt; sess history remains the compacted/truncated version. If the recovered request returns a tool call or otherwise needs a follow-up, the outer loop rebuilds the next prompt from session history, so images or oversized text from the overflowing turn disappear before the final model call.


Comment thread codex-rs/core/src/session/turn.rs Outdated
Comment on lines +1169 to +1179
fn latest_real_user_turn(input: &[ResponseItem]) -> Option<ResponseItem> {
input
.iter()
.rev()
.find_map(|item| match parse_turn_item(item) {
Some(TurnItem::UserMessage(user)) if !is_summary_message(&user.message()) => {
Some(item.clone())
}
_ => None,
})
}
Contributor


P1 Badge Restore every unseen user message before retrying

latest_real_user_turn preserves only one user ResponseItem. A sampling request can contain multiple not-yet-seen user messages, e.g. queued pending inputs accepted before the next model call; after overflow compaction, earlier pending messages are rebuilt from text summaries while only the last one gets its images/full text restored.
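Collecting every trailing not-yet-answered user message, rather than only the latest one, might look like the following sketch. `Item` is an illustrative stand-in for the ResponseItem/TurnItem machinery:

```rust
// Sketch: gather all user messages after the last assistant reply, in
// order — exactly the inputs the model has not yet responded to — so
// every queued pending input survives overflow compaction intact.

#[derive(Clone, Debug, PartialEq)]
enum Item {
    User(String),
    Assistant(String),
}

fn unseen_user_turns(input: &[Item]) -> Vec<Item> {
    // Start just after the last assistant reply (or at the beginning
    // if the model has not spoken yet).
    let start = input
        .iter()
        .rposition(|i| matches!(i, Item::Assistant(_)))
        .map_or(0, |p| p + 1);
    input[start..]
        .iter()
        .filter(|i| matches!(i, Item::User(_)))
        .cloned()
        .collect()
}

fn main() {
    let input = vec![
        Item::User("first question".to_string()),
        Item::Assistant("first answer".to_string()),
        Item::User("queued follow-up".to_string()),
        Item::User("second queued message".to_string()),
    ];
    // Both pending user messages are preserved, not just the last.
    let unseen = unseen_user_turns(&input);
    assert_eq!(unseen.len(), 2);
    assert_eq!(unseen[0], Item::User("queued follow-up".to_string()));
}
```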


Comment thread codex-rs/core/src/session/turn.rs Outdated
Comment on lines +1169 to +1179
Contributor


P1 Badge Preserve pending tool outputs across overflow recovery

When the overflow happens on a follow-up request after a tool call, the new content the model has not seen is the function/tool output. Compaction drops tool artifacts, but this preservation path only saves the last user message, so the immediate retry can ask the model to continue without the tool result it requested.


@aibrahim-oai aibrahim-oai enabled auto-merge (squash) May 11, 2026 15:04
@aibrahim-oai
Collaborator Author

@codex review this

Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

let prompt = build_prompt(
prompt_input,
router.as_ref(),
turn_context.as_ref(),

P2 Badge Rebuild tool routing after recovery compaction

router is built once from the overflowing input before the loop, but after run_auto_compact the retry prompt is rebuilt from compacted history while build_prompt still uses the stale router. If compaction drops or truncates connector mentions, the recovered request can expose tools that are no longer present in the prompt (or miss changes from the compacted prompt).


input_messages: sampling_request_input_messages,

P2 Badge Pass hooks the recovered request input

sampling_request_input_messages is captured before run_sampling_request; this change lets that call compact history and retry internally. On a successful recovered response, AfterAgent hooks still receive the pre-compaction messages here, so hooks can validate or log a prompt that is not the one the model actually answered.


Comment thread codex-rs/core/src/compact.rs Outdated
Comment on lines +546 to +548
UserInput::Text { .. } if !text_inserted => {
truncated.push(UserInput::Text {
text: truncated_text.clone(),
Contributor


P2 Badge Preserve text segment order when truncating user messages

When a user message has multiple text segments around images, user_message_text concatenates all text and this branch inserts the combined truncation into the first text item while dropping later text. A turn like [text A, image, text B] can be retried as [A+B prefix, image], moving captions/questions before their image and defeating structured-history preservation.
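Truncating each text segment in place, instead of merging all text into the first slot, preserves the interleaving this comment asks for. A minimal sketch, with `UserInput` and a character budget standing in for the real types and token accounting (ASCII-only slicing for brevity):

```rust
// Sketch: spend the budget across text segments left to right, leaving
// images and the segment order untouched, so [text A, image, text B]
// can only become [A, image, B-prefix] — never [A+B prefix, image].

#[derive(Clone, Debug, PartialEq)]
enum UserInput {
    Text(String),
    Image(&'static str),
}

fn truncate_in_place(segments: &[UserInput], mut budget: usize) -> Vec<UserInput> {
    segments
        .iter()
        .map(|seg| match seg {
            UserInput::Text(t) => {
                // ASCII assumed; real code must respect char boundaries.
                let take = budget.min(t.len());
                budget -= take;
                UserInput::Text(t[..take].to_string())
            }
            other => other.clone(),
        })
        .collect()
}

fn main() {
    let turn = vec![
        UserInput::Text("caption A".to_string()),
        UserInput::Image("img-1"),
        UserInput::Text("question B".to_string()),
    ];
    // Truncation shortens the trailing text but never reorders it in
    // front of the image.
    let truncated = truncate_in_place(&turn, 13);
    assert_eq!(
        truncated,
        vec![
            UserInput::Text("caption A".to_string()),
            UserInput::Image("img-1"),
            UserInput::Text("ques".to_string()),
        ]
    );
}
```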


Comment on lines +492 to 495
let message_text = user_message_text(message);
let tokens = approx_token_count(&message_text);
if tokens <= remaining {
selected_messages.push(message.clone());
Contributor


P2 Badge Count preserved images against the compaction budget

The selection budget only counts text tokens, but the new structured path keeps every non-text item. Image-only messages have tokens == 0, so the reverse scan can preserve an unbounded number of historical images without reducing remaining, and mixed messages keep all images even when text is truncated. Image-heavy threads can still overflow after “successful” compaction.
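Charging images against the selection budget closes that hole. A sketch, where the flat per-image cost is an assumed placeholder and `Segment` stands in for the real content types:

```rust
// Sketch: newest-first history selection in which both text and images
// consume the remaining budget, so image-heavy threads cannot be
// preserved "for free" after compaction.

const IMAGE_TOKEN_COST: usize = 1_000; // assumed flat cost, for illustration

#[derive(Clone, Debug)]
enum Segment {
    Text(String),
    Image,
}

fn message_cost(message: &[Segment]) -> usize {
    message
        .iter()
        .map(|s| match s {
            Segment::Text(t) => t.len() / 4, // rough token estimate
            Segment::Image => IMAGE_TOKEN_COST,
        })
        .sum()
}

fn select_messages(history: &[Vec<Segment>], budget: usize) -> Vec<Vec<Segment>> {
    let mut remaining = budget;
    let mut kept = Vec::new();
    for message in history.iter().rev() {
        let cost = message_cost(message);
        if cost <= remaining {
            remaining -= cost;
            kept.push(message.clone());
        }
    }
    kept.reverse();
    kept
}

fn main() {
    let history = vec![
        vec![Segment::Image],                // charged 1_000 tokens
        vec![Segment::Text("x".repeat(40))], // ~10 tokens
        vec![Segment::Image],                // charged 1_000 tokens
    ];
    // The budget covers one image plus the text, so the oldest image
    // is dropped instead of being kept at zero cost.
    assert_eq!(select_messages(&history, 1_050).len(), 2);
}
```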


@aibrahim-oai
Collaborator Author

@codex review this

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. Bravo.


Comment on lines 19 to +1101
@@ -310,7 +311,7 @@ pub(crate) async fn run_turn(
         Vec::new()
     } else {
         let initial_input_for_turn: ResponseInputItem = ResponseInputItem::from(input.clone());
-        let response_item: ResponseItem = initial_input_for_turn.clone().into();
+        let response_item: ResponseItem = initial_input_for_turn.into();
         let user_prompt_submit_outcome = run_user_prompt_submit_hooks(
             &sess,
             &turn_context,
@@ -918,7 +919,11 @@ pub(super) fn filter_connectors_for_input(
         return Vec::new();
     }

-    let mentions = collect_tool_mentions_from_messages(&user_messages);
+    let user_message_texts = user_messages
+        .iter()
+        .map(|message| user_message_text(message))
+        .collect::<Vec<_>>();
+    let mentions = collect_tool_mentions_from_messages(&user_message_texts);
     let mention_names_lower = mentions
         .plain_names
         .iter()
@@ -1041,6 +1046,7 @@ async fn run_sampling_request(
         Arc::clone(&turn_diff_tracker),
     )
     .await;
+    let max_retries = turn_context.provider.info().stream_max_retries();
     let mut retries = 0;
     let mut initial_input = Some(input);
     loop {
@@ -1074,7 +1080,27 @@ async fn run_sampling_request(
         }
         Err(CodexErr::ContextWindowExceeded) => {
             sess.set_total_tokens_full(&turn_context).await;
-            return Err(CodexErr::ContextWindowExceeded);
+            if retries >= max_retries {
+                return Err(CodexErr::ContextWindowExceeded);
+            }
+            retries += 1;
+            let reset_client_session = match run_auto_compact(
+                &sess,
+                &turn_context,
+                client_session,
+                InitialContextInjection::BeforeLastUserMessage,
+                CompactionReason::ContextLimit,
+                CompactionPhase::MidTurn,
+            )
+            .await
+            {
+                Ok(reset_client_session) => reset_client_session,
+                Err(_) => return Err(CodexErr::TurnAborted),
+            };
+            if reset_client_session {
+                client_session.reset_websocket_session();
Collaborator


you don't need to do this.

@aibrahim-oai aibrahim-oai merged commit 15e79f3 into main May 11, 2026
37 of 38 checks passed
@aibrahim-oai aibrahim-oai deleted the dev/auto-compact-context-overflow branch May 11, 2026 16:16
@github-actions github-actions Bot locked and limited conversation to collaborators May 11, 2026