[codex] Harden overflow auto-compaction recovery #22141
Conversation
Recover regular turns that overflow the model context by reusing the existing mid-turn auto-compaction flow and retrying the turn.

Co-authored-by: Codex <noreply@openai.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a175208c81
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
    client_session.reset_websocket_session();
}
can_drain_pending_input = false;
continue;
```
Guard context-window recovery retries
This branch retries unconditionally after every successful auto-compaction. If the compacted history still exceeds the model window (for example an oversized current prompt/user input or a remote compact result that is still too large), the next sampling request raises ContextWindowExceeded again and the loop re-enters compaction forever instead of surfacing the error. Add a per-turn retry/fit guard.
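The suggested guard can be sketched as a bounded retry loop. This is an illustrative standalone sketch, not the codex implementation: the sampler and error are reduced to a closure returning string errors, and the compaction call that would sit inside the retry arm is elided.

```rust
/// Bounded overflow-recovery loop: each rescue compaction consumes one
/// unit of the retry budget, so a prompt that still does not fit after
/// compaction surfaces the error instead of re-entering compaction forever.
fn recover_with_budget<F>(mut sample: F, max_retries: u32) -> Result<String, &'static str>
where
    F: FnMut() -> Result<String, &'static str>,
{
    let mut retries = 0;
    loop {
        match sample() {
            Ok(response) => return Ok(response),
            Err("context_window_exceeded") if retries < max_retries => {
                retries += 1;
                // In the real flow, run_auto_compact(...) would run here
                // before the next sampling attempt.
            }
            Err(e) => return Err(e),
        }
    }
}
```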
```rust
        client_session.reset_websocket_session();
    }
    can_drain_pending_input = false;
    continue;
    // Aborted turn is reported via a different event.
    break;
}
Err(CodexErr::ContextWindowExceeded) => {
```
I think this can mutate the current turn before the model has ever seen it. On the local path, compaction rebuilds user history from flattened text only and can truncate it, so an overflowing turn with a long prompt or an image can be retried as a different prompt. We probably need to preserve/re-append the exact in-flight input here, not recover it from compacted history.
Count overflow rescue compactions against the existing sampling retry budget and preserve the exact current user turn across compaction-backed retries.

Co-authored-by: Codex <noreply@openai.com>
This reverts commit 1482455.
@codex review this
💡 Codex Review
`codex/codex-rs/core/src/compact.rs`, line 265 in b20b795:
The recovery path installs build_compacted_history(...) directly after overflow. That helper applies the 20k-token compacted-history cap to all user messages, including the just-submitted overflowing turn, so a large current prompt can be truncated before the retry. The commit says the exact current turn is preserved, but this code never exempts it from compaction.
```rust
if reset_client_session {
    client_session.reset_websocket_session();
}
continue;
```
Restore the overflowing user turn before retrying
After this continue, the retry is built from compacted history, but the original input was already consumed and is never reinstalled. Local/remote compaction rebuilds history from text summaries, so an overflowing turn with images or text beyond the compaction limit can be retried with content dropped or truncated before the model ever sees it.
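One way to reinstall the consumed input, sketched with plain strings standing in for the real prompt items. `restore_turn` and its shape are hypothetical, not the codex API: the point is only that the exact original turn is re-appended in place of whatever compaction rebuilt.

```rust
/// Splice the preserved original user turn back into a compacted prompt
/// before retrying. `prompt` is the rebuilt, compacted input;
/// `original_turn` is the exact in-flight user item captured before
/// compaction.
fn restore_turn(mut prompt: Vec<String>, original_turn: Option<&String>) -> Vec<String> {
    if let Some(original) = original_turn {
        // Drop the compacted (possibly truncated) copy of the turn, if any,
        // and re-append the exact original so the model sees what was sent.
        if prompt.last().map(|m| m != original).unwrap_or(false) {
            prompt.pop();
        }
        prompt.push(original.clone());
    }
    prompt
}
```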
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3a51fb7c2
```rust
.await
.for_prompt(&turn_context.model_info.input_modalities)
.for_prompt(&turn_context.model_info.input_modalities);
restore_current_turn_for_sampling_retry(prompt_input, current_turn_input.as_ref())
```
Preserve the restored turn for follow-up sampling
The restored user input is only spliced into this one retry prompt; sess history remains the compacted/truncated version. If the recovered request returns a tool call or otherwise needs a follow-up, the outer loop rebuilds the next prompt from session history, so images or oversized text from the overflowing turn disappear before the final model call.
```rust
fn latest_real_user_turn(input: &[ResponseItem]) -> Option<ResponseItem> {
    input
        .iter()
        .rev()
        .find_map(|item| match parse_turn_item(item) {
            Some(TurnItem::UserMessage(user)) if !is_summary_message(&user.message()) => {
                Some(item.clone())
            }
            _ => None,
        })
}
```
Restore every unseen user message before retrying
latest_real_user_turn preserves only one user ResponseItem. A sampling request can contain multiple not-yet-seen user messages, e.g. queued pending inputs accepted before the next model call; after overflow compaction, earlier pending messages are rebuilt from text summaries while only the last one gets its images/full text restored.
```rust
fn latest_real_user_turn(input: &[ResponseItem]) -> Option<ResponseItem> {
    input
        .iter()
        .rev()
        .find_map(|item| match parse_turn_item(item) {
            Some(TurnItem::UserMessage(user)) if !is_summary_message(&user.message()) => {
                Some(item.clone())
            }
            _ => None,
        })
}
```
Preserve pending tool outputs across overflow recovery
When the overflow happens on a follow-up request after a tool call, the new content the model has not seen is the function/tool output. Compaction drops tool artifacts, but this preservation path only saves the last user message, so the immediate retry can ask the model to continue without the tool result it requested.
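A sketch of the broader preservation the two comments above ask for, using a simplified item enum in place of the real `ResponseItem`/`TurnItem` types (an assumption for illustration): walk back from the end of the request and keep every item the model has not yet seen, user messages and tool outputs alike, rather than only the single latest user message.

```rust
// Hypothetical simplified item model; the real codex types carry
// much more structure.
#[derive(Clone, Debug, PartialEq)]
enum Item {
    User(String),
    Summary(String),
    ToolOutput(String),
    Assistant(String),
}

/// Collect every trailing item the model has not yet seen: scan
/// backwards until the first assistant reply, keeping user messages
/// and tool outputs while skipping compaction summaries.
fn unseen_tail(input: &[Item]) -> Vec<Item> {
    let mut tail: Vec<Item> = input
        .iter()
        .rev()
        .take_while(|item| !matches!(item, Item::Assistant(_)))
        .filter(|item| !matches!(item, Item::Summary(_)))
        .cloned()
        .collect();
    tail.reverse();
    tail
}
```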
@codex review this
💡 Codex Review
`codex/codex-rs/core/src/session/turn.rs`, lines 1060 to 1063 in 2aa791e:
router is built once from the overflowing input before the loop, but after run_auto_compact the retry prompt is rebuilt from compacted history while build_prompt still uses the stale router. If compaction drops or truncates connector mentions, the recovered request can expose tools that are no longer present in the prompt (or miss changes from the compacted prompt).
`codex/codex-rs/core/src/session/turn.rs`, line 585 in 2aa791e:
sampling_request_input_messages is captured before run_sampling_request; this change lets that call compact history and retry internally. On a successful recovered response, AfterAgent hooks still receive the pre-compaction messages here, so hooks can validate or log a prompt that is not the one the model actually answered.
```rust
UserInput::Text { .. } if !text_inserted => {
    truncated.push(UserInput::Text {
        text: truncated_text.clone(),
```
Preserve text segment order when truncating user messages
When a user message has multiple text segments around images, user_message_text concatenates all text and this branch inserts the combined truncation into the first text item while dropping later text. A turn like [text A, image, text B] can be retried as [A+B prefix, image], moving captions/questions before their image and defeating structured-history preservation.
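An order-preserving alternative can be sketched with a simplified `UserInput` (the enum shape here is an assumption, not the real codex type): truncate text segment by segment in place against a shared budget, instead of merging all text into the first text item.

```rust
#[derive(Clone, Debug, PartialEq)]
enum UserInput {
    Text(String),
    Image(String), // image reference; always kept verbatim
}

/// Truncate a multi-segment user message to `budget` characters of text
/// while keeping each text segment in its original position relative to
/// images. Later segments may shrink to empty once the budget runs out.
fn truncate_in_place(segments: &[UserInput], mut budget: usize) -> Vec<UserInput> {
    segments
        .iter()
        .map(|seg| match seg {
            UserInput::Text(t) => {
                let keep: String = t.chars().take(budget).collect();
                budget = budget.saturating_sub(keep.chars().count());
                UserInput::Text(keep)
            }
            other => other.clone(),
        })
        .collect()
}
```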
```rust
let message_text = user_message_text(message);
let tokens = approx_token_count(&message_text);
if tokens <= remaining {
    selected_messages.push(message.clone());
```
Count preserved images against the compaction budget
The selection budget only counts text tokens, but the new structured path keeps every non-text item. Image-only messages have tokens == 0, so the reverse scan can preserve an unbounded number of historical images without reducing remaining, and mixed messages keep all images even when text is truncated. Image-heavy threads can still overflow after “successful” compaction.
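A sketch of charging images against the selection budget, using a flat per-image token cost. Both constants are assumptions for illustration, not the model's real figures; the point is that `remaining` shrinks even for image-only messages.

```rust
// Assumed costs, not real model parameters.
const APPROX_TOKENS_PER_IMAGE: usize = 1_000;
const CHARS_PER_TOKEN: usize = 4;

/// Approximate token cost for one message: text length converted at a
/// fixed chars-per-token rate, plus a flat charge per preserved image,
/// so image-heavy messages also consume the compaction budget.
fn approx_message_tokens(text_len: usize, image_count: usize) -> usize {
    text_len / CHARS_PER_TOKEN + image_count * APPROX_TOKENS_PER_IMAGE
}
```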
@codex review this
Codex Review: Didn't find any major issues. Bravo.
```diff
@@ -310,7 +311,7 @@ pub(crate) async fn run_turn(
     Vec::new()
 } else {
     let initial_input_for_turn: ResponseInputItem = ResponseInputItem::from(input.clone());
-    let response_item: ResponseItem = initial_input_for_turn.clone().into();
+    let response_item: ResponseItem = initial_input_for_turn.into();
     let user_prompt_submit_outcome = run_user_prompt_submit_hooks(
         &sess,
         &turn_context,
@@ -918,7 +919,11 @@ pub(super) fn filter_connectors_for_input(
     return Vec::new();
 }

-let mentions = collect_tool_mentions_from_messages(&user_messages);
+let user_message_texts = user_messages
+    .iter()
+    .map(|message| user_message_text(message))
+    .collect::<Vec<_>>();
+let mentions = collect_tool_mentions_from_messages(&user_message_texts);
 let mention_names_lower = mentions
     .plain_names
     .iter()
@@ -1041,6 +1046,7 @@ async fn run_sampling_request(
     Arc::clone(&turn_diff_tracker),
 )
 .await;
+let max_retries = turn_context.provider.info().stream_max_retries();
 let mut retries = 0;
 let mut initial_input = Some(input);
 loop {
@@ -1074,7 +1080,27 @@ async fn run_sampling_request(
     }
     Err(CodexErr::ContextWindowExceeded) => {
         sess.set_total_tokens_full(&turn_context).await;
-        return Err(CodexErr::ContextWindowExceeded);
+        if retries >= max_retries {
+            return Err(CodexErr::ContextWindowExceeded);
+        }
+        retries += 1;
+        let reset_client_session = match run_auto_compact(
+            &sess,
+            &turn_context,
+            client_session,
+            InitialContextInjection::BeforeLastUserMessage,
+            CompactionReason::ContextLimit,
+            CompactionPhase::MidTurn,
+        )
+        .await
+        {
+            Ok(reset_client_session) => reset_client_session,
+            Err(_) => return Err(CodexErr::TurnAborted),
+        };
+        if reset_client_session {
+            client_session.reset_websocket_session();
```
You don't need to do this.
Why
Dogfooder feedback exposed two correctness gaps in normal-loop overflow recovery:
- `ContextWindowExceeded` could keep re-entering auto-compaction indefinitely if the compacted retry still did not fit, and
- `[image, "what is this?"]` could be retried without the image after compaction.

That means recovery could either fail to terminate cleanly or continue with a materially weakened version of the user request.
What changed
- Moved `ContextWindowExceeded` handling into the sampling retry loop, so successful rescue compaction consumes the provider retry budget instead of creating an unbounded outer-turn loop.
- `collect_user_messages` now returns user `UserInput` content rather than flattened strings, and `build_compacted_history` reconstructs full user messages from that structured representation.
- Kept a `user_message_text` projection where text-only analysis is still needed, such as connector mention extraction.

Coverage
- `context_window_error_sets_total_tokens_to_model_window_before_auto_compact_recovery`
- `normal_loop_context_window_error_auto_compacts_and_resumes_turn`
- `normal_loop_context_window_error_auto_remote_compacts_and_resumes_turn`
- `normal_loop_context_window_error_auto_remote_v2_compacts_and_preserves_user_turn`
- `normal_loop_context_window_error_stops_after_sample_retry_budget`
- `normal_loop_context_window_error_stops_after_remote_compaction_failure`
- `collect_user_messages_preserves_user_content`

Verification