Track thread start in ChatGPT telemetry#24143
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aef2ab4daa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| active_permission_profile, | ||
| reasoning_effort: config_snapshot.reasoning_effort, | ||
| }; | ||
| analytics_events_client.track_thread_start_timing(thread.id.clone(), thread_start_timer); |
There was a problem hiding this comment.
Preserve thread initialization when the queue is full
When the analytics channel is backed up, this new timing fact can consume the last available queue slot before send_response records the actual ClientResponse that emits codex_thread_initialized. AnalyticsEventsClient::record_fact uses try_send and ignores Full errors (codex-rs/analytics/src/client.rs:346), so in that scenario the timing fact is retained but the thread-initialized event is dropped; before this change the response event would have had that slot. Consider attaching the duration to the response-derived fact or otherwise ensuring the primary lifecycle event is not displaced by the auxiliary timing fact.
Useful? React with 👍 / 👎.
b676830 to
a8989b3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee5e068b13
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| parent_thread_id: input | ||
| .parent_thread_id | ||
| .or_else(|| subagent_parent_thread_id(&input.subagent_source)), | ||
| thread_start_timing: ThreadStartTimingEventParams::default(), |
There was a problem hiding this comment.
Preserve subagent thread-start timings
When a thread-spawn subagent is created through AgentControl, spawn_thread_with_source now records a ThreadStartTimingFact before emit_subagent_session_started enqueues the subagent codex_thread_initialized event. The reducer stores that timing by thread id, but this direct subagent event path always serializes the new timing fields as null, so those newly created subagent threads lose the latency metric even though the timing fact was emitted. Consider plumbing the stored timing into this event path as well.
Useful? React with 👍 / 👎.
| let mut thread_start_timing = matches!( | ||
| &initial_history, | ||
| InitialHistory::New | InitialHistory::Cleared | ||
| ) | ||
| .then(ThreadStartTiming::start); |
There was a problem hiding this comment.
Include the full app-server start path in timing
When app-server thread start spends time before start_thread_with_options (for example config loading) or after core spawn but before send_response (config snapshot, listener attach, watch updates), the new thread_start_duration_ms omits it because this timer starts inside core and is emitted right after finalize_thread_spawn; the request processor's existing thread_start_total spans the request from thread_start_started_at through the ready notification. In those contexts ChatGPT will see a thread-start duration that can be much lower than the time until the thread is ready, so start/finish this telemetry at the request-processor boundary if the field is meant to represent ready latency.
Useful? React with 👍 / 👎.
970b926 to
a2b956b
Compare
|
Codex Review: Something went wrong. Try again later by commenting “@codex review”. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a2b956b5e5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let thread_state = self.threads.entry(thread_id.clone()).or_default(); | ||
| thread_state.connection_id = Some(connection_id); | ||
| thread_state.metadata = Some(thread_metadata.clone()); | ||
| let thread_start_timing = thread_state.thread_start_timing; |
There was a problem hiding this comment.
Clear consumed thread-start timings
When a client later resumes a thread that is still running, resume_running_thread sends a ThreadResumeResponse through thread_lifecycle.rs without going through ThreadManager::spawn_thread_with_source, so no new ThreadStartTimingFact is recorded for that response. Because this code copies the cached thread_start_timing without clearing it, the resume analytics event reuses the original start/fork timing and thread_start_type for the same thread id, corrupting subsequent codex_thread_initialized telemetry. Consider consuming the timing with take() (or otherwise keying it to a single lifecycle response) so stale timings are not attached to later responses.
Useful? React with 👍 / 👎.
8c657ed to
bdda087
Compare
03394d3 to
3ed1c71
Compare
bdda087 to
7424188
Compare
|
Codex Review: Something went wrong. Try again later by commenting “@codex review”. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
7424188 to
8713ba0
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8713ba0a53
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| use super::analytics::thread_initialized_event; | ||
| use super::analytics::wait_for_analytics_payload; |
There was a problem hiding this comment.
Add the missing analytics test helpers
In this checkout, super::analytics does not define either thread_initialized_event or wait_for_analytics_payload (repo-wide search only finds these new imports/usages), so this test module fails to compile with unresolved imports before the new telemetry coverage can run. Add the helpers to the analytics test module or keep using the existing wait_for_analytics_event helper.
Useful? React with 👍 / 👎.
| let thread_start_timing = matches!(initialization_mode, ThreadInitializationMode::New) | ||
| .then_some(thread_state.thread_start_timing) | ||
| .flatten(); |
There was a problem hiding this comment.
Preserve fork thread-start timings
When a thread is forked, fork_thread_with_source goes through spawn_thread_with_source, which now records a ThreadStartTimingFact for the new forked thread, and the analytics reducer emits the response with ThreadInitializationMode::Forked. This New-only filter therefore drops the recorded timing for every thread/fork initialized event, leaving the new timing fields null even though the measurement was emitted; include forked initialization here if fork startup latency is meant to be tracked.
Useful? React with 👍 / 👎.
| wait_for_analytics_event(&server, ANALYTICS_TEST_TIMEOUT, "codex_thread_initialized") | ||
| .await?; | ||
| assert_basic_thread_initialized_event(&event, &thread.id, "mock-model", "new", "user"); | ||
| let payload = wait_for_analytics_payload(&server, DEFAULT_READ_TIMEOUT).await?; |
There was a problem hiding this comment.
Keep the platform-specific analytics timeout
This test previously waited with ANALYTICS_TEST_TIMEOUT, which is 25s on macOS and Windows, but the new payload wait hard-codes DEFAULT_READ_TIMEOUT to 10s. On those slower CI platforms the analytics request can legitimately take longer than 10s, so after the missing helper is added this coverage can become flaky even though the production event is eventually emitted; use the existing analytics timeout here too.
Useful? React with 👍 / 👎.
0ebd75f to
94d2891
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94d2891f8b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| use super::analytics::thread_initialized_event; | ||
| use super::analytics::wait_for_analytics_payload; |
There was a problem hiding this comment.
Restore the analytics timeout import
This import change drops ANALYTICS_TEST_TIMEOUT, but this same test file still calls timeout(ANALYTICS_TEST_TIMEOUT, ...) in multiple tests, starting near the first initialization path. As a result, the thread_start integration test module fails to compile before the new telemetry assertions can run; re-import the constant or replace the remaining uses with an in-scope timeout value.
Useful? React with 👍 / 👎.
be12003 to
ee60c49
Compare
81d4d94 to
33ac4ff
Compare
75c1117 to
52cfa94
Compare
cf508f7 to
5ffcaa1
Compare
5ffcaa1 to
d562eb8
Compare
d562eb8 to
cceadf2
Compare
77d29c0 to
89ad405
Compare
89ad405 to
5c78d1c
Compare
Why
Thread startup latency should live on thread initialization rather than on turn telemetry. This lets ChatGPT measure how long a new thread takes to become ready without polluting resumed or forked thread events, and keeps turn telemetry focused on turn-specific timing.
What changed
codex-rs/analyticsand plumb it intocodex_thread_initializedThreadRequestProcessorwhen a new thread start completesthread_start_duration_msonly for newly created threads; resumed and forked threads continue to emitnullVerification
cargo check -p codex-analytics -p codex-app-server -p codex-corejust fix -p codex-analytics -p codex-core