Improve goal continuation based on feedback #22045
This reverts commit 64e3ab2.
The change to user message will also resolve: #21291
jif-oai left a comment
I like the direction better but the compaction thing must be solved
I approve to unlock
}

#[test]
fn goal_context_does_not_parse_as_visible_turn_item() {
I don't think this proves the stale-steer claim. GoalContext is now hidden user context, but collect_user_messages() still drops it during compaction, so an older real steer can remain the last preserved user message after compaction
Ah, I didn't realize that compaction ignored hidden user context. Yeah, this will require more work. I'd prefer to do that as a follow-up PR. This PR doesn't solve the compaction problem but it doesn't make it any worse.
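To make the compaction concern in this thread concrete, here is a minimal sketch. The `Item` enum and `collect_visible_user_messages` are hypothetical stand-ins invented for illustration. They are not the real codex-core types, and the real `collect_user_messages()` may behave differently in detail. The sketch only assumes what the thread states: hidden user context is dropped during compaction.

```rust
// Hypothetical types for illustration only; not codex-core's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    /// A visible message typed by the user (for example, a steer).
    UserMessage(String),
    /// Hidden user-role context, such as an injected goal continuation.
    HiddenUserContext(String),
    /// Anything the assistant produced.
    Assistant(String),
}

/// Simplified stand-in for a compaction filter that keeps only visible user messages.
fn collect_visible_user_messages(history: &[Item]) -> Vec<String> {
    history
        .iter()
        .filter_map(|item| match item {
            Item::UserMessage(text) => Some(text.clone()),
            _ => None, // hidden context and assistant output are discarded
        })
        .collect()
}

fn main() {
    let history = vec![
        Item::UserMessage("steer: focus on the parser".to_string()),
        Item::Assistant("working on the parser".to_string()),
        Item::HiddenUserContext("goal: continue toward the objective".to_string()),
    ];
    let kept = collect_visible_user_messages(&history);
    // The hidden goal context is gone, so the older steer is the last
    // preserved user message after this simplified compaction step.
    println!("{:?}", kept.last());
}
```

Under these assumptions, the newest item in the history (the hidden goal context) does not survive the filter, which is the stale-steer scenario described above.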
    FragmentRegistrationProxy::new();
static SUBAGENT_NOTIFICATION_REGISTRATION: FragmentRegistrationProxy<SubagentNotification> =
    FragmentRegistrationProxy::new();
static GOAL_CONTEXT_REGISTRATION: FragmentRegistrationProxy<GoalContext> =
Same, this will still get discarded by compaction
Summary
This PR updates the goal continuation prompt to address feedback from early adopters. There are two primary changes:
The user-message transition is important for two reasons. First, it eliminates an issue where older steering messages could be responded to again after a new turn. Second, it works better with compaction because user messages are treated differently from developer messages during compaction.
The prompt refinements make persistence explicit, ground work in current evidence, encourage update_plan for multi-step progress visibility, and require stronger completion audits before calling update_goal. This PR also removes the elapsed-time reporting in the prompt; I saw evidence that this was causing the model to shortcut work as it became nervous about time.
These changes were tested with evals. Chriss4123 has also been running independent evals in #19910, and many of the improvements in this PR were suggested by him.
Verification
codex-core coverage for hidden goal user context, continuation and budget-limit request shape, prompt rendering, and objective delimiter escaping.