
🤖 feat: system 1 reasoning + background bash filtering #1843

Merged
ThomasK33 merged 25 commits into main from system-1-dwqk
Jan 25, 2026

Conversation

@ThomasK33
Member

@ThomasK33 ThomasK33 commented Jan 22, 2026

Summary

Make System 1 bash-output filtering more predictable and consistent by:

  • Adding an explicit System 1 Reasoning (thinking level) setting (defaults to off).
  • Applying System 1 filtering not only to foreground bash, but also to background output paths (bash_output + task_await).

Background

System 1 is used to LLM-filter large bash outputs so the main model doesn’t get flooded with noisy logs. Two gaps remained:

  1. System 1 reasoning/thinking was implicit (provider defaults), making cost/latency unpredictable.
  2. Background bash output retrieval could bypass System 1 filtering and bloat context.

Implementation

  • Persisted preferredSystem1ThinkingLevel and plumbed system1ThinkingLevel through SendMessageOptions → backend stream startup.
  • Applied provider-specific thinking/reasoning options to the System 1 generateText(...) call (via buildProviderOptions + enforceThinkingPolicy).
  • Refactored bash filtering into a reusable helper and wrapped bash_output + task_await so background task chunks and completed reports are filtered consistently.
  • Extracted bash task markdown report formatting/parsing into a shared util to safely rewrite task_await results.

Validation

  • make static-check
  • Unit tests for bash task markdown report roundtripping (bashTaskReport.test.ts)
  • Manual testing with scripts/system1-noisy-output.ts in both single-shot and burst modes

Risks

  • Over-filtering could hide a useful line. Mitigations:
    • Fail-open behavior: if System 1 fails or returns invalid ranges, return the original output.
    • When filtering triggers, we preserve access to the full output via tempfile pointers.
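The fail-open mitigation can be sketched roughly as below. This is a minimal illustration, not the code in aiService.ts: the `KeepRange` shape (1-based inclusive line ranges), `rangesAreValid`, and `applyKeepRangesFailOpen` are hypothetical names, and the `pickRanges` callback stands in for whatever System 1 returns.

```typescript
// Hypothetical shape: 1-based inclusive line ranges the filter asks to keep.
interface KeepRange {
  start: number;
  end: number;
}

interface FilterResult {
  output: string;
  filtered: boolean;
}

// True only when there is at least one range and every range is an
// integer pair that lies inside [1, lineCount].
function rangesAreValid(ranges: KeepRange[], lineCount: number): boolean {
  return (
    ranges.length > 0 &&
    ranges.every(
      (r) =>
        Number.isInteger(r.start) &&
        Number.isInteger(r.end) &&
        r.start >= 1 &&
        r.end >= r.start &&
        r.end <= lineCount
    )
  );
}

// Fail-open: a thrown error or an invalid range returns the original output unchanged.
function applyKeepRangesFailOpen(
  original: string,
  pickRanges: (lines: string[]) => KeepRange[]
): FilterResult {
  const lines = original.split("\n");
  try {
    const ranges = pickRanges(lines);
    if (!rangesAreValid(ranges, lines.length)) {
      return { output: original, filtered: false };
    }
    const keep = new Set<number>();
    for (const r of ranges) {
      for (let i = r.start; i <= r.end; i++) keep.add(i);
    }
    const kept = lines.filter((_, idx) => keep.has(idx + 1));
    return { output: kept.join("\n"), filtered: true };
  } catch {
    return { output: original, filtered: false };
  }
}
```

The `filtered` flag is there so a caller could decide whether to append a note or a tempfile pointer; how the real helper signals that is not shown in this PR description.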

📋 Implementation Plan

System 1 reasoning effort + background bash filtering

Context / Why

System 1 is currently used to LLM-filter large bash outputs so the main model doesn’t get flooded with noisy logs.

Two gaps today:

  1. Reasoning effort is implicit: the System 1 generateText(...) call does not pass provider thinking/reasoning options (OpenAI reasoningEffort, Anthropic thinking budgets, etc.), so it runs at provider defaults (not explicitly “high”).
  2. Background bash outputs bypass System 1: when bash runs with run_in_background: true, System 1 returns early, so later task_await / bash_output results can come back unfiltered and bloat context.

We want System 1 to be:

  • Predictable (deterministic reasoning configuration)
  • Fast by default (System 1 should stay “System 1”)
  • Consistent across foreground + background bash output paths
  • Aligned with how we expose compaction settings (compaction model + compaction reasoning)

Evidence (what exists today)

  • System 1 bash filtering wrapper + generateText(...) call (no providerOptions passed): src/node/services/aiService.ts (~1990–2260).

  • Provider-specific reasoning mapping lives in one place:

    • ThinkingLevel + mappings (OpenAI reasoning_effort): src/common/types/thinking.ts.
    • Provider options builder (buildProviderOptions, sets OpenAI reasoningEffort when thinking != off): src/common/utils/ai/providerOptions.ts.
  • Compaction already has:

    • “Compaction Model” setting in Settings → Models: src/browser/components/Settings/sections/ModelsSection.tsx.
    • Compaction reasoning via Mode Defaults (“Compact” mode): src/browser/components/Settings/sections/ModesSection.tsx + src/browser/utils/messages/compactionOptions.ts.
  • Background bash output retrieval:

    • bash_output tool: src/node/services/tools/bash_output.ts → backgroundProcessManager.getOutput(...).
    • task_await tool formats completed bash output into reportMarkdown: src/node/services/tools/task_await.ts.
  • Dev harness: scripts/system1-noisy-output.ts generates deterministic noisy stdout designed to trip System 1 thresholds (and stay under bash hard limits); extend it to support burst/stream output for background testing.

External docs (for semantics of OpenAI reasoning effort):


Approach A (recommended): Add System 1 reasoning setting (advanced) + filter background outputs

Net LoC estimate (product code only): ~+280 LoC

A1) Product behavior

  • Add a System 1 Reasoning setting (same ThinkingLevel enum: off/low/medium/high/xhigh).
  • Default: off (keeps System 1 fast + avoids hidden cost/latency).
  • When enabled, System 1 should pass provider options built from ThinkingLevel so OpenAI uses explicit reasoningEffort and Anthropic/Gemini use their explicit thinking configs.

Rationale: Compaction is a user-facing quality feature so it makes sense to inherit mode reasoning. System 1 is an internal helper that runs inside tool calls; we should not silently spend “high” reasoning budget unless the user explicitly opts in.

A2) UI + storage plumbing

  1. Add a new persisted key (global, like preferredSystem1Model):
    • PREFERRED_SYSTEM_1_THINKING_LEVEL_KEY = "preferredSystem1ThinkingLevel" in src/common/constants/storage.ts.
  2. Settings UI (experiment gated, next to “System 1” model selector):
    • Update src/browser/components/Settings/sections/ModelsSection.tsx to show a dropdown for System 1 reasoning.
    • Store values as a ThinkingLevel string; default to off. Validate with coerceThinkingLevel.
  3. SendMessageOptions plumbing:
    • Extend SendMessageOptionsSchema in src/common/orpc/schemas/stream.ts with system1ThinkingLevel?: ThinkingLevel.
    • Update src/browser/hooks/useSendMessageOptions.ts to read the persisted setting and include it in options.
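As an illustration of the "store as a ThinkingLevel string, default to off" rule in step 2, a validation helper might look like the sketch below. `coerceLevelOrOff` is a hypothetical stand-in; the signature and behavior of the repo's own coerceThinkingLevel are not shown in this PR, so treat this as an assumption about its intent only.

```typescript
// Levels as listed in A1: off/low/medium/high/xhigh.
const THINKING_LEVELS = ["off", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

// Hypothetical stand-in for coerceThinkingLevel: anything that is not a
// known level string (including corrupted, non-string values) becomes "off".
function coerceLevelOrOff(raw: unknown): ThinkingLevel {
  if (typeof raw !== "string") return "off";
  const normalized = raw.trim();
  const match = THINKING_LEVELS.find((level) => level === normalized);
  return match ?? "off";
}
```

Falling back to "off" rather than throwing keeps a corrupted persisted value from breaking message sending, and matches the stated default.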

A3) Backend: apply System 1 reasoning explicitly

  1. Extend call chain to include system1ThinkingLevel:
    • src/node/services/agentSession.ts → aiService.streamMessage(...)
    • src/node/services/aiService.ts signature + internal wiring
  2. Compute an effective System 1 thinking level:
    • const effectiveSystem1ThinkingLevel = enforceThinkingPolicy(effectiveSystem1ModelString, (options.system1ThinkingLevel ?? "off"))
    • (If we support an “inherit” UI option later, interpret it as options.thinkingLevel.)
  3. Ensure System 1 calls stay stateless w.r.t. OpenAI persistence:
    • Build provider options without passing history messages (so previousResponseId won’t be set).
    • const system1ProviderOptions = buildProviderOptions(system1.modelString, effectiveSystem1ThinkingLevel, /*messages*/ undefined, /*lost*/ undefined, muxProviderOptions, workspaceId)
  4. Pass provider options into the System 1 generateText(...) call:
    • Add providerOptions: system1ProviderOptions.
  5. Message transformation for System 1 context should use System 1 thinking:
    • In getContextMessagesForSystem1(...), use anthropicThinkingEnabled: system1ProviderName === "anthropic" && effectiveSystem1ThinkingLevel !== "off".
    • Don’t reuse finalMessages if model matches but thinking differs (especially for Anthropic).
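The reuse rule in step 5 could be expressed along these lines. This is a sketch under assumptions: `TransformContext`, `isAnthropicThinkingEnabled`, and `canReuseTransformedMessages` are hypothetical names, and the provider is read from the "provider:model" prefix seen in model strings such as openai:gpt-5.2.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh";

// Hypothetical description of how a set of messages was transformed.
interface TransformContext {
  modelString: string; // e.g. "openai:gpt-5.2"
  thinkingLevel: ThinkingLevel;
}

// Mirrors the condition quoted in step 5: Anthropic provider and thinking not off.
function isAnthropicThinkingEnabled(ctx: TransformContext): boolean {
  const providerName = ctx.modelString.split(":")[0] ?? "";
  return providerName === "anthropic" && ctx.thinkingLevel !== "off";
}

// Already-transformed messages are only safe to reuse when both the model
// and the thinking level match; otherwise they should be rebuilt for System 1.
function canReuseTransformedMessages(main: TransformContext, system1: TransformContext): boolean {
  return main.modelString === system1.modelString && main.thinkingLevel === system1.thinkingLevel;
}
```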

A4) Backend: apply System 1 filtering for background output (bash_output + task_await)

  1. Refactor the existing foreground filtering block in aiService.ts into a reusable helper, e.g.:
    • maybeFilterBashOutputWithSystem1({ output, script, toolName, ... }) -> { output, note? } | null
    • Keep existing thresholds (SYSTEM1_BASH_MIN_LINES, SYSTEM1_BASH_MIN_TOTAL_BYTES) and max kept lines.
  2. Wrap bash_output tool in toolsForStream when experiments.system1 === true:
    • After original execute, if result.success === true and result.output is large, run the helper and return filtered output.
  3. Wrap task_await tool similarly:
    • After original execute, iterate results.
    • For entries where taskId starts with "bash:":
      • If status === "running" and has output, filter the chunk when it crosses thresholds.
      • If status === "completed" and has reportMarkdown, extract the code-fenced output, filter it, and rebuild reportMarkdown.
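A rough sketch of the threshold check and the running-chunk branch of the task_await wrapper follows. The constants are assumptions read off the ">10 lines and/or >4KB" note in the manual checklist (the real SYSTEM1_BASH_MIN_LINES and SYSTEM1_BASH_MIN_TOTAL_BYTES values are not shown here), `AwaitEntry` is a hypothetical result shape, and the filter is synchronous only to keep the example short; the real helper calls an LLM and would be async.

```typescript
// Assumed values, not the repo's actual constants.
const MIN_LINES = 10;
const MIN_TOTAL_BYTES = 4 * 1024;

// UTF-8 byte length without runtime-specific APIs (assumes well-formed UTF-16 input).
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // a surrogate pair encodes one 4-byte code point
      i++; // skip the low surrogate
    } else bytes += 3;
  }
  return bytes;
}

// Filtering triggers when either threshold is exceeded.
function crossesThresholds(output: string): boolean {
  return output.split("\n").length > MIN_LINES || utf8ByteLength(output) > MIN_TOTAL_BYTES;
}

// Hypothetical subset of a task_await result entry.
interface AwaitEntry {
  taskId: string;
  status: string;
  output?: string;
}

// Only running bash tasks with large output are rewritten; everything else passes through.
function filterRunningBashChunks(
  results: AwaitEntry[],
  filter: (output: string) => string
): AwaitEntry[] {
  return results.map((entry) => {
    const isBash = entry.taskId.startsWith("bash:");
    if (!isBash || entry.status !== "running" || entry.output === undefined) return entry;
    if (!crossesThresholds(entry.output)) return entry;
    return { ...entry, output: filter(entry.output) };
  });
}
```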

Implementation detail to keep this robust:

  • Move formatBashOutputReport(...) out of task_await.ts into a shared util so the wrapper can rebuild markdown without brittle string surgery.
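To show why a shared format/parse pair avoids string surgery, here is a minimal roundtrip sketch. The report layout (an exitCode line followed by one fenced output block) is invented for illustration; the actual layout produced by formatBashOutputReport is not shown in this PR, and `formatReport`, `parseReport`, and `rewriteReportOutput` are hypothetical names.

```typescript
// Built at runtime so this example never contains a literal fence line.
const FENCE = "`".repeat(3);

interface BashReport {
  exitCode: number;
  output: string;
}

// Hypothetical layout: "exitCode: N", a blank line, then the fenced output.
function formatReport(report: BashReport): string {
  return [`exitCode: ${report.exitCode}`, "", `${FENCE}text`, report.output, FENCE].join("\n");
}

// Uses the first opening fence and the last closing fence, so fence-like
// lines inside the output itself survive a roundtrip.
function parseReport(markdown: string): BashReport | null {
  const lines = markdown.split("\n");
  const exitMatch = /^exitCode: (-?\d+)$/.exec(lines[0] ?? "");
  const open = lines.indexOf(`${FENCE}text`);
  const close = lines.lastIndexOf(FENCE);
  if (!exitMatch || open === -1 || close <= open) return null;
  return { exitCode: Number(exitMatch[1]), output: lines.slice(open + 1, close).join("\n") };
}

// Rebuilds the report around a transformed output; unrecognized markdown is returned as-is.
function rewriteReportOutput(markdown: string, transform: (output: string) => string): string {
  const parsed = parseReport(markdown);
  if (!parsed) return markdown;
  return formatReport({ ...parsed, output: transform(parsed.output) });
}
```

Returning the input untouched when parsing fails keeps this path consistent with the fail-open behavior described under Risks.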

A5) Tests (unit, deterministic)

  • Add pure-function tests around the new glue we introduce:
    • Extract/rebuild helpers for task_await’s bash markdown report.
    • Trigger heuristics (lines/bytes thresholds).
  • Add System 1 wrapper tests via dependency injection:
    • Factor the “call System 1 LLM + parse keep ranges” into a function that accepts a generateText-like dependency so tests can supply a deterministic fake response (no network, no heavy mocks).
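The dependency-injection idea can be sketched as follows. Everything here is an assumption for illustration: `GenerateTextLike` is a deliberately narrow stand-in rather than the AI SDK's real generateText signature, and the JSON response format `{"keep": [[start, end], ...]}` is hypothetical, since the PR does not show how System 1 encodes keep ranges.

```typescript
// Narrow stand-in for a generateText-like dependency (not the real SDK type).
type GenerateTextLike = (args: { prompt: string }) => Promise<{ text: string }>;

type LineRange = [number, number];

// Parses the assumed JSON format; any malformed response yields null so the
// caller can fall back to the original output.
function parseKeepRanges(text: string): LineRange[] | null {
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== "object" || parsed === null) return null;
    const keep = (parsed as { keep?: unknown }).keep;
    if (!Array.isArray(keep)) return null;
    const ranges: LineRange[] = [];
    for (const item of keep) {
      if (!Array.isArray(item) || item.length !== 2) return null;
      const [start, end] = item as unknown[];
      if (typeof start !== "number" || typeof end !== "number") return null;
      if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) return null;
      ranges.push([start, end]);
    }
    return ranges;
  } catch {
    return null;
  }
}

// The LLM call is injected, so tests can pass a deterministic fake.
async function selectKeepRanges(
  output: string,
  generateText: GenerateTextLike
): Promise<LineRange[] | null> {
  const numbered = output
    .split("\n")
    .map((line, i) => `${i + 1}: ${line}`)
    .join("\n");
  const { text } = await generateText({ prompt: `Select the line ranges to keep:\n${numbered}` });
  return parseKeepRanges(text);
}
```

A test can then supply `async () => ({ text: '{"keep":[[2,2]]}' })` as the fake and assert on the parsed ranges without any network access.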

A6) Manual verification checklist (use scripts/system1-noisy-output.ts)

  • Update scripts/system1-noisy-output.ts to support burst/stream mode (for background testing):
    • Example flags: --bursts <n> and --sleep-ms <n> (no progress output; keep digits only on the secret line).
    • Ensure each burst independently exceeds System 1 thresholds (>10 lines and/or >4KB) but stays under bash/tool hard limits.
  • Foreground:
    • Run bash tool: bun scripts/system1-noisy-output.ts → output should be filtered and preserve ONLY_RELEVANT_NUMBER=....
  • Background:
    • Run bash with run_in_background: true: bun scripts/system1-noisy-output.ts --bursts 5 --sleep-ms 200.
    • While running, poll via bash_output → chunks should be filtered.
    • After exit, poll via task_await → completed reportMarkdown should be filtered + note appended.
  • Change System 1 Reasoning setting and confirm outbound provider options change accordingly (esp. OpenAI reasoningEffort).
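For reference, the burst-mode constraints above (every burst over the line threshold, digits only on the secret line) can be captured in a small pure generator. This is not the contents of scripts/system1-noisy-output.ts; `noiseLine` and `buildBursts` are hypothetical, the placement of the secret line is an arbitrary choice, and a real script would additionally write each burst to stdout and sleep for --sleep-ms between bursts.

```typescript
// Digit-free vocabulary so only the secret line contains a number.
const NOISE_WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"];

function noiseLine(index: number): string {
  const first = NOISE_WORDS[index % NOISE_WORDS.length];
  const second = NOISE_WORDS[(index * 5 + 1) % NOISE_WORDS.length];
  return `[noise] ${first} ${second} nothing to see here`;
}

// Returns one string[] per burst; exactly one line across all bursts carries the secret.
function buildBursts(bursts: number, linesPerBurst: number, secret: number): string[][] {
  const secretBurst = Math.floor(bursts / 2);
  const result: string[][] = [];
  for (let b = 0; b < bursts; b++) {
    const lines: string[] = [];
    for (let i = 0; i < linesPerBurst; i++) {
      lines.push(noiseLine(b * linesPerBurst + i));
    }
    if (b === secretBurst) {
      lines[Math.floor(linesPerBurst / 2)] = `ONLY_RELEVANT_NUMBER=${secret}`;
    }
    result.push(lines);
  }
  return result;
}
```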

Generated with mux • Model: openai:gpt-5.2 • Thinking: high • Cost: $34.27

@github-actions github-actions Bot added the enhancement label (New feature or functionality) Jan 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8cbbcc9519

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/node/services/aiService.ts Outdated
@ThomasK33
Member Author

@codex review

Addressed feedback: System 1 now rebuilds its context messages using the System 1 model’s provider/modelString (re-transform + cache-control) instead of reusing the main provider’s finalMessages.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c698fe21ea

Comment thread src/browser/hooks/useSendMessageOptions.ts Outdated
@ThomasK33
Member Author

@codex review

Addressed feedback: guard against corrupted/non-string PREFERRED_SYSTEM_1_MODEL_KEY values in both useSendMessageOptions and getSendOptionsFromStorage so .trim() can’t throw.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ac5415fb0

Comment thread src/browser/hooks/useSendMessageOptions.ts Outdated
@ThomasK33
Member Author

@codex review

Addressed: System 1 system1Model is now migrated + gateway-transformed (same as main model) in both useSendMessageOptions (reactive applyGatewayTransform) and getSendOptionsFromStorage (toGatewayModel). This makes System 1 filtering work for gateway-only setups.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!

@ThomasK33 ThomasK33 force-pushed the system-1-dwqk branch 2 times, most recently from f3b40a8 to 384570b, on January 22, 2026 20:59
@ThomasK33
Member Author

@codex review

Please review the changes (System 1 reasoning setting + background bash filtering via bash_output/task_await wrappers).

@ThomasK33 ThomasK33 changed the title from "🤖 feat: system 1 filter bash tool output" to "🤖 feat: system 1 reasoning + background bash filtering" Jan 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 384570bd33

Comment thread src/node/services/tools/bashTaskReport.ts Outdated
@ThomasK33 ThomasK33 force-pushed the system-1-dwqk branch 3 times, most recently from 36762f4 to b31327d, on January 23, 2026 22:15
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1cfc59ef96

Comment thread src/node/services/tools/bashTaskReport.ts Outdated
Change-Id: I6cf05ee13f2aacbbf741a159c3c100083b609358
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ida73e1657c7e71d45928c8a28c2ce91860be79b0
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I4da76d0cd272fc548f12a3e78b64b7f7bc66695a
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Idfcd9d3e34c71caa84bbe7d4a454c53eaefc2264
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Adds scripts/system1-noisy-output.ts: a bun/TypeScript utility that emits controlled noisy output containing a single relevant random number.
Output is tuned to trigger System 1 bash-output filtering without hitting bash tmpfile overflow.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `openai:gpt-5.2` • Thinking: `xhigh`_

Change-Id: Ie6d9980b209976689a3a899ee11e80b68e1cb845
Change-Id: I67896db8333e79c234167fc902f130bd86b52f01
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I3a055d66b2f575e60dd7d15724a939f588523c10
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ia1bb7d23f3a23f900a740039af7665d844bb49d3
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Id1537c1a0d194dedc512b14da3a49b885a89b409
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I88a248067537af3654402948fc59286724835415
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I7f8d72a2d25f84ec58b1aca5917394aaa5950e9a
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I20cc0930292cdb9be62d5ffef333cfd63dd32687
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I2c610f2c94a27d87ac1be6619817d8ea8aa02df2
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ib3534cf2d3d5e3b1d66f402e068fc3cde581fd09
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ia24ec6edfafe6d11b8230535d577f70d5c3549b0
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I71275129d0764d917e61b08c0367ad2073c8415d
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I66eb23b10b56635b53b7df32ec9524d3e10c20b1
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I05877bc1b1d4ddb1ef699b82385baa28166f80b1
Signed-off-by: Thomas Kosiewski <tk@coder.com>
- Ensure process.env.MODE defaults to test so splash screens are disabled in Jest
- Add Happy DOM bootstrap + globals (Element, NodeFilter, etc) for Radix Portal/Select
- Seed localStorage before App render in createAppHarness and stabilize System 1 policy test

Change-Id: Ib2b098179757ca19ec5ec6f98600d7bfbd823d4d
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ia76d386011a340947837a47e28fd63792c17445d
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7f48fce5e

Comment thread src/common/orpc/schemas/api.ts
Change-Id: I2d3d3e4690efb3ff6ad8220e1cb39e696202d214
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I0f5561a3d09b4a83596a0796b03c420d11a05425
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d6592bf81

Comment thread src/node/services/tools/bashTaskReport.ts Outdated
Change-Id: Ib6d1d4ca96a66230074ca757cf9b023fc729e984
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25e952a21c

Change-Id: I2180485fbee77c24e00dab60f22f6e873c40f12b
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24d8105534

Comment thread src/node/services/aiService.ts Outdated
Change-Id: I2130f40f7a204b37956d05222453ab79b89e3257
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

@ThomasK33 ThomasK33 added this pull request to the merge queue Jan 25, 2026
Merged via the queue into main with commit f3e97db Jan 25, 2026
38 of 40 checks passed
@ThomasK33 ThomasK33 deleted the system-1-dwqk branch January 25, 2026 12:43