Skip to content

Fix API-key and OAuth auth mode persistence — conflict resolution + H1 fix (#395)#397

Merged
quangdang46 merged 58 commits into
masterfrom
merge/pr-395
Jun 6, 2026
Merged

Fix API-key and OAuth auth mode persistence — conflict resolution + H1 fix (#395)#397
quangdang46 merged 58 commits into
masterfrom
merge/pr-395

Conversation

@quangdang46
Copy link
Copy Markdown
Owner

Supersedes #395 — resolves 9 merge conflicts, fixes H1 (route_api_method propagation in SubagentTool), adapts loading.rs to use CASR-based loader.

Changes:

  • Resolved merge conflicts (Cargo.toml, loading.rs, copy_selection.rs, onboarding_flow_control.rs, session_picker.rs, live_provider_probes.rs, provider_e2e.rs, README.md)
  • H1 fix: session.route_api_method = parent_session.route_api_method.clone() in task.rs SubagentTool — subagents now inherit auth route from parent
  • Adapted loading.rs to use CASR single-source loader (replaced per-provider loaders)
  • Added load_external_cli_sessions_grouped_multi using CASR
  • Fixed ForeignSession pattern match in session_picker.rs

1jehuang added 30 commits June 5, 2026 01:11
…ed provider_key vocabularies

Claude (and OpenAI) sessions could silently shift from an API key onto
the OAuth subscription. Root cause: two divergent provider_key
vocabularies persist into sessions, and the session-reconstruction
helpers only understood one of them.

- The structured model-route picker (RPC) persists RuntimeKey::stable_id()
  values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key.
- The legacy /model + login path persists: claude / claude-api / openai /
  openai-api.

model_switch_request_for_session_model and
session_provider_key_matches_provider_name only matched the legacy
vocabulary. A session whose provider_key was 'anthropic-api-key' (without
a separately-persisted route_api_method, e.g. a forked/child/ambient/
overnight session) therefore reconstructed a bare model with no auth
prefix, leaving the Anthropic provider in Auto mode -- which now prefers
OAuth (commit 00e9b9f) -- silently moving an API-key user onto the
subscription.

Fix:
- Add canonical_session_provider_key() to fold the picker vocabulary back
  onto the canonical keys, and apply it in the reconstruction/match
  helpers so either vocabulary recovers the exact OAuth-vs-API-key route.
- Carry route_api_method alongside provider_key when copying a parent
  session to a child (ambient, overnight, fork, selfdev, crash recovery)
  so children reconstruct the full route even without the canonicalizer.

Adds a regression test proving anthropic-api-key/openai-api-key/-oauth
provider keys preserve the auth route without route_api_method.
The worker previously only accepted POST /v1/event; there was no visual
dashboard (just SQL files run by hand). Add a real one.

Headline metric (users.sql + stats.js): total_users = distinct, non-CI
telemetry_id that ever installed OR did meaningful work. Validated with
sqlite edge-case repros (install-only, turn_end-only with lost
session_end, empty open/close, CI). Reported alongside broader tiers
(reached) and narrower tiers (core, installed) plus raw CI-inclusive
totals so no signal is removed.

- src/stats.js: read-only aggregation (counts only, never raw rows) over
  users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality,
  per-turn, errors, feature adoption, transport, version/os/channel/
  provider/auth/onboarding breakdowns, 60d timeseries, recent feedback.
  One shared MEANINGFUL_SQL predicate so every window agrees.
- src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON
  gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event
  unchanged. CORS widened to GET.
- src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN,
  works under Cloudflare). Tiered layout: hero total-users number, active
  funnel + chart, 'how the number is built' transparency band, then
  acquisition/retention, engagement, reliability, breakdowns, features,
  feedback. Importance shown via hero/key tags/muted diagnostics.
- README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run
  users; type:module to silence ESM warning.

Validated: node --check on all modules, getStats end-to-end against a
seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a
real browser (token gate + every section + charts).
The native tool smoke only ever drove a single tool-call round-trip, so it
always replayed exactly one thought_signature and passed even when an earlier
function call would drop its signature. The Antigravity/Cloud Code backend
validates *every* functionCall in the replayed history, so the field 400
("Function call is missing a thought_signature ... position N") only
reproduces with a multi-call transcript.

- Extend run_live_native_provider_tool_smoke into two phases: the historical
  single round-trip (gating) plus a best-effort multi-call replay that rebuilds
  a history of two assistant tool_use blocks, each carrying its own signature.
- Delegate run_live_antigravity_native_tool_smoke to the shared probe so
  Antigravity (the runtime that hit this) gets the multi-call coverage too.
- Add an always-on unit guard (build_contents_replays_every_signature_across_
  multi_tool_history) so the serialization regression is caught for free,
  without spending live tokens.
…end-design skill

Two things the prior dashboard commit missed.

1) Restore every metric the old SQL surface (README queries, health.sql,
   dau.sql) exposed that had been dropped:
   - os/arch platform breakdown (was os-only)
   - session starts by UTC hour (usage-timing histogram)
   - pipeline-health diagnostics: lifecycle_ids, session_start_ids,
     lifecycle_ids_without_install, heaviest/top5/total session events
   - meaningful_sessions_30d count
   stats.js gains hours, arch, health, skew, meaningfulSessions queries;
   all validated end-to-end against a seeded sqlite D1 shim.

2) Redesign dashboard.js using the installed anthropics/frontend-design
   skill. The previous version used system fonts and the exact
   purple-gradient-on-dark the skill warns against. New 'Terminal
   Observatory' aesthetic, true to jcode being a CLI agent: JetBrains
   Mono instrument typography (Sora for prose), warm phosphor-amber
   signal color with a single cyan accent, scanline texture, station-
   clock hero number, numbered hairline section dividers, KEY/alert
   accent rails, an amber UTC-hour bar histogram, and a filled cyan
   active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so
   the total-users number dominates while every figure stays visible.

Verified in a real browser: token gate, hero, all 8 sections, both
chart types render correctly. node --check passes on all modules.
…s catalog

Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt,
cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the
endorsed-skills list, sourced from the official NVIDIA-verified catalog
at github.com/NVIDIA/skills.

- EndorsedSkill gains category + optional install hint fields.
- /skills now groups endorsed skills by category with per-category
  installed counts and shows the 'npx skills add nvidia/skills' install
  command for missing skills, plus the catalog URL.
- Tests cover the new fields and the NVIDIA catalog entries.
Adds an integration test that drives the REAL update-detection core
(newer_binary_available, the function behind server_has_update) and the
reload-target resolver after a normal (non-self-dev) /update channel swap.

Models a shipped user: shared-server tracking stable, daemon running the old
release, /update installs a newer release and advances stable/current/
shared-server. Asserts both that the old daemon reports an update and that it
reloads into the freshly installed release. This documents that normal users
are covered by advance_shared_server_if_tracking_stable + the cross-flavor
reload target.
…y, full active tiers

Surface every remaining signal the original SQL surface (and the schema)
exposed but the dashboard had not yet shown:

- User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume
  with sessions/turns/tokens/tool_calls, version and last-seen. CI and
  non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs'
  query, made visual).
- Token usage (sec 04): full breakdown - input/output/cache_read/
  cache_creation/total, both 30d and all-time (was a single combined
  number).
- Agent autonomy (sec 05): spawned agents, subagent/swarm/background
  tasks + successes, user cancellations, and where agent time goes
  (active/model/tool/blocked/idle), time-to-first-action, avg max
  concurrency. These schema columns were never surfaced before.
- Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues,
  not just the headline number.
- Engagement: added time-to-first-tool-success.

stats.js gains tokens, agent, and leaderboard queries (26 queries total,
all validated against the real schema via a seeded sqlite D1 shim).
dashboard.js renumbered to 11 sections with a new leaderboardPanel
renderer and CI/dev tag styling. Verified end-to-end in a real browser:
all sections, the leaderboard table, and both chart types render.
…parts

Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real
decode abort: the Antigravity/Cloud Code generateContent response occasionally
omits `role` (and sometimes `parts`) on a candidate's `content`, but the
struct required `role`, so the whole turn failed with "missing field
`role`". The response-side role is never read, so default both fields rather
than aborting. Adds two decode regression tests.
The first cut of the multi-call phase only nudged a 2nd tool call after the
model had already answered, so live runs reported multi_tool_replay=skipped and
never actually exercised the multi-functionCall history. Replace it with an
agentic loop driven by a two-file read prompt: each emitted tool call is
replayed (carrying its captured thought_signature) and answered with a
synthetic result, so by the final turn we send two assistant functionCall
blocks and assert the backend accepts the transcript. Surface the
verified/skipped status in the doctor report detail.

Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full
now reports 'multi-call signature replay verified'.
Cold /resume and onboarding/catch-up pickers were dominated by serial
per-file IO+JSON parsing over large session histories (87k jcode
snapshots + hundreds of Codex/Claude transcripts here).

- Add a bounded scoped-thread parallel_map helper in the session picker
  loader and use it for: candidate mtime stat (readdir then parallel
  stat), the jcode summary parse pass (two-phase: parallel fill to
  scan_limit, then parallel saved-gate over the tail), and the external
  Codex/pi/opencode stub parsers.
- Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of
  re-reading catchup_seen.json per session.
- Onboarding transcript picker now loads only the relevant external CLI
  (load_external_cli_sessions_grouped) instead of the full
  load_sessions_grouped on the UI thread.
- Catch-up picker now opens from cache and refreshes off-thread via the
  shared async picker-load path instead of blocking the live session.

Measured on real data (idle, 4 runs each):
  load_sessions          ~660ms -> ~434ms (~34%)
  load_sessions_grouped  ~685ms -> ~465ms (~32%)
  onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load
Minor bump covering the 44 commits since v0.21.0, including:
- Eager token-by-token reasoning streaming and per-line multi-line
  thinking rendering in the TUI.
- Provider fixes: Gemini schema/thought_signature handling, Kimi
  reasoning_content, OpenRouter empty-message guard, Anthropic 1M
  context + split-cache cost accounting, API-key vs OAuth auth mode.
- Swarm: route messages by target, broadcast to whole swarm, inherit
  coordinator model/auth route on spawn.
- Self-dev reload correctness (daemon reloads into advertised binary),
  reload-trace OOM cap, and provider-doctor generic native suites.
- Served telemetry dashboard with accurate user/install metrics and
  /skills + endorsed NVIDIA CUDA-X skills.
Add Anthropic's official frontend-design skill (the best design-focused
agent skill) to the endorsed list under a new 'Anthropic Design'
category, sourced from github.com/anthropics/skills with an install hint.
The guided first-run onboarding flow auto-imports existing external CLI
logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via
run_external_auth_auto_import_candidates, which bypasses the manual
pending_login path that record_auth_success was wired into. As a result
every auto-imported login -- the happy path of the new onboarding -- was
invisible to the activation funnel, making auth_success undercount badly
(observed: more users reaching first_assistant_response than auth_success
in post-0.17 install cohorts, which is impossible without auth).

Surface coarse (provider, method="import") telemetry labels from the
import outcome and record auth_success for each imported provider in both
the onboarding and manual /login auto-import callers. Domain logic in
jcode-app-core stays telemetry-free; the TUI layer emits the event,
matching existing call sites.
Codex/Claude Code preview loaders parsed the entire JSONL transcript
(often multiple MB, up to tens of MB) on every selection change just to
show the last ~20 messages. In the onboarding resume menu this made
arrow-key navigation lag badly, since each selection spawned a fresh
full-file parse thread. Normal /resume (jcode native sessions) avoids
this path, which is why only onboarding felt slow.

Read only the trailing 512 KiB of the file instead: drop the partial
first line, skip malformed boundary records, and parse the rest. This
turns each preview load from ~140ms into ~1ms regardless of transcript
size. Adds regression tests covering large (>cap) Codex and Claude
transcripts.
The native gmail tool keeps its interface, confirmation gating, access
tiers, and token-lean formatting, but its auth/transport is now pluggable
via GmailBackend (Direct | Composio).

- Direct: existing local Google OAuth tokens.
- Composio: routes the same Gmail REST calls through Composio's
  proxy-execute endpoint, brokered by a Google-verified app. No
  unverified-app warning and no 7-day testing-mode token expiry.

Backend is selected via JCODE_GMAIL_BACKEND=composio + COMPOSIO_API_KEY.
Capability checks (is_configured/can_send/can_delete) are now
backend-aware. Adds unit tests and docs/GMAIL_COMPOSIO_BACKEND.md.
The desktop render loop re-requested a redraw immediately after every
animated frame (welcome-hero reveal, focus pulse, spinners, smooth
scroll, streaming) in both the RedrawRequested handler and the
AboutToWait fallback. Because the surface uses non-blocking Mailbox
presentation, present() returns instantly, so the loop rendered as fast
as the CPU allowed (~300fps on a 60Hz panel) and pinned the main thread
near 100% CPU. That starved input handling and compositor scheduling,
which is the root cause of the laggy/janky animations and scrolling, and
made streaming events queue for 200ms-1s before the UI could process
them.

Schedule a paced redraw (DESKTOP_ANIMATION_FRAME_INTERVAL = 16ms,
serviced via ControlFlow::WaitUntil in AboutToWait) instead of an
immediate request. Measured idle main-thread CPU on the welcome screen
dropped from ~99% to ~0-3%, frame rate from ~305fps to display refresh,
while the stream-e2e benchmark still passes all interaction/no-paint
budgets (max no-paint gap 71ms vs 250ms budget).
The visual dashboard (dashboard.js + stats.js + GET / and GET /v1/stats
routing) was a separately-deployed Cloudflare Worker UI that does not
belong in the jcode repo. Remove it and restore the worker to its
POST /v1/event ingest-only surface.

The telemetry accuracy work it was built on (turn_end meaningfulness, CI
exclusion, the daily_active_users rollup) stays. users.sql remains as a
CLI query alongside dau.sql / health.sql.
The runtime welcome-hero mask ("Hello there") is built once on the first
single-session frame, but build_hero_reveal_texture runs a per-lit-pixel
nearest-stroke search (O(pixels x segments)) on the UI thread, costing
~600ms and stalling the start of the reveal animation.

Split the per-pixel fill across worker threads via std::thread::scope
(rows are independent and read-only over glyph_rgba/segments), reducing
the one-time build cost. Output is bit-identical to the serial path;
small images fall back to serial to avoid spawn overhead. Added parity
and worker-count tests.
…329)

The header and info widget hard-coded 'OpenRouter' for any model routed
through the OpenRouter slot, even when the user switched to a direct
OpenAI-compatible profile such as NVIDIA NIM at runtime. The display name
was resolved from process env vars that only reflect the startup profile,
so a runtime '/model' switch never updated the label.

Add a runtime-aware Provider::display_name() (default = name()) overridden
by OpenRouterProvider (maps profile_id -> 'NVIDIA NIM', etc.) and
MultiProvider (delegates to the active execution runtime). name() stays
the stable machine id ('openrouter') that billing/routing keys off.
format_model_name() in the header now uses the active provider's display
name instead of a fixed 'OpenRouter:' prefix.

Adds regression tests.
The copy-selection drag edge auto-scroll "hot zone" (top/bottom few rows
of the chat pane) fired unconditionally whenever a drag entered the band.
When the transcript was already pinned to the bottom (the common case),
dragging into the bottom rows snapped the selection cursor to the very
last visible line and armed a downward autoscroll, even though there was
nothing more below to scroll into. This made it impossible to precisely
highlight the bottom rows of the transcript: the selection kept jumping
to the end.

Gate each directional hot zone on whether there is actually more
transcript to scroll into that direction (scroll > 0 for up,
visible_end < line_count for down). When there is nothing to scroll, the
edge band stays inert so the selection lands on the exact cell under the
cursor.

Adds a regression test that drags into the bottom hot zone while pinned
to the bottom and asserts no autoscroll arms and the selection lands on
the targeted line.
Adds a 'connect' action to the gmail tool that drives Composio's hosted
Connect Link flow: it creates an auth-link session, opens the Google
consent screen in the browser, polls until the connection is ACTIVE, and
persists the connected account to ~/.jcode/composio_gmail.json so future
sessions are already authorized.

- ComposioConfig gains auth_config_id + persisted-connection fallback.
- GmailClient: connect(), needs_connection(), supports_connect(),
  create_link()/wait_for_connection() against /connected_accounts.
- tool/gmail.rs handles 'connect' before the config gate and hints the
  agent to connect when no account exists yet.
- Tests for connect/needs_connection/effective_user_id; docs updated.
The StreamBuffer previously revealed text the instant a provider delta
arrived (only capping bursts >96 chars per frame). OpenAI emits many tiny
token deltas so this looked smooth, but Anthropic coalesces deltas into
20-40 char bursts with gaps, so each burst popped in at once and the UI
stair-stepped.

Replace the burst cap with a time-paced proportional reveal: text
accumulates in a backlog and drips out at base + gain*backlog chars/sec,
with the per-step elapsed time clamped so idle gaps cannot bank budget
that dumps the next burst. This smooths bursty providers while keeping
fast steady feeds responsive. Remote tick now reveals via
flush_smooth_frame() to match; flush() still drains fully at finalize.
…picker

The first-run onboarding 'continue where you left off' picker previously
surfaced only ONE external CLI: when a user was logged into both Codex
and Claude Code, it picked whichever had the most recent transcript and
hid the other CLI's history entirely.

Now the onboarding picker loads and displays every detected external
CLI's transcripts together in one combined, recency-sorted list:
- Add SessionFilterMode::ExternalClis (Codex OR Claude Code).
- Add load_external_cli_sessions_grouped_multi to load several CLIs.
- onboarding_open_transcript_picker now takes the full detected CLI set;
  the banner reads 'We found your Codex and Claude Code sessions' when
  both are present.

Resume still works off each session's own id/source, so selecting either
CLI's transcript resumes correctly. Adds a regression test seeding both a
Codex and a Claude Code transcript and asserting both appear.
Show the model's live reasoning out of the box. DisplayConfig now defaults
reasoning_display to Current (with show_thinking=true to keep the provider
request + streaming display paths in sync), and the generated default config
documents reasoning_display = "current".
…arallel tool-call probe

- New REASONING_CAPABILITY checkpoint (taxonomy v3), never required for
  user-readiness and excluded from strict coverage. A reasoning word problem
  is sent and the turn is classified streamed/opaque/none from StreamEvent
  signals (ThinkingDelta text, ThinkingSignatureDelta, OpenAIReasoning, and
  Gemini-3 tool thought_signature). Absence records 'none' and passes.
- Shared native tool smoke gains a Phase 3 that asks for two tool calls in a
  single assistant message, replays both tool_use blocks (each with its own
  thought_signature) in one assistant turn and answers both results, recording
  parallel_tool_calls: verified|skipped (best-effort, never fails).
- Wired reasoning into the antigravity, generic-native, and claude drivers;
  skipped on non-full tiers; surfaced in the doctor report detail.
- Unit tests for classification, parallel replay shape, detail strings, and the
  observe-only contract (probe error -> skipped, never failed).
…as reasoning signal

A Gemini-3 thoughtSignature that was not consumed by a following functionCall
(e.g. a pure-text reasoning turn) was silently dropped. Emit it as a
ThinkingSignatureDelta instead so reasoning-aware consumers (and the new
provider-doctor reasoning probe) can observe that the model reasoned even when
no reasoning text and no tool call were produced.
…add scroll diag instrumentation

Building a fresh FontSystem every frame (rescanning all system fonts) inside the
inline-code/math pill geometry builder caused multi-ms per-frame scroll spikes
over code blocks. Reuse a thread-local measurement FontSystem instead.
1jehuang and others added 27 commits June 5, 2026 14:58
Standalone pictographic emoji/symbols (🔄 ⬜ → ✓ etc.) render identically under
Basic and Advanced cosmic-text shaping, so escalating the whole visible-window
buffer to Advanced shaping for them was pure per-frame scroll overhead on
emoji-rich transcripts. Only sequences that truly need shaping (variation
selectors, ZWJ, regional-indicator flag pairs) and lines carrying inline-code/
math spans still use Advanced. Cuts worst-case scroll-frame shaping cost.
'jcode server reload' (run by installers and the TUI's stale-server reload path)
now repairs the shared-server channel before sending the forced reload. The
running daemon resolves its reload target from that channel; if it still points
at the daemon's own old binary (the 'current client, stale server' state after a
no-op /update), a forced reload would just re-exec the same old binary. Repairs
shared-server -> stable when stable is strictly newer (never downgrades,
preserves a fresher self-dev pin).

Adds scripts/stale_server_upgrade_sandbox.sh: a live end-to-end sandbox that
starts a REAL released v0.14.6 daemon and runs the new client's
'jcode server reload', asserting the daemon upgrades to the new release.
Verified locally: v0.14.6 daemon -> v0.22 after reload, deterministic across
runs, fully isolated from the real global daemon via JCODE_SOCKET.
Configuring agents.swarm_model with an explicit auth-route prefix (e.g.
openai-api:gpt-5.5, openai-oauth:..., claude-api:..., claude-oauth:...)
now pins spawned swarm agents to that exact model + provider + auth route
instead of inheriting the coordinator's model. The prefix is split into a
bare model plus stable provider_key/route_api_method ids that round-trip
through ModelRouteApiMethod::parse on session restore.

Lets users force spawned agents onto a specific API-key route (e.g. GPT-5.5
via the OpenAI API) regardless of what the coordinator is running.
cosmic-text/rustybuzz/ttf-parser/swash/yazi/fontdb do all desktop transcript
glyph shaping and are 15-40x slower at opt-level=0, making debug/selfdev
scrolling of real emoji/markdown-heavy transcripts janky (p99 ~238ms) even
though release was smooth. Pin these stable third-party crates to opt-level=3
in dev/selfdev/test (same one-time-compile trick already used for
jcode-tui-anim). Debug-build scroll p99 drops 238ms -> 8.4ms with no impact on
recompile speed of jcode's own crates.
…iling

Profiles a realistic mix of user actions (smooth/whole-line scroll, selection
drag, composer typing, model-picker and session-switcher toggles, window resize,
and streaming growth) against the user's largest real on-disk transcripts, each
phase measured as per-frame CPU p50/p95/p99/max with a 120fps budget check.
Complements --real-transcript-scroll-benchmark for broad interaction coverage.
…emental wrap

The streaming_growth phase re-wrapped the entire transcript every frame, which
production avoids by caching the wrapped static base and only appending the
wrapped streaming tail. Mirror that here: wrap the static body once, then per
frame truncate to the static base and append the tail. Drops measured
streaming_growth p99 ~72ms -> ~18ms, reflecting the real production path.
…ines

Production caches the raw (unwrapped) styled body lines across resizes and only
re-runs the width-dependent wrap, via single_session_rendered_body_lines_from_raw_ref.
Mirror that in the resize phase instead of regenerating raw markdown lines every
frame. Measured window_resize p99 ~64ms -> ~28ms, matching the real path.
…aming

The streaming text loop re-ran text_content.find(...) over the ENTIRE
accumulated response on every TextDelta until a wrapped-tool-call marker was
found. For normal answers (no marker) that scanned everything every token:
O(response) per delta, O(response^2) over a full streamed answer.

Scan only the newly appended delta plus a short overlap window (so a marker
straddling the append boundary is still detected), giving O(delta) per token.
Add unit tests asserting equivalence to a full rescan across chunk sizes,
unicode, and the boundary-straddle case.
…uffer

parse_next_event reassigned self.buffer = self.buffer[pos+2..].to_string() for
every SSE event, copying and reallocating the entire remaining buffer each time.
When one network chunk batches many SSE events this is O(buffer^2). Use
String::drain(..pos+2) to remove the consumed prefix in place. Pure
behavior-preserving refactor.
prepare_body_incremental recounted user messages in messages[..prev_msg_count]
on every incremental append to seed prompt_num. Appending one message at a time
over a long session made that cumulative O(n^2). prev.user_prompt_texts is
extended in lockstep with each rendered user message, so its length already is
the prior user-prompt count; use it directly for O(1) seeding.
rebuild_items scanned every filtered session ref once per server group to
collect that group's sessions: O(groups * filtered_refs). With many remote
server groups and many sessions this scaled poorly on every search keystroke.
Bucket the filtered refs by group_idx in a single O(filtered_refs) pass, then
emit groups in order (O(groups)). Behavior (grouping, ordering, saved-id
filtering) is preserved.
Follow-up to the previous fix that stopped the edge auto-scroll hot zone
from snapping the selection to the last line while pinned. That left a
gap: dragging *past* the last line (down into the empty area below the
content-sized chat pane) no longer extended the selection at all, because
that overshoot row maps to no line and copy_point_from_screen returned
None. Native terminal/browser selection treats dragging past the last
line as "select through the end of that line".

Add copy_pane_drag_point(), which clamps vertical overshoot to the
nearest in-bounds line edge: a drag below the last visible line snaps to
the end of that line, and a drag above the first visible line snaps to
its start. A direct hit on a real line still yields precise per-cell
selection. Use it for both Drag and Up so the boundary line is fully
covered during the drag and on release.

Adds a regression test that anchors on the last content line, drags
straight down past the bottom of the pane with the cursor x only partway
through the line, and asserts the whole last line (through its end) is
selected without arming autoscroll or scrolling.
In 'current' reasoning-display mode the live dim/italic reasoning used to
vanish in a single frame when the answer committed or a tool ran, snapping
the transcript upward. Instead, on close the reasoning block is sliced out
of the streaming buffer into a dedicated collapsing 'reasoning' display
message that height-collapses (ease-out, oldest line first) toward a
one-line '▸ thought for Xs' summary, leaving a trace behind.

- New ReasoningCollapse state + begin/advance/finalize on App.
- Renders via a new 'reasoning' display role (dim+italic, sentinel-stripped).
- Redraw loop (local + remote tick, turn loop) advances the animation;
  redraw policy keeps frames live while collapsing.
- Reduced-motion / low-power tiers snap straight to the summary.
- Guards drop the animation safely on transcript reset/replace.
- Tests: block parsing, summary labels, monotone collapse, finalize,
  reduced-motion snap, and end-to-end dim/italic render of the role.
bump_display_messages_version recomputed display_user_message_count and
display_edit_tool_message_count by scanning all display messages twice on every
mutation. Appending one message at a time over a long session made counter
maintenance cumulatively O(M^2).

The hot append path now folds the single new message into the cached counters
(O(1)) and bumps the version without a full rescan; rarer bulk/remove/replace
paths still recompute fully. Add a test asserting the incrementally-maintained
counters match a full recompute after interleaved pushes and removes.
…_FUNCTION_CALL

Gemini-3 thinking models intermittently emit Python-style pseudo-code (e.g.
print(default_api.read(...))) instead of a clean functionCall, which the Cloud
Code backend rejects with finish_reason=MALFORMED_FUNCTION_CALL and empty
content. Previously the runtime ended the turn with a silent empty MessageEnd,
so the agent looked like it stalled with no answer. For gemini-3.1-pro-high this
hit roughly half of tool turns.

Three layered mitigations (per Gemini function-calling guidance / field reports):
1. Prevention: when tools are advertised, append a 'Function calling' guard to
   the Gemini system prompt forbidding code/namespaces (build_system_instruction_with_tool_guard).
2. Transparent retry: detect a malformed empty turn (is_retryable_empty_turn) and
   re-request up to twice before surfacing anything, so the agent never sees the
   blip. Retries force function-calling mode ANY so the model must emit a real
   functionCall instead of pseudo-code.
3. Surfacing: if output is still empty after retries, emit an actionable error
   (with the finish_reason and finishMessage) instead of a silent empty turn.

Also surfaces the previously-hidden finishMessage for diagnosis. Measured on the
live Antigravity backend: gemini-3.1-pro-high tool-call success went from ~50%
to ~7/8 (remaining miss was a probe-deadline timeout, not malformed). Unit tests
cover the guard and the retry classifier.
…pt per mouse move

Selection hit-testing (single_session_visible_body -> body viewport ->
single_session_rendered_body_lines_for_tick) re-parsed markdown and re-wrapped
the ENTIRE transcript on every selection mouse-move during a drag, an
O(transcript) cost per pointer event.

Add a thread-local single-entry memo keyed by the existing body cache key and
return the wrapped lines as a shared Rc, so the viewport only clones the visible
slice instead of the whole transcript. The render hot path keeps its separate
Canvas-side cache, so this only accelerates input/scroll-metric/geometry callers.

Measured on real transcripts (debug build): per-mouse-move selection hit-test
p99 ~121ms -> ~0.06ms. Adds a selection_input_hittest benchmark phase that
isolates this cost, plus a debug-only env gate for A/B measurement.
gather_recent_sessions fully parsed every session JSON file (the sessions dir
can hold tens of thousands) just to drop those older than the 24h cutoff and
keep the 20 most recent: O(all_sessions * parse) per ambient cycle.

Pre-filter candidate files by filesystem mtime (with a 1h margin for write/clock
skew) before loading, sort newest-first, and only parse up to a bounded budget
(4x the limit) before the existing id-based sort/truncate. Behavior is
preserved; work drops from O(all_sessions) to O(recent_sessions).
picker_fuzzy_score re-lowercased and re-collected the filter pattern into a
Vec<char> on every call, i.e. once per entry inside the per-keystroke filter
loop (O(entries * pattern)). Hoist pattern normalization out via
picker_fuzzy_pattern + picker_fuzzy_score_with_pattern and normalize once per
filter pass. Scoring behavior is unchanged.
search_matched_session_refs cloned the cached match set into candidates and
then cloned the new matches back into the cache: two full-list clones per
narrowing keystroke. Take the cached refs in place via mem::take (it is about to
be overwritten anyway), eliminating the candidates clone. Behavior unchanged.
…ection text

copy_selection_status built the entire selected string via
current_copy_selection_text just to report char/line counts in the status line.
This ran on every render frame while in copy mode, including every drag move, so
a large selection (e.g. select-all) re-allocated and re-joined the whole
transcript text each frame.

Add copy_selection_metrics (and a raw-lines fast path) that counts chars/lines
using the same slicing logic without allocating the joined string, and use it
for the status line. Add a test asserting the metrics exactly match the built
selection text's char/line counts.
…t O(L^2))

For each image/mermaid placeholder, body prep scanned forward through all
following blank lines to compute the placeholder height. A message with many
placeholders each followed by long blank runs made this O(wrapped_lines^2).

Precompute, in a single reverse pass, the blank-run length starting at every
line; the placeholder height is then an O(1) lookup. Extracted into a shared
compute_image_regions helper used by both wrap_lines and wrap_lines_with_map.
Behavior is identical (height = 1 + trailing blank run).
…todo progress

swarm list previously returned only a shallow roster (name, role, status,
files, age). Enrich each agent row with:

- live activity (processing + current tool name)
- provider/model
- token churn over a recent ~10s window + cumulative tokens
- turn count
- todo progress (completed/total)
- contextual idle/active duration label (idle Ns when ready, Ns when running)
- completion report when finished

Token churn and turn count are tracked in a new lock-free per-session
metrics registry (jcode-base::session_metrics) rather than on the Agent
struct, because swarm list reads stats while an agent may hold its own
Mutex<Agent> lock mid-turn (try_lock fails exactly when churn is most
interesting). Metrics are recorded from the streaming turn loop and run_turn,
and forgotten on session disconnect.

handle_comm_list now joins swarm membership with live session state and
todos, sharing the runtime-extras gathering helper in comm_sync.
…story

The 'current' reasoning collapse only ran in the live streaming path. When
the transcript was re-rendered from stored history (self-dev reload, resume,
remote sync, compaction-window expand), the shared history renderer replayed
every persisted reasoning trace in full regardless of reasoning_display mode,
so after the collapse animation finished a reload would bring all the
reasoning back.

format_reasoning_markup now honors the active mode:
- Off:     persisted reasoning is hidden entirely.
- Current: the block folds to a single '▸ thought (N lines)' trace line,
           matching the live collapse end state.
- Full:    classic full replay (unchanged).

Adds reasoning_summary_line_markup helper + tests for all three modes.
@quangdang46 quangdang46 merged commit 0ababdd into master Jun 6, 2026
4 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants