Fix API-key and OAuth auth mode persistence across providers by quangdang46 · Pull Request #392 · quangdang46/jcode

quangdang46 · 2026-06-05T18:32:49Z

Closed in favor of #394 (same changes with resolved merge conflicts and review fixes).

…ed provider_key vocabularies Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them. - The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key. - The legacy /model + login path persists: claude / claude-api / openai / openai-api. model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/ overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription. Fix: - Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route. - Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer. Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method.

The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one. Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work. Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed. - src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/ provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees. - src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET. - src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics. - README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning. Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts).

The native tool smoke only ever drove a single tool-call round-trip, so it always replayed exactly one thought_signature and passed even when an earlier function call would drop its signature. The Antigravity/Cloud Code backend validates *every* functionCall in the replayed history, so the field 400 ("Function call is missing a thought_signature ... position N") only reproduces with a multi-call transcript. - Extend run_live_native_provider_tool_smoke into two phases: the historical single round-trip (gating) plus a best-effort multi-call replay that rebuilds a history of two assistant tool_use blocks, each carrying its own signature. - Delegate run_live_antigravity_native_tool_smoke to the shared probe so Antigravity (the runtime that hit this) gets the multi-call coverage too. - Add an always-on unit guard (build_contents_replays_every_signature_across_ multi_tool_history) so the serialization regression is caught for free, without spending live tokens.

…end-design skill Two things the prior dashboard commit missed. 1) Restore every metric the old SQL surface (README queries, health.sql, dau.sql) exposed that had been dropped: - os/arch platform breakdown (was os-only) - session starts by UTC hour (usage-timing histogram) - pipeline-health diagnostics: lifecycle_ids, session_start_ids, lifecycle_ids_without_install, heaviest/top5/total session events - meaningful_sessions_30d count stats.js gains hours, arch, health, skew, meaningfulSessions queries; all validated end-to-end against a seeded sqlite D1 shim. 2) Redesign dashboard.js using the installed anthropics/frontend-design skill. The previous version used system fonts and the exact purple-gradient-on-dark the skill warns against. New 'Terminal Observatory' aesthetic, true to jcode being a CLI agent: JetBrains Mono instrument typography (Sora for prose), warm phosphor-amber signal color with a single cyan accent, scanline texture, station- clock hero number, numbered hairline section dividers, KEY/alert accent rails, an amber UTC-hour bar histogram, and a filled cyan active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so the total-users number dominates while every figure stays visible. Verified in a real browser: token gate, hero, all 8 sections, both chart types render correctly. node --check passes on all modules.

…s catalog Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills. - EndorsedSkill gains category + optional install hint fields. - /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL. - Tests cover the new fields and the NVIDIA catalog entries.

Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap. Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/ shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release. This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target.

…y, full active tiers Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown: - User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual). - Token usage (sec 04): full breakdown - input/output/cache_read/ cache_creation/total, both 30d and all-time (was a single combined number). - Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before. - Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number. - Engagement: added time-to-first-tool-success. stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling. Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render.

…parts Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests.

The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail. Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'.

Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here). - Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers. - Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session. - Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread. - Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session. Measured on real data (idle, 4 runs each): load_sessions ~660ms -> ~434ms (~34%) load_sessions_grouped ~685ms -> ~465ms (~32%) onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load

Minor bump covering the 44 commits since v0.21.0, including: - Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI. - Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode. - Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn. - Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites. - Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills.

Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint.

The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth). Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites.

* fix(provider): keep API-key vs OAuth auth mode across the two persisted provider_key vocabularies Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them. - The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key. - The legacy /model + login path persists: claude / claude-api / openai / openai-api. model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/ overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription. Fix: - Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route. - Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer. Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method. * telemetry: add served dashboard with accurate 'total users' headline The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one. Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work. Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed. - src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/ provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees. - src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET. - src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics. - README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning. Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts). * test(provider-doctor): cover multi-call thought_signature replay The native tool smoke only ever drove a single tool-call round-trip, so it always replayed exactly one thought_signature and passed even when an earlier function call would drop its signature. The Antigravity/Cloud Code backend validates *every* functionCall in the replayed history, so the field 400 ("Function call is missing a thought_signature ... position N") only reproduces with a multi-call transcript. - Extend run_live_native_provider_tool_smoke into two phases: the historical single round-trip (gating) plus a best-effort multi-call replay that rebuilds a history of two assistant tool_use blocks, each carrying its own signature. - Delegate run_live_antigravity_native_tool_smoke to the shared probe so Antigravity (the runtime that hit this) gets the multi-call coverage too. - Add an always-on unit guard (build_contents_replays_every_signature_across_ multi_tool_history) so the serialization regression is caught for free, without spending live tokens. * telemetry dashboard: restore all legacy metrics + redesign with frontend-design skill Two things the prior dashboard commit missed. 1) Restore every metric the old SQL surface (README queries, health.sql, dau.sql) exposed that had been dropped: - os/arch platform breakdown (was os-only) - session starts by UTC hour (usage-timing histogram) - pipeline-health diagnostics: lifecycle_ids, session_start_ids, lifecycle_ids_without_install, heaviest/top5/total session events - meaningful_sessions_30d count stats.js gains hours, arch, health, skew, meaningfulSessions queries; all validated end-to-end against a seeded sqlite D1 shim. 2) Redesign dashboard.js using the installed anthropics/frontend-design skill. The previous version used system fonts and the exact purple-gradient-on-dark the skill warns against. New 'Terminal Observatory' aesthetic, true to jcode being a CLI agent: JetBrains Mono instrument typography (Sora for prose), warm phosphor-amber signal color with a single cyan accent, scanline texture, station- clock hero number, numbered hairline section dividers, KEY/alert accent rails, an amber UTC-hour bar histogram, and a filled cyan active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so the total-users number dominates while every figure stays visible. Verified in a real browser: token gate, hero, all 8 sections, both chart types render correctly. node --check passes on all modules. * feat(skills): endorse NVIDIA CUDA-X skills from official NVIDIA/skills catalog Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills. - EndorsedSkill gains category + optional install hint fields. - /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL. - Tests cover the new fields and the NVIDIA catalog entries. * test(reload): prove normal-user /update upgrades the daemon end-to-end Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap. Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/ shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release. This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target. * telemetry dashboard: add user leaderboard, token usage, agent autonomy, full active tiers Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown: - User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual). - Token usage (sec 04): full breakdown - input/output/cache_read/ cache_creation/total, both 30d and all-time (was a single combined number). - Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before. - Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number. - Engagement: added time-to-first-tool-success. stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling. Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render. * fix(gemini): tolerate generateContent candidate content without role/parts Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests. * test(provider-doctor): drive a real multi-call signature-replay loop The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail. Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'. * perf(resume): parallelize session loading; scope onboarding picker Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here). - Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers. - Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session. - Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread. - Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session. Measured on real data (idle, 4 runs each): load_sessions ~660ms -> ~434ms (~34%) load_sessions_grouped ~685ms -> ~465ms (~32%) onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load * chore(release): bump version to 0.22.0 Minor bump covering the 44 commits since v0.21.0, including: - Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI. - Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode. - Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn. - Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites. - Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills. * feat(skills): endorse Anthropic frontend-design skill Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint. * fix(onboarding): record auth_success for auto-imported logins The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth). Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites. --------- Co-authored-by: jeremy <94247773+1jehuang@users.noreply.github.com> Co-authored-by: quangdang46 <quangdang46@users.noreply.github.com>

1jehuang added 13 commits June 5, 2026 01:11

feat(skills): endorse Anthropic frontend-design skill

0611ae8

Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint.

quangdang46 mentioned this pull request Jun 5, 2026

Merge PR #392: Fix API-key and OAuth auth mode persistence + review fixes #394

Merged

quangdang46 closed this Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix API-key and OAuth auth mode persistence across providers#392

Fix API-key and OAuth auth mode persistence across providers#392
quangdang46 wants to merge 13 commits into
quangdang46:masterfrom
1jehuang:master

quangdang46 commented Jun 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

quangdang46 commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

quangdang46 commented Jun 5, 2026 •

edited

Loading