Fix API-key and OAuth auth mode persistence across providers#392
Closed
quangdang46 wants to merge 13 commits into
Closed
Fix API-key and OAuth auth mode persistence across providers#392quangdang46 wants to merge 13 commits into
quangdang46 wants to merge 13 commits into
Conversation
…ed provider_key vocabularies Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them. - The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key. - The legacy /model + login path persists: claude / claude-api / openai / openai-api. model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/ overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription. Fix: - Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route. - Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer. Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method.
The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one. Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work. Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed. - src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/ provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees. - src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET. - src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics. - README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning. Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts).
The native tool smoke only ever drove a single tool-call round-trip, so it
always replayed exactly one thought_signature and passed even when an earlier
function call would drop its signature. The Antigravity/Cloud Code backend
validates *every* functionCall in the replayed history, so the field 400
("Function call is missing a thought_signature ... position N") only
reproduces with a multi-call transcript.
- Extend run_live_native_provider_tool_smoke into two phases: the historical
single round-trip (gating) plus a best-effort multi-call replay that rebuilds
a history of two assistant tool_use blocks, each carrying its own signature.
- Delegate run_live_antigravity_native_tool_smoke to the shared probe so
Antigravity (the runtime that hit this) gets the multi-call coverage too.
- Add an always-on unit guard (build_contents_replays_every_signature_across_
multi_tool_history) so the serialization regression is caught for free,
without spending live tokens.
…end-design skill
Two things the prior dashboard commit missed.
1) Restore every metric the old SQL surface (README queries, health.sql,
dau.sql) exposed that had been dropped:
- os/arch platform breakdown (was os-only)
- session starts by UTC hour (usage-timing histogram)
- pipeline-health diagnostics: lifecycle_ids, session_start_ids,
lifecycle_ids_without_install, heaviest/top5/total session events
- meaningful_sessions_30d count
stats.js gains hours, arch, health, skew, meaningfulSessions queries;
all validated end-to-end against a seeded sqlite D1 shim.
2) Redesign dashboard.js using the installed anthropics/frontend-design
skill. The previous version used system fonts and the exact
purple-gradient-on-dark the skill warns against. New 'Terminal
Observatory' aesthetic, true to jcode being a CLI agent: JetBrains
Mono instrument typography (Sora for prose), warm phosphor-amber
signal color with a single cyan accent, scanline texture, station-
clock hero number, numbered hairline section dividers, KEY/alert
accent rails, an amber UTC-hour bar histogram, and a filled cyan
active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so
the total-users number dominates while every figure stays visible.
Verified in a real browser: token gate, hero, all 8 sections, both
chart types render correctly. node --check passes on all modules.
…s catalog Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills. - EndorsedSkill gains category + optional install hint fields. - /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL. - Tests cover the new fields and the NVIDIA catalog entries.
Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap. Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/ shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release. This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target.
…y, full active tiers Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown: - User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual). - Token usage (sec 04): full breakdown - input/output/cache_read/ cache_creation/total, both 30d and all-time (was a single combined number). - Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before. - Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number. - Engagement: added time-to-first-tool-success. stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling. Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render.
…parts Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests.
The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail. Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'.
Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here). - Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers. - Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session. - Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread. - Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session. Measured on real data (idle, 4 runs each): load_sessions ~660ms -> ~434ms (~34%) load_sessions_grouped ~685ms -> ~465ms (~32%) onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load
Minor bump covering the 44 commits since v0.21.0, including: - Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI. - Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode. - Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn. - Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites. - Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills.
Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint.
The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth). Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites.
quangdang46
added a commit
that referenced
this pull request
Jun 5, 2026
* fix(provider): keep API-key vs OAuth auth mode across the two persisted provider_key vocabularies Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them. - The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key. - The legacy /model + login path persists: claude / claude-api / openai / openai-api. model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/ overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription. Fix: - Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route. - Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer. Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method. * telemetry: add served dashboard with accurate 'total users' headline The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one. Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work. Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed. - src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/ provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees. - src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET. - src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics. - README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning. Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts). * test(provider-doctor): cover multi-call thought_signature replay The native tool smoke only ever drove a single tool-call round-trip, so it always replayed exactly one thought_signature and passed even when an earlier function call would drop its signature. The Antigravity/Cloud Code backend validates *every* functionCall in the replayed history, so the field 400 ("Function call is missing a thought_signature ... position N") only reproduces with a multi-call transcript. - Extend run_live_native_provider_tool_smoke into two phases: the historical single round-trip (gating) plus a best-effort multi-call replay that rebuilds a history of two assistant tool_use blocks, each carrying its own signature. - Delegate run_live_antigravity_native_tool_smoke to the shared probe so Antigravity (the runtime that hit this) gets the multi-call coverage too. - Add an always-on unit guard (build_contents_replays_every_signature_across_ multi_tool_history) so the serialization regression is caught for free, without spending live tokens. * telemetry dashboard: restore all legacy metrics + redesign with frontend-design skill Two things the prior dashboard commit missed. 1) Restore every metric the old SQL surface (README queries, health.sql, dau.sql) exposed that had been dropped: - os/arch platform breakdown (was os-only) - session starts by UTC hour (usage-timing histogram) - pipeline-health diagnostics: lifecycle_ids, session_start_ids, lifecycle_ids_without_install, heaviest/top5/total session events - meaningful_sessions_30d count stats.js gains hours, arch, health, skew, meaningfulSessions queries; all validated end-to-end against a seeded sqlite D1 shim. 2) Redesign dashboard.js using the installed anthropics/frontend-design skill. The previous version used system fonts and the exact purple-gradient-on-dark the skill warns against. New 'Terminal Observatory' aesthetic, true to jcode being a CLI agent: JetBrains Mono instrument typography (Sora for prose), warm phosphor-amber signal color with a single cyan accent, scanline texture, station- clock hero number, numbered hairline section dividers, KEY/alert accent rails, an amber UTC-hour bar histogram, and a filled cyan active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so the total-users number dominates while every figure stays visible. Verified in a real browser: token gate, hero, all 8 sections, both chart types render correctly. node --check passes on all modules. * feat(skills): endorse NVIDIA CUDA-X skills from official NVIDIA/skills catalog Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills. - EndorsedSkill gains category + optional install hint fields. - /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL. - Tests cover the new fields and the NVIDIA catalog entries. * test(reload): prove normal-user /update upgrades the daemon end-to-end Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap. Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/ shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release. This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target. * telemetry dashboard: add user leaderboard, token usage, agent autonomy, full active tiers Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown: - User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual). - Token usage (sec 04): full breakdown - input/output/cache_read/ cache_creation/total, both 30d and all-time (was a single combined number). - Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before. - Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number. - Engagement: added time-to-first-tool-success. stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling. Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render. * fix(gemini): tolerate generateContent candidate content without role/parts Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests. * test(provider-doctor): drive a real multi-call signature-replay loop The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail. Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'. * perf(resume): parallelize session loading; scope onboarding picker Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here). - Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers. - Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session. - Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread. - Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session. Measured on real data (idle, 4 runs each): load_sessions ~660ms -> ~434ms (~34%) load_sessions_grouped ~685ms -> ~465ms (~32%) onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load * chore(release): bump version to 0.22.0 Minor bump covering the 44 commits since v0.21.0, including: - Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI. - Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode. - Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn. - Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites. - Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills. * feat(skills): endorse Anthropic frontend-design skill Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint. * fix(onboarding): record auth_success for auto-imported logins The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth). Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites. --------- Co-authored-by: jeremy <94247773+1jehuang@users.noreply.github.com> Co-authored-by: quangdang46 <quangdang46@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closed in favor of #394 (same changes with resolved merge conflicts and review fixes).