Skip to content

Merge PR #392: Fix API-key and OAuth auth mode persistence + review fixes#394

Merged
quangdang46 merged 14 commits into
masterfrom
merge/pr-392
Jun 5, 2026
Merged

Merge PR #392: Fix API-key and OAuth auth mode persistence + review fixes#394
quangdang46 merged 14 commits into
masterfrom
merge/pr-392

Conversation

@quangdang46
Copy link
Copy Markdown
Owner

This PR replaces #392 with resolved merge conflicts and review fixes.

Changes from #392 (1jehuang):

  • Fix API-key and OAuth auth mode persistence across providers
  • Self-dev daemon reloads into the advertised binary
  • Swarm translates runtime provider key to valid --provider on spawn
  • Telemetry dashboard with full metrics
  • Provider doctor tests for multi-call signature replay
  • Parallelized session loading
  • Endorsed NVIDIA CUDA-X and Anthropic frontend-design skills

Additional fixes:

  • H1: route_api_method propagation in create_transfer_session_from_parent
  • M1: Log panicked workers in parallel_map instead of silently dropping
  • M2: Remove error detail leakage in telemetry stats endpoint
  • M3: Remove token from URL query string in dashboard

1jehuang and others added 14 commits June 5, 2026 01:11
…ed provider_key vocabularies

Claude (and OpenAI) sessions could silently shift from an API key onto
the OAuth subscription. Root cause: two divergent provider_key
vocabularies persist into sessions, and the session-reconstruction
helpers only understood one of them.

- The structured model-route picker (RPC) persists RuntimeKey::stable_id()
  values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key.
- The legacy /model + login path persists: claude / claude-api / openai /
  openai-api.

model_switch_request_for_session_model and
session_provider_key_matches_provider_name only matched the legacy
vocabulary. A session whose provider_key was 'anthropic-api-key' (without
a separately-persisted route_api_method, e.g. a forked/child/ambient/
overnight session) therefore reconstructed a bare model with no auth
prefix, leaving the Anthropic provider in Auto mode -- which now prefers
OAuth (commit 00e9b9f) -- silently moving an API-key user onto the
subscription.

Fix:
- Add canonical_session_provider_key() to fold the picker vocabulary back
  onto the canonical keys, and apply it in the reconstruction/match
  helpers so either vocabulary recovers the exact OAuth-vs-API-key route.
- Carry route_api_method alongside provider_key when copying a parent
  session to a child (ambient, overnight, fork, selfdev, crash recovery)
  so children reconstruct the full route even without the canonicalizer.

Adds a regression test proving anthropic-api-key/openai-api-key/-oauth
provider keys preserve the auth route without route_api_method.
The worker previously only accepted POST /v1/event; there was no visual
dashboard (just SQL files run by hand). Add a real one.

Headline metric (users.sql + stats.js): total_users = distinct, non-CI
telemetry_id that ever installed OR did meaningful work. Validated with
sqlite edge-case repros (install-only, turn_end-only with lost
session_end, empty open/close, CI). Reported alongside broader tiers
(reached) and narrower tiers (core, installed) plus raw CI-inclusive
totals so no signal is removed.

- src/stats.js: read-only aggregation (counts only, never raw rows) over
  users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality,
  per-turn, errors, feature adoption, transport, version/os/channel/
  provider/auth/onboarding breakdowns, 60d timeseries, recent feedback.
  One shared MEANINGFUL_SQL predicate so every window agrees.
- src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON
  gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event
  unchanged. CORS widened to GET.
- src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN,
  works under Cloudflare). Tiered layout: hero total-users number, active
  funnel + chart, 'how the number is built' transparency band, then
  acquisition/retention, engagement, reliability, breakdowns, features,
  feedback. Importance shown via hero/key tags/muted diagnostics.
- README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run
  users; type:module to silence ESM warning.

Validated: node --check on all modules, getStats end-to-end against a
seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a
real browser (token gate + every section + charts).
The native tool smoke only ever drove a single tool-call round-trip, so it
always replayed exactly one thought_signature and passed even when an earlier
function call would drop its signature. The Antigravity/Cloud Code backend
validates *every* functionCall in the replayed history, so the field 400
("Function call is missing a thought_signature ... position N") only
reproduces with a multi-call transcript.

- Extend run_live_native_provider_tool_smoke into two phases: the historical
  single round-trip (gating) plus a best-effort multi-call replay that rebuilds
  a history of two assistant tool_use blocks, each carrying its own signature.
- Delegate run_live_antigravity_native_tool_smoke to the shared probe so
  Antigravity (the runtime that hit this) gets the multi-call coverage too.
- Add an always-on unit guard (build_contents_replays_every_signature_across_
  multi_tool_history) so the serialization regression is caught for free,
  without spending live tokens.
…end-design skill

Two things the prior dashboard commit missed.

1) Restore every metric the old SQL surface (README queries, health.sql,
   dau.sql) exposed that had been dropped:
   - os/arch platform breakdown (was os-only)
   - session starts by UTC hour (usage-timing histogram)
   - pipeline-health diagnostics: lifecycle_ids, session_start_ids,
     lifecycle_ids_without_install, heaviest/top5/total session events
   - meaningful_sessions_30d count
   stats.js gains hours, arch, health, skew, meaningfulSessions queries;
   all validated end-to-end against a seeded sqlite D1 shim.

2) Redesign dashboard.js using the installed anthropics/frontend-design
   skill. The previous version used system fonts and the exact
   purple-gradient-on-dark the skill warns against. New 'Terminal
   Observatory' aesthetic, true to jcode being a CLI agent: JetBrains
   Mono instrument typography (Sora for prose), warm phosphor-amber
   signal color with a single cyan accent, scanline texture, station-
   clock hero number, numbered hairline section dividers, KEY/alert
   accent rails, an amber UTC-hour bar histogram, and a filled cyan
   active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so
   the total-users number dominates while every figure stays visible.

Verified in a real browser: token gate, hero, all 8 sections, both
chart types render correctly. node --check passes on all modules.
…s catalog

Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt,
cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the
endorsed-skills list, sourced from the official NVIDIA-verified catalog
at github.com/NVIDIA/skills.

- EndorsedSkill gains category + optional install hint fields.
- /skills now groups endorsed skills by category with per-category
  installed counts and shows the 'npx skills add nvidia/skills' install
  command for missing skills, plus the catalog URL.
- Tests cover the new fields and the NVIDIA catalog entries.
Adds an integration test that drives the REAL update-detection core
(newer_binary_available, the function behind server_has_update) and the
reload-target resolver after a normal (non-self-dev) /update channel swap.

Models a shipped user: shared-server tracking stable, daemon running the old
release, /update installs a newer release and advances stable/current/
shared-server. Asserts both that the old daemon reports an update and that it
reloads into the freshly installed release. This documents that normal users
are covered by advance_shared_server_if_tracking_stable + the cross-flavor
reload target.
…y, full active tiers

Surface every remaining signal the original SQL surface (and the schema)
exposed but the dashboard had not yet shown:

- User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume
  with sessions/turns/tokens/tool_calls, version and last-seen. CI and
  non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs'
  query, made visual).
- Token usage (sec 04): full breakdown - input/output/cache_read/
  cache_creation/total, both 30d and all-time (was a single combined
  number).
- Agent autonomy (sec 05): spawned agents, subagent/swarm/background
  tasks + successes, user cancellations, and where agent time goes
  (active/model/tool/blocked/idle), time-to-first-action, avg max
  concurrency. These schema columns were never surfaced before.
- Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues,
  not just the headline number.
- Engagement: added time-to-first-tool-success.

stats.js gains tokens, agent, and leaderboard queries (26 queries total,
all validated against the real schema via a seeded sqlite D1 shim).
dashboard.js renumbered to 11 sections with a new leaderboardPanel
renderer and CI/dev tag styling. Verified end-to-end in a real browser:
all sections, the leaderboard table, and both chart types render.
…parts

Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real
decode abort: the Antigravity/Cloud Code generateContent response occasionally
omits `role` (and sometimes `parts`) on a candidate's `content`, but the
struct required `role`, so the whole turn failed with "missing field
`role`". The response-side role is never read, so default both fields rather
than aborting. Adds two decode regression tests.
The first cut of the multi-call phase only nudged a 2nd tool call after the
model had already answered, so live runs reported multi_tool_replay=skipped and
never actually exercised the multi-functionCall history. Replace it with an
agentic loop driven by a two-file read prompt: each emitted tool call is
replayed (carrying its captured thought_signature) and answered with a
synthetic result, so by the final turn we send two assistant functionCall
blocks and assert the backend accepts the transcript. Surface the
verified/skipped status in the doctor report detail.

Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full
now reports 'multi-call signature replay verified'.
Cold /resume and onboarding/catch-up pickers were dominated by serial
per-file IO+JSON parsing over large session histories (87k jcode
snapshots + hundreds of Codex/Claude transcripts here).

- Add a bounded scoped-thread parallel_map helper in the session picker
  loader and use it for: candidate mtime stat (readdir then parallel
  stat), the jcode summary parse pass (two-phase: parallel fill to
  scan_limit, then parallel saved-gate over the tail), and the external
  Codex/pi/opencode stub parsers.
- Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of
  re-reading catchup_seen.json per session.
- Onboarding transcript picker now loads only the relevant external CLI
  (load_external_cli_sessions_grouped) instead of the full
  load_sessions_grouped on the UI thread.
- Catch-up picker now opens from cache and refreshes off-thread via the
  shared async picker-load path instead of blocking the live session.

Measured on real data (idle, 4 runs each):
  load_sessions          ~660ms -> ~434ms (~34%)
  load_sessions_grouped  ~685ms -> ~465ms (~32%)
  onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load
Minor bump covering the 44 commits since v0.21.0, including:
- Eager token-by-token reasoning streaming and per-line multi-line
  thinking rendering in the TUI.
- Provider fixes: Gemini schema/thought_signature handling, Kimi
  reasoning_content, OpenRouter empty-message guard, Anthropic 1M
  context + split-cache cost accounting, API-key vs OAuth auth mode.
- Swarm: route messages by target, broadcast to whole swarm, inherit
  coordinator model/auth route on spawn.
- Self-dev reload correctness (daemon reloads into advertised binary),
  reload-trace OOM cap, and provider-doctor generic native suites.
- Served telemetry dashboard with accurate user/install metrics and
  /skills + endorsed NVIDIA CUDA-X skills.
Add Anthropic's official frontend-design skill (the best design-focused
agent skill) to the endorsed list under a new 'Anthropic Design'
category, sourced from github.com/anthropics/skills with an install hint.
The guided first-run onboarding flow auto-imports existing external CLI
logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via
run_external_auth_auto_import_candidates, which bypasses the manual
pending_login path that record_auth_success was wired into. As a result
every auto-imported login -- the happy path of the new onboarding -- was
invisible to the activation funnel, making auth_success undercount badly
(observed: more users reaching first_assistant_response than auth_success
in post-0.17 install cohorts, which is impossible without auth).

Surface coarse (provider, method="import") telemetry labels from the
import outcome and record auth_success for each imported provider in both
the onboarding and manual /login auto-import callers. Domain logic in
jcode-app-core stays telemetry-free; the TUI layer emits the event,
matching existing call sites.
- Resolve 5 merge conflicts in loading.rs (took master's CASR-based loader)
- Fix H1: route_api_method propagation in create_transfer_session_from_parent
- Fix M1: log panicked workers in parallel_map instead of silently dropping
- Fix M2: remove error detail leakage in telemetry stats endpoint
- Fix M3: remove token from URL query string in dashboard
- Remove orphaned pi session code from conflict resolution
- Update load_external_cli_sessions_grouped to use CASR loader
@quangdang46 quangdang46 merged commit 72fba9e into master Jun 5, 2026
5 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants