Skip to content

fix(telemetry): Telemetry command-run instrumentation pass (#1446, #1447, #1448, #1439)#71

Draft
aidandaly24 wants to merge 1 commit into
mainfrom
fix/1446-1447
Draft

fix(telemetry): Telemetry command-run instrumentation pass (#1446, #1447, #1448, #1439)#71
aidandaly24 wants to merge 1 commit into
mainfrom
fix/1446-1447

Conversation

@aidandaly24

Copy link
Copy Markdown
Owner

Refs aws#1446
Refs aws#1447
Refs aws#1448
Refs aws#1439

Issues

Root cause

verified above

Feature request to instrument eval/observability commands. Telemetry is only emitted via withCommandRunTelemetry/runCliCommand -> recordCommandRun -> client.emit('cli.command_run',...) (src/cli/telemetry/cli-command-run.ts:41). COMMAND_SCHEMAS already defines shapes (src/cli/telemetry/schemas/command-run.ts: 'run.eval':245, 'pause.online-eval'/'resume.online-eval':258-259, 'pause.online-insights'/'resume.online-insights':260-261, 'traces.list'/'traces.get':262-263), but handlers were never wired: run/command.tsx:170 calls handleRunEval directly; TUI RunEvalFlow.tsx:149 unwrapped; pause/command.tsx registerOnlineEvalSubcommand(l.50)/registerOnlineInsightsSubcommand(l.141) call handlePauseResume directly (only registerABTestSubcommand l.85 records); traces/command.tsx:35,107 call handlers directly. logs/command.tsx:35,64 already wrap, so 'logs' is done. No commit/PR references aws#1447.

Telemetry is opt-in per command; the fetch.access schema (command-run.ts:173,255) was scaffolded but neither emit site (CLI command.tsx action lines 21-95, TUI useFetchAccessFlow.ts) was ever wired, so cli.command_run is never recorded for fetch. Verified: no telemetry import/call in either file, no global Commander hook in cli.ts, and 'fetch.access' is never passed to any emit/record/wrapper outside the schema definition.

process.exit in CLI command actions bypasses main finally that runs shutdown.

The fix

Add a config entry to COMMAND_SCHEMAS in src/cli/telemetry/schemas/command-run.ts -- a small ConfigAttrs with a config_action enum ('get'|'list'|'set') derived from key/value presence (do NOT log key or value: PII/secret risk). Then wrap the action in src/cli/commands/config/command.ts:26-30 as const result = await withCommandRunTelemetry('config', attrs, () => resolveAction(key,value)()); keeping the existing printResult()/process.exit(1). No design decision is actually needed: withCommandRunTelemetry already handles {success:false} Results natively (cli-command-run.ts:107-118 routes !result.success through classifyError(result.error) to record exit_reason='failure'), and the config handlers in actions.ts all return ConfigResult unions and never throw -- so the failure path is captured automatically. Mirror the status command which uses this exact pattern (status/command.tsx:95-98). This satisfies both DoD items (schema + instrumentation).

Mechanical wiring; schemas already exist. (1) run/command.tsx run-eval action (~l.170): wrap handleRunEval in withCommandRunTelemetry('run.eval', {evaluator_count, ref_type, has_assertions, has_expected_trajectory, has_expected_response}) derived from options; mirror in RunEvalFlow.tsx:149 for the TUI path. (2) pause/command.tsx + resume/command.tsx: wrap handlePauseResume in registerOnlineEvalSubcommand/registerOnlineInsightsSubcommand with withCommandRunTelemetry('pause.online-eval'|'resume.online-eval'|'pause.online-insights'|'resume.online-insights', {ref_type}) — both files share these factory functions, so one edit covers pause+resume. (3) traces/command.tsx list/get (~l.35, l.107): wrap handleTracesList/handleTracesGet in withCommandRunTelemetry('traces.list'|'traces.get', {}), optionally upgrade NoAttrs (command-run.ts:262-263) to a small shape (has_runtime/has_since/limit). 'logs' is already done. Design decision: confirm with the author whether 'logs' refers to logs/logs.evals (already instrumented) and whether traces warrants richer attrs than NoAttrs.

Instrument both fetch entry points with the existing helpers, mirroring validate (src/cli/commands/validate/command.tsx:14 uses withCommandRunTelemetry('validate', {}, ...)). (1) CLI: in src/cli/commands/fetch/command.tsx wrap the handleFetchAccess flow in runCliCommand('fetch.access', !!options.json, async () => { ...; return { resource_type: standardize(ResourceType, options.type ?? 'gateway') }; }) — runCliCommand fits because the action owns its own process.exit calls (lines 34, 63, 69); note the action's current early-return-without-exit on JSON success (lines 68-69) means the body needs minor restructuring to return attrs through the wrapper rather than falling through. (2) TUI: in src/cli/tui/screens/fetch-access/useFetchAccessFlow.ts wrap the fetch operation in withCommandRunTelemetry('fetch.access', { resource_type: standardize(ResourceType, resource.resourceType) }, ...) at the fetch call site (~lines 106-133). standardize is exported at common-shapes.ts:20. No schema change needed — fetch.access/FetchAccessAttrs already exist.

Move audit message into flush and add shared finalize before process.exit.

Files touched: src/cli/commands/config/command.ts (action at lines 26-30, wrap resolveAction in withCommandRunTelemetry); src/cli/telemetry/schemas/command-run.ts (add a ConfigAttrs near line 209 and a config: key into COMMAND_SCHEMAS ~215-302); src/cli/telemetry/schemas/common-shapes.ts (add a ConfigAction enum used by ConfigAttrs)

src/cli/commands/run/command.tsx (run-eval action ~line 170); src/cli/tui/screens/run-eval/RunEvalFlow.tsx (~line 149); src/cli/commands/pause/command.tsx (registerOnlineEvalSubcommand ~line 50, registerOnlineInsightsSubcommand ~line 141 — shared with resume via the same factory functions); src/cli/commands/resume/command.tsx (registerResume reuses those factories); src/cli/commands/traces/command.tsx (list ~line 35, get ~line 107); optional attr-shape upgrade in src/cli/telemetry/schemas/command-run.ts lines 262-263. All schema slots already exist.

src/cli/commands/fetch/command.tsx (the .action handler, lines 21-95 — add runCliCommand wrapper, restructure to return resource_type attrs); src/cli/tui/screens/fetch-access/useFetchAccessFlow.ts (add withCommandRunTelemetry wrapper at the fetch-operation call site, ~lines 106-133). No change needed in src/cli/telemetry/schemas/command-run.ts — fetch.access/FetchAccessAttrs already exist (lines 173, 255).

src/cli/telemetry/sinks/filesystem-sink.ts and src/cli/telemetry/cli-command-run.ts

Validation evidence

The fix was verified by reproducing the original symptom and re-running after the change:

Built OK first try -> dist/cli/index.mjs. Reproduced BOTH halves of the symptom by rebuilding the ORIGINAL (stashed-fix) binary and the FIXED binary and diffing behavior with AGENTCORE_TELEMETRY_AUDIT=1.

(1) Finalize ordering / dropped tail output, on the runCliCommand path agentcore feedback "test feedback acfix-1446-1447" --json: ORIGINAL build wrote the cli.command_run event into the audit .jsonl (attrs: command=feedback, exit_reason=failure) but printed NO [audit mode] Telemetry written to ... line to stdout, because process.exit(1) in runCliCommand fired before TelemetryClientAccessor.shutdown(). FIXED build records the SAME event AND prints [audit mode] Telemetry written to /tmp/audit-fix-fb/.agentcore/telemetry/feedback-...jsonl before exiting; exit code preserved. The shutdown+notices now run inside finalizeAndExit() which executes prior to process.exit (cli.ts registers postCommandFinalize; runCliCommand/config/fetch/pause/run/traces all return finalizeAndExit(code)).

(2) Coverage: ORIGINAL agentcore config telemetry.enabled false created NO telemetry dir at all (zero cli.command_run). FIXED records exactly one event per invocation: config (no key) -> config_action=list, exit_reason=success; config telemetry.enabled (unset get) -> config_action=get, exit_reason=failure; config telemetry.enabled false -> config_action=set, exit_reason=success. Inspected attrs: only the derived config_action is recorded, NO key/value PII (PII keys present: []). Success/failur

Test suite: green.


Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.

…39) + command_run instrumentation for config (1446), run-eval/pause/resume online-eval+online-insights/traces (1447), and the fetch-access TUI path (1448). fetch-access CLI and run.job/ab-test were already instrumented at HEAD.
@github-actions github-actions Bot added the size/m PR size: M label Jun 25, 2026
@github-actions

Copy link
Copy Markdown

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 37.18% 13609 / 36597
🔵 Statements 36.45% 14468 / 39688
🔵 Functions 31.83% 2339 / 7348
🔵 Branches 31.09% 9002 / 28948
Generated in workflow #125 for commit 4c9dec5 by the Vitest Coverage Report Action

@github-actions github-actions Bot added agentcore-harness-reviewing AgentCore Harness review in progress and removed agentcore-harness-reviewing AgentCore Harness review in progress labels Jun 25, 2026
@aidandaly24 aidandaly24 changed the title fix(cli): Telemetry command-run instrumentation pass: wrap the un-i... (#1446, #1447, #1448, #1439) fix(telemetry): Telemetry command-run instrumentation pass (#1446, #1447, #1448, #1439) Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant