Date: 2026-02-08
Scope: Read-only audit of current PerfMon implementation and minimal path to live telemetry.
- PerfMon is mounted in the top header controls in ui/src/App.tsx:369 as <PerfMonPanel isLive={isAgentLive} />.
- The header is sticky and sits in a high stacking context (ui/src/App.tsx:265), with a shared tooltip provider wrapping controls (ui/src/App.tsx:267, ui/src/App.tsx:394).
- isAgentLive is derived from websocket agent state in ui/src/App.tsx:87: wsState.agentStatus === 'running' || wsState.agentStatus === 'paused'
- The panel open state is local in ui/src/components/PerfMonPanel.tsx:25.
- Open/close behavior:
  - Button toggles state (ui/src/components/PerfMonPanel.tsx:58)
  - Outside click closes (ui/src/components/PerfMonPanel.tsx:32)
  - Escape closes (ui/src/components/PerfMonPanel.tsx:38)
- Panel mount behavior:
  - Not portaled; rendered as absolute element under button (ui/src/components/PerfMonPanel.tsx:80)
  - Uses z-40 inside header context.
- Tooltip behavior:
  - PerfMon button uses Radix tooltip (ui/src/components/PerfMonPanel.tsx:69)
  - Tooltip content is portaled in shared wrapper (ui/src/components/ui/tooltip.tsx:42, ui/src/components/ui/tooltip.tsx:44).
ui/src/hooks/usePerfMonMock.ts returns:
- tokens: currentRun, totalSession, usagePercent
- cpu: percent
- memory: used, total, percent
- gpu: available, percent, used, total
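For orientation, the shape listed above could be typed roughly as follows. This is a sketch only: the name PerfMonSnapshot is hypothetical, and the primitive types (number, boolean) and units are assumptions, since the audit records field names but not their declarations.

```typescript
// Hypothetical type name. Field names mirror the list above; the primitive
// types are assumed, not read from usePerfMonMock.ts.
interface PerfMonSnapshot {
  tokens: { currentRun: number; totalSession: number; usagePercent: number };
  cpu: { percent: number };
  memory: { used: number; total: number; percent: number };
  gpu: { available: boolean; percent: number; used: number; total: number };
}

// Illustrative values only; they are not taken from the mock hook.
const exampleSnapshot: PerfMonSnapshot = {
  tokens: { currentRun: 1420, totalSession: 9180, usagePercent: 12 },
  cpu: { percent: 37.2 },
  memory: { used: 6.2, total: 16, percent: 38.8 },
  gpu: { available: false, percent: 0, used: 0, total: 0 },
};
```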
Generation model:
- Deterministic smooth wave + clamp (const wave, const clamp) in ui/src/hooks/usePerfMonMock.ts:3, ui/src/hooks/usePerfMonMock.ts:6.
- No random noise spikes.
- GPU availability flips periodically (ui/src/hooks/usePerfMonMock.ts:53).
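A deterministic wave-plus-clamp generator of the kind described above might look like the sketch below. Only the names wave and clamp come from the audit; their signatures, the sine shape, and the constants in cpuPercentAt are assumptions for illustration, not the hook's actual code.

```typescript
// Restrict a value to the [min, max] range.
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Smooth periodic value oscillating around `base` by `amplitude`.
// Assumed signature; the real helper may take different arguments.
const wave = (
  tSeconds: number,
  base: number,
  amplitude: number,
  periodSeconds: number,
): number => base + amplitude * Math.sin((2 * Math.PI * tSeconds) / periodSeconds);

// Hypothetical mock CPU reading: same input time always gives the same output,
// and there is no random term, so no noise spikes.
const cpuPercentAt = (tSeconds: number): number =>
  clamp(wave(tSeconds, 40, 25, 30), 0, 100);
```

Because the output depends only on the time argument, a snapshot can be reproduced in a test by passing a fixed time instead of reading the clock.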
Update cadence:
- 1s while running, 3s while idle (ui/src/hooks/usePerfMonMock.ts:85).
- Interval-based updates (ui/src/hooks/usePerfMonMock.ts:94).
- PerfMonPanel consumes usePerfMonMock(isLive) directly (ui/src/components/PerfMonPanel.tsx:27).
- isLive affects:
  - mock update rate (through hook)
  - button/panel badge variant and label (Live/Idle) (ui/src/components/PerfMonPanel.tsx:72, ui/src/components/PerfMonPanel.tsx:83).
Assumption:
- “Agent running” is inferred only from websocket agentStatus in App.
- Client hook: useProjectWebSocket(projectName) in ui/src/hooks/useWebSocket.ts:61.
- Connects to /ws/projects/{project_name} (ui/src/hooks/useWebSocket.ts:88).
- Current state already tracked in UI hook: progress, agentStatus, isConnected, activeAgents, orchestratorStatus, devServerStatus (ui/src/hooks/useWebSocket.ts:33, ui/src/hooks/useWebSocket.ts:39, ui/src/hooks/useWebSocket.ts:41, ui/src/hooks/useWebSocket.ts:46, ui/src/hooks/useWebSocket.ts:54).
- Message handlers exist for:
  - progress (ui/src/hooks/useWebSocket.ts:104)
  - agent_status (ui/src/hooks/useWebSocket.ts:116)
  - orchestrator_update (ui/src/hooks/useWebSocket.ts:284)
  - dev_server_status (ui/src/hooks/useWebSocket.ts:321)
- WebSocket endpoint in server/main.py:167, delegated to project_websocket (server/main.py:170).
- Handler in server/websocket.py:719.
- Existing push sources:
  - progress polling task (server/websocket.py:685, server/websocket.py:843)
  - agent output callback (server/websocket.py:758, registered at server/websocket.py:810)
  - agent status callback (server/websocket.py:795, registered at server/websocket.py:811)
  - dev server status callback (server/websocket.py:827, registered at server/websocket.py:840)
- Agent status endpoint:
  - UI call getAgentStatus at ui/src/lib/api.ts:232
  - Backend route server/routers/agent.py:70
  - Polling hook every 3s at ui/src/hooks/useProjects.ts:140, ui/src/hooks/useProjects.ts:145
- Features polling every 5s at ui/src/hooks/useProjects.ts:82, ui/src/hooks/useProjects.ts:87
- Project stats endpoint exists (/stats) for pass/in-progress totals only (server/routers/projects.py:367).
- WSMessageType and WSMessage in ui/src/lib/types.ts:243, ui/src/lib/types.ts:318 do not include perf telemetry fields.
- Prefer existing project WebSocket path over new polling endpoint.
- Reason:
- Current page already keeps one WS open.
- Existing architecture already emits multiple live message types over this channel.
- Lower surface area than adding new endpoint + polling hook + cache invalidation path.
Message type: perf_metrics
{
"type": "perf_metrics",
"timestamp": "2026-02-08T20:15:30.123Z",
"project": "my-project",
"run": {
"status": "running",
"pid": 12345,
"started_at": "2026-02-08T20:10:00Z",
"run_id": "12345-2026-02-08T20:10:00Z"
},
"tokens": {
"current_run": 1420,
"total_session": 9180,
"available": false
},
"cpu": {
"percent": 37.2
},
"memory": {
"used_gb": 6.2,
"total_gb": 16.0,
"percent": 38.8
},
"gpu": {
"available": false,
"percent": null,
"vram_used_gb": null,
"vram_total_gb": null
}
}
- UI behavior:
  - If no live payload yet, show empty placeholders (Not available) without errors.
  - If stale payload (e.g. >10s old), mark as stale and dim values.
- Dev fallback to mock only:
  - import.meta.env.DEV && import.meta.env.VITE_PERFMON_MOCK !== '0'
  - Use the mock when the live payload is absent or disabled by the backend.
- Production:
- No automatic fake fallback; show unavailable states instead.
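The UI behavior and fallback rules above can be condensed into one pure decision function. The sketch below is an assumption about how that might be wired: pickMetricsSource and MetricsSource are hypothetical names, the 10s threshold is taken from the example above, and the dev flag and VITE_PERFMON_MOCK value are passed in as arguments so the function stays testable outside Vite. The "disabled by backend" case is not modeled here.

```typescript
type MetricsSource = "live" | "stale" | "mock" | "unavailable";

// Taken from the ">10s old" example; the final threshold is a product decision.
const STALE_AFTER_MS = 10_000;

// Hypothetical helper: decides which data the panel should render.
function pickMetricsSource(
  payloadTimestamp: string | null, // ISO timestamp of the last perf_metrics payload
  nowMs: number,
  isDev: boolean, // would be import.meta.env.DEV in the UI
  mockFlag: string | undefined, // would be import.meta.env.VITE_PERFMON_MOCK
): MetricsSource {
  const parsedMs = payloadTimestamp === null ? NaN : Date.parse(payloadTimestamp);
  if (!Number.isNaN(parsedMs)) {
    // A payload exists: never replace it with mock data, only flag staleness.
    return nowMs - parsedMs > STALE_AFTER_MS ? "stale" : "live";
  }
  // No usable payload: mock only in dev, and only when not opted out with '0'.
  return isDev && mockFlag !== "0" ? "mock" : "unavailable";
}
```

In the panel this might be called as pickMetricsSource(perfMetrics?.timestamp ?? null, Date.now(), import.meta.env.DEV, import.meta.env.VITE_PERFMON_MOCK), where perfMetrics is the proposed websocket state field.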
- Host-level metrics may expose machine characteristics.
- In remote mode (AUTOFORGE_ALLOW_REMOTE in server/main.py:97), exposure risk is higher.
- Keep payload coarse and project-scoped, avoid file paths/usernames/process cmdline leakage.
- Ensure project-name validation and existing WS auth/path checks remain the gate.
- Target 1s updates while running, 3s when idle to match current UI cadence expectations.
- Avoid rerender churn:
- do not update PerfMon state if values change minimally
- optionally pause updates when panel is closed (or keep lower-frequency store updates).
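One way to avoid rerender churn is to compare the incoming sample with the last stored one and skip the state update when nothing moved meaningfully. The sketch below assumes a 0.5 percentage-point threshold and a reduced CoarseMetrics shape; both are illustrative choices, not values from the audit.

```typescript
// Hypothetical reduced shape used only for the comparison.
interface CoarseMetrics {
  cpuPercent: number;
  memoryPercent: number;
}

// Returns true when the new sample differs enough to justify a state update.
// The default threshold is an assumption and would need tuning.
function changedEnough(
  prev: CoarseMetrics | null,
  next: CoarseMetrics,
  thresholdPercent = 0.5,
): boolean {
  if (prev === null) return true; // first sample always updates
  return (
    Math.abs(next.cpuPercent - prev.cpuPercent) >= thresholdPercent ||
    Math.abs(next.memoryPercent - prev.memoryPercent) >= thresholdPercent
  );
}
```

A side effect worth noting: if updates are skipped, the stored timestamp also stops advancing, so staleness should be judged from the last received payload rather than the last rendered one.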
- CPU/memory are feasible with existing server dependency psutil.
- GPU/VRAM portability is inconsistent across OS/vendors.
- Contract should support gpu.available=false and null GPU values.
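On the UI side, the nullable GPU fields could be encoded directly in the message type so the compiler forces the unavailable case to be handled. The sketch below follows the perf_metrics sample payload; WSPerfMetricsMessage is the name proposed in this audit, while isPerfMetricsMessage and formatGpuPercent are hypothetical helpers, and treating only the GPU fields as nullable is an assumption.

```typescript
interface WSPerfMetricsMessage {
  type: "perf_metrics";
  timestamp: string;
  project: string;
  run: { status: string; pid: number; started_at: string; run_id: string };
  tokens: { current_run: number; total_session: number; available: boolean };
  cpu: { percent: number };
  memory: { used_gb: number; total_gb: number; percent: number };
  gpu: {
    available: boolean;
    percent: number | null;
    vram_used_gb: number | null;
    vram_total_gb: number | null;
  };
}

// Minimal runtime check (hypothetical); it does not validate every field.
function isPerfMetricsMessage(msg: unknown): msg is WSPerfMetricsMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  return (
    m["type"] === "perf_metrics" &&
    typeof m["timestamp"] === "string" &&
    typeof m["gpu"] === "object" &&
    m["gpu"] !== null
  );
}

// Renders the GPU percentage, or the placeholder when unavailable or null.
function formatGpuPercent(gpu: WSPerfMetricsMessage["gpu"]): string {
  return gpu.available && gpu.percent !== null
    ? `${gpu.percent.toFixed(1)}%`
    : "Not available";
}
```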
- Add perf telemetry message types to shared UI contracts in ui/src/lib/types.ts (WSPerfMetricsMessage, union update).
- Extend websocket UI state in ui/src/hooks/useWebSocket.ts with perfMetrics and add case 'perf_metrics'.
- Add backend schema class(es) in server/schemas.py for perf payload (optional but recommended for contract clarity).
- In server/websocket.py, add a lightweight perf sampling loop inside project_websocket that emits perf_metrics at 1s/3s cadence.
- Source initial live fields from existing manager state:
  - status, pid, started_at from process manager already used in server/routers/agent.py:78.
  - CPU/memory from psutil.
  - GPU optional/null when unavailable.
- Update ui/src/components/PerfMonPanel.tsx to consume live metrics from WS first, with mock fallback in dev only.
- Add UI stale/unavailable states and keep existing Live/Idle badge behavior.
- Add a focused test for panel rendering with telemetry payload shape (or hook-level test), without adding a new test framework.
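The UI-side part of the plan above amounts to one new state field and one new message case. The sketch below shows that as a pure reducer; it is an assumption about structure only, since the audit does not show how useWebSocket.ts dispatches messages. WsStateSketch, WsMessageSketch, and reduceWsMessage are hypothetical names, and the message shapes are trimmed to the fields used here.

```typescript
// Trimmed, hypothetical shapes; the real hook tracks more state.
interface PerfMetricsState {
  timestamp: string;
  cpuPercent: number;
  memoryPercent: number;
}

interface WsStateSketch {
  agentStatus: string;
  perfMetrics: PerfMetricsState | null; // proposed new field
}

type WsMessageSketch =
  | { type: "agent_status"; status: string }
  | {
      type: "perf_metrics";
      timestamp: string;
      cpu: { percent: number };
      memory: { percent: number };
    };

// Pure reducer: returns a new state object and never mutates the input.
function reduceWsMessage(state: WsStateSketch, msg: WsMessageSketch): WsStateSketch {
  switch (msg.type) {
    case "agent_status":
      return { ...state, agentStatus: msg.status };
    case "perf_metrics": // proposed new case
      return {
        ...state,
        perfMetrics: {
          timestamp: msg.timestamp,
          cpuPercent: msg.cpu.percent,
          memoryPercent: msg.memory.percent,
        },
      };
    default:
      return state;
  }
}
```

Keeping this logic in a pure function would also make the hook-level test in the last step possible without rendering a component.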
Preferred scope: add server websocket perf message + UI wiring in one small PR.
Rationale:
- Delivers true live data immediately.
- Reuses established transport and state patterns already central to this screen.
- Keeps diff localized to websocket contract + PerfMon consumption code, avoiding new endpoint and polling complexity.