feat: add comprehensive ROCm GPU metrics support #12

Merged
simonCatBot merged 14 commits into master from feature/gpu-metrics
Mar 31, 2026

@simonCatBot
Owner

This PR adds detailed AMD GPU monitoring using ROCm tools.

Features

  • Automatic ROCm Detection: Finds rocminfo/rocm-smi in PATH or /opt/rocm
  • Detailed GPU Information:
    • GPU name, vendor, marketing name
    • GFX version (e.g., gfx1151)
    • Compute units count
    • Max clock frequency
  • Real-time Metrics:
    • GPU usage percentage
    • Temperature
    • VRAM usage (total/used)
    • Power consumption (watts)
    • Clock speed
  • Enhanced UI: New GpuMetricsPanel component replaces Processes section
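
The detection step described above (look in PATH first, then under /opt/rocm) could be sketched roughly as follows. This is a minimal illustration and not the code in src/lib/system/rocm.ts: the names `findRocmTool` and `isExecutable` are hypothetical, and the fallback directory `/opt/rocm/bin` is an assumption about where the binaries sit under /opt/rocm.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical helper: true when `file` exists and is executable by this process.
function isExecutable(file: string): boolean {
  try {
    accessSync(file, constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical lookup: try every PATH directory first, then the fallback
// directories. Returns the first executable match, or null if none is found.
// ASSUMPTION: "/opt/rocm/bin" is where the tools live under /opt/rocm.
function findRocmTool(
  tool: string,
  pathEnv: string = process.env.PATH ?? "",
  fallbackDirs: string[] = ["/opt/rocm/bin"],
  exists: (file: string) => boolean = isExecutable,
): string | null {
  const dirs = [...pathEnv.split(delimiter).filter(Boolean), ...fallbackDirs];
  for (const dir of dirs) {
    const candidate = join(dir, tool);
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

Calling `findRocmTool("rocm-smi")` and `findRocmTool("rocminfo")` once at startup and caching the results would avoid repeated filesystem checks on every metrics request. The `exists` parameter is only there so the lookup order can be tested without real binaries.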

Changes

New Files

  • src/lib/system/rocm.ts - ROCm detection and metrics module
  • src/components/GpuMetricsPanel.tsx - GPU metrics UI component

Modified Files

  • src/app/api/system/metrics/route.ts - Integrates ROCm data with fallback
  • src/components/SystemMetricsDashboard.tsx - Shows GPU details instead of processes

Screenshots

The GPU panel displays:

  • GPU usage with color-coded alerts (red at 80%+)
  • Temperature with alerts at 85°C+
  • VRAM usage bar
  • Clock speed in MHz/GHz
  • Power consumption in watts
  • Compute units
  • ROCm badge and metadata
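
As a rough sketch of the display rules listed above, the thresholds and the MHz/GHz switch might look like this. The function names are hypothetical, and the choice to switch to GHz at 1000 MHz with two decimals is an assumption; only the 80% and 85°C thresholds come from this description.

```typescript
type AlertLevel = "normal" | "alert";

// Thresholds from the PR description: usage alerts at 80%+, temperature at 85°C+.
const USAGE_ALERT_PERCENT = 80;
const TEMP_ALERT_CELSIUS = 85;

// Hypothetical helper: classify GPU usage for color coding.
function usageLevel(percent: number): AlertLevel {
  return percent >= USAGE_ALERT_PERCENT ? "alert" : "normal";
}

// Hypothetical helper: classify temperature for color coding.
function temperatureLevel(celsius: number): AlertLevel {
  return celsius >= TEMP_ALERT_CELSIUS ? "alert" : "normal";
}

// ASSUMPTION: show whole MHz below 1000 MHz, and GHz with two decimals otherwise.
function formatClock(mhz: number): string {
  return mhz >= 1000 ? `${(mhz / 1000).toFixed(2)} GHz` : `${Math.round(mhz)} MHz`;
}
```

Keeping the thresholds as named constants makes it easy to adjust them in one place if the alert levels change.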

Testing

Tested on AMD RYZEN AI MAX+ PRO 395 with Radeon 8060S.

If ROCm is not installed, the metrics route falls back to the systeminformation library.
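
The fallback behavior can be pictured as a small wrapper that tries the ROCm reader first and only then calls the generic one. This is a sketch under assumptions: `GpuMetrics`, `MetricsReader`, and `readGpuMetrics` are hypothetical names, and the two readers are passed in as functions so the example does not depend on either ROCm or systeminformation being present.

```typescript
// Hypothetical shape for illustration; the real route may return more fields.
interface GpuMetrics {
  source: "rocm" | "systeminformation";
  name: string;
  usagePercent: number | null;
}

type MetricsReader = () => Promise<GpuMetrics | null>;

// Try ROCm first. If it returns null or throws (for example because the
// tools are not installed), use the fallback reader instead.
async function readGpuMetrics(
  readRocm: MetricsReader,
  readFallback: MetricsReader,
): Promise<GpuMetrics | null> {
  try {
    const rocm = await readRocm();
    if (rocm !== null) return rocm;
  } catch {
    // ROCm tools missing or failed; fall through to the fallback reader.
  }
  return readFallback();
}
```

Treating a thrown error the same as "not installed" keeps the API route responsive when a ROCm tool exists but fails at runtime.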

Security Notes

  • Requires rocminfo/rocm-smi to be executable by the rocCLAW process
  • No elevated permissions needed
  • Read-only access to GPU metrics

@simonCatBot simonCatBot force-pushed the feature/gpu-metrics branch 4 times, most recently from 67cc0d3 to b884efe on March 30, 2026 19:14
This commit adds detailed AMD GPU monitoring using ROCm tools:

- Detects rocminfo/rocm-smi in PATH or /opt/rocm
- Extracts GPU details from rocm-smi -a including:
  - Device ID, Driver version, VBIOS version, Device revision
  - Subsystem ID, GUID, PCI Bus address
  - GFX version, compute units, max/current clock speeds
  - GPU usage %, temperature, VRAM usage, power consumption
- Creates GpuMetricsPanel component for detailed GPU display
- Updates SystemMetricsDashboard with prominent GPU card:
  - GPU card with visual usage and VRAM progress bars
  - Hardware details section showing Device ID, Driver, VBIOS
  - Shows temperature, current clock speed, power stats
  - Removed System Info section
- Falls back to systeminformation library if ROCm not available
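
A minimal sketch of extracting fields from the text output of `rocm-smi -a`, assuming each relevant line has the form `GPU[<index>] : <key>: <value>`. That line format, the key names in the usage note, and the function name `parseSmiFields` are assumptions for illustration; check them against the real tool output before relying on this.

```typescript
// Collect "key -> value" pairs from lines shaped like "GPU[0] : Key: Value".
// ASSUMPTION: this line shape matches the tool's text output. The GPU index
// is ignored, so on multi-GPU systems later lines overwrite earlier ones.
function parseSmiFields(output: string): Map<string, string> {
  const fields = new Map<string, string>();
  // Lazy key match so values that contain colons (e.g. PCI addresses) stay whole.
  const linePattern = /^GPU\[\d+\]\s*:\s*(.+?):\s*(.*)$/;
  for (const line of output.split(/\r?\n/)) {
    const match = linePattern.exec(line.trim());
    if (match) fields.set(match[1].trim(), match[2].trim());
  }
  return fields;
}
```

With a line such as `GPU[0] : Device ID: 0x1586` in the input, `parseSmiFields(text).get("Device ID")` would give `"0x1586"`. Lines that do not match the pattern (banners, separators) are skipped.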

The GPU card now displays:
- Device ID (e.g., 0x1586)
- Driver version (e.g., 6.17.0-1012-oem)
- VBIOS version (e.g., 113-STRXLGEN-001)
- Device revision, Subsystem ID
- PCI Bus address
- GPU usage % and VRAM usage % with progress bars
- Current clock speed (updated from rocm-smi)
- Temperature, power consumption
- GFX version, compute units
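
For the VRAM progress bar, the percentage can be derived from the used and total values. A small sketch, with the hypothetical name `vramUsagePercent` and the assumption that both inputs use the same unit (bytes here):

```typescript
// Returns VRAM usage as a percentage clamped to 0..100, or null when the
// inputs cannot produce a meaningful value (non-finite or non-positive total).
function vramUsagePercent(usedBytes: number, totalBytes: number): number | null {
  if (!Number.isFinite(usedBytes) || !Number.isFinite(totalBytes) || totalBytes <= 0) {
    return null;
  }
  const percent = (usedBytes / totalBytes) * 100;
  return Math.min(100, Math.max(0, percent));
}
```

Returning null instead of 0 lets the UI distinguish "no data" from "nothing in use".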

Files added:
- src/lib/system/rocm.ts: ROCm detection and metrics module
- src/components/GpuMetricsPanel.tsx: GPU metrics UI component

Files modified:
- src/app/api/system/metrics/route.ts: integrate extended ROCm data
- src/components/SystemMetricsDashboard.tsx: new layout with GPU details
@simonCatBot simonCatBot force-pushed the feature/gpu-metrics branch from b884efe to 191efb4 on March 30, 2026 19:16
@simonCatBot
Owner Author

/rerun

@simonCatBot simonCatBot merged commit 5b6ce88 into master Mar 31, 2026
4 checks passed
@simonCatBot simonCatBot deleted the feature/gpu-metrics branch March 31, 2026 04:44