System Trace Dashboard

Real-time infrastructure monitoring dashboard with agent-based metrics, host monitoring, inventory management, and system logging.

Backend: FastAPI + SQLite + psutil
Frontend: Static HTML/CSS/JS + Chart.js (CDN)
Agents: Lightweight Python agents for Linux hosts (Ubuntu, Debian, RHEL, CentOS, Rocky Linux)
Anomaly detection: Rolling z-score over recent metric history

Project Structure

ai-system-health-dashboard/
├── agents/                  # Per-OS agent scripts (centos/debian/rhel/rocky/ubuntu)
├── app/                     # FastAPI backend + static frontend
│   ├── static/
│   │   ├── assets/          # Per-page CSS + JS
│   │   └── *.html
│   └── *.py                 # main, models, storage, auth, config, anomaly, metrics, protocols
├── data/                    # Runtime DBs + logs (gitignored)
├── docs/
│   ├── guides/              # All .md deployment/fix/config guides
│   └── screenshots/         # README screenshots
├── scripts/
│   ├── deploy/              # Per-host deploy_*.sh scripts
│   ├── fix/                 # fix_*.sh / fix_*.py scripts
│   ├── util/                # One-off utility/migration scripts
│   ├── manage_users.py
│   ├── take_screenshots.py
│   └── ...
├── .env.example
├── .gitignore               # Now also ignores data/*.json, cookies.txt, scratch scripts
├── README.md                # Updated with Project Structure section + correct paths
└── requirements.txt

UI Pages

Page	URL	Description
Dashboard	`/`	Overview, Problems, Hosts, Maps
Host Monitor	`/host?id=<id>`	Per-host real-time charts (CPU, RAM, Disk, Network, GPU)
Overview	`/overview`	System overview
Inventory	`/inventory`	Asset/equipment tracking (rack, shelf, S/N)
Configuration	`/configuration`	System configuration
Hosts Management	`/hosts`	Discovered hosts and agent deployment
System Logs	`/logs`	Filterable system event log
User Management	`/users`	User accounts
User Groups	`/user-groups`	Group-based access control

Screenshots

Login

Dashboard

Hosts

Maps

Inventory

System Logs

Users

Host Monitor

Changelog

Feb 22, 2026

Network Device Discovery & Icons

Enhanced auto-discovery (/api/discovery/start) to classify device types using SNMP sysDescr keyword matching and TCP port fingerprinting
Detected device types: rack-server, switch, router, firewall, patch-panel, linux, windows, network, other
dashboard.js: Added DEVICE_TYPE_META map — device type icons and colours shown in host table and SVG map nodes
hosts.html: Edit modal type dropdown updated with all new device types and icons
index.html: Host type datalist updated with new device types
maps.html: Rebuilt with dynamic SVG topology map and device inventory grid; auto-refreshes every 30 seconds

User Management

Users page (/users): Action buttons split into separate columns — Edit | Reset Password | Delete | Activate
Reset Password button: Opens a modal to set a new password for any user (admin only); passwords are hashed correctly
Fixed editUser onclick — was passing user ID instead of full user object to modal
Fixed editGroup onclick — same bug fixed for user groups page
Fixed create_user API — was using broken pbkdf2_hmac with hardcoded salt; now uses hash_password() so created users can log in
Fixed update_user API — same password hashing bug fixed; changing a user's password via UI now works correctly

User Groups

Fixed HTTP 400 on save — allowed_hosts list was passed raw to SQLite; now JSON-serialised before storage

Role-Based Sidebar

dashboard.js: Non-admin (user role) users no longer see Users and User groups sidebar links on any page
If those are the only items in the Administration group, the entire group header is hidden too
Applied globally via dashboard.js — no per-page changes needed

Agent & Host Monitor

Fixed maps.html CSS/JS structure — .topoMenuItem styles were inside <script> tag causing lint errors
host.js: Added 60-second grace period on page load before flagging "Agent offline" as critical — prevents false alarm before first agent report
Started and enabled system-trace-agent systemd service (was installed but inactive)

Feb 22, 2026 (later)

Auto-seed Default Admin User

main.py: On startup, if the users table is empty, the app automatically creates a default admin / admin account
A warning is logged reminding the operator to change the password after first login
Ensures a fresh clone + run is immediately usable without manual DB setup

Restore & Recovery Notes

.env is gitignored — after a fresh clone, copy .env.example to .env and set SESSION_SECRET_KEY
data/auth.db users are not committed — on first run the app now auto-creates admin/admin
To add more users after setup: python scripts/manage_users.py create --username <name> --role admin --password <pass>

Key Features

Host Monitor (`/host?id=<id>`)

Real-time charts updating every ~3 seconds (CPU, RAM, Disk, Network, GPU)
Frontend rolling ring buffer — 120-point history (2 min at 1s polling)
Multi-GPU support — per-GPU utilization %, VRAM used/total, temperature tiles and charts (via nvidia-smi)
Problem detection — header blinks red when CPU ≥ 75%, RAM ≥ 80%, Disk ≥ 85%, GPU ≥ 85%, or agent offline
Per-host error history panel — shows active problems + last 50 log entries for that host
Protocol checks: ICMP, HTTP, HTTPS, SSH, DNS, SNMP, NTP

System Logs (`/logs`)

Auto-logs threshold breaches from agent metrics (CPU, RAM, Disk, GPU util/temp)
Frontend also reports problems to /api/logs when detected on the Host Monitor page
Filter by level (Critical / Warning / Info) and hostname
Summary tiles: Critical / Warning / Info / Total counts
Auto-refreshes every 5 seconds
Clear all logs button (admin)

Inventory (`/inventory`)

Track servers, racks, switches, PDUs, and other equipment
Fields: Name, Category, Rack, Shelf/Unit, S/N (serial number), Qty, Notes
Summary tiles: Total Items, Categories, Racks, Last Refresh
Search across all fields
Admin-only add/remove

Agents

Post metrics every 3 seconds (down from 30s)
CPU sampling interval: 0.5s
GPU metrics via nvidia-smi — multi-GPU list with percent, VRAM, temperature
Supported: Ubuntu, Debian, RHEL, CentOS, Rocky Linux

System Log API

GET /api/logs — list logs, filterable by level and hostname
POST /api/logs — write a log entry
DELETE /api/logs — clear all logs (admin)
GET /api/logs/host/{hostname} — per-host log history
Stored in data/system_logs.json, rolling cap of 2000 entries

Quickstart

1) Create a virtualenv + install deps

cd ai-system-health-dashboard
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) Run

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

3) Open

Dashboard: http://localhost:8000/
UI examples: http://localhost:8000/static/ui-examples.html
API docs: http://localhost:8000/docs

You can also set UI preferences via URL params:

/?theme=light&layout=stack&density=compact

Configuration

Environment variables (optional):

SAMPLE_INTERVAL_SECONDS (default: 1.0)
HISTORY_SECONDS (default: 600)
ANOMALY_WINDOW_SECONDS (default: 120)
ANOMALY_Z_THRESHOLD (default: 3.0)

SQLite persistence:

METRICS_DB_PATH (default: data/metrics.db) — set to an empty string to disable persistence
SQLITE_RETENTION_SECONDS (default: 86400) — how long to keep samples in SQLite (set 0 to keep forever)

Example:

SAMPLE_INTERVAL_SECONDS=0.5 HISTORY_SECONDS=900 SQLITE_RETENTION_SECONDS=172800 uvicorn app.main:app --reload

Authentication (login)

This dashboard includes a simple login system (cookie sessions) with a SQLite users database.

Required env var

SESSION_SECRET_KEY — required. Set this to a long random string.
- A starter .env file is included (gitignored). Edit it before running.

Optional env vars

AUTH_DB_PATH (default: data/auth.db)
SESSION_MAX_AGE_SECONDS (default: 86400)
SESSION_COOKIE_NAME (default: system-trace_session)
SESSION_COOKIE_SAMESITE (default: strict)
SESSION_COOKIE_SECURE (default: 0) — set to 1 only when serving over HTTPS

Roles

viewer: can view dashboard + non-admin APIs
admin: can also call /api/admin/* endpoints

Inventory (assets/equipment)

This repo includes a simple Inventory page for tracking generic assets (e.g. switches, PDUs, storage systems).

Page: GET /inventory
APIs:
- GET /api/inventory — list inventory items (any authenticated user)
- POST /api/admin/inventory — add item (admin only)
- DELETE /api/admin/inventory/{item_id} — remove item (admin only)

Inventory items are stored in SQLite alongside the metrics DB (same METRICS_DB_PATH).

Problems + host events

The dashboard has a Problems view (/#problems) that shows:

Current problems derived from host checks + protocol checks
Recent structured per-host events (failures/recoveries)

API:

GET /api/events/recent — last ~500 in-memory host events (resets on server restart)

Screenshots (snapshots)

Project UI screenshots are stored under docs/screenshots/.

To regenerate them locally, use the helper script:

scripts/take_screenshots.py

It requires Playwright (Chromium). The script reads credentials from env vars so you don't hardcode passwords:

System Trace_USER
System Trace_PASS

Create users (manual SQLite)

Users are stored in the users table inside AUTH_DB_PATH (default data/auth.db).

If you don't have the sqlite3 CLI installed, you can use the included helper script instead:

cd ai-system-health-dashboard
python scripts/manage_users.py list
python scripts/manage_users.py create --username admin --role admin --password 'changeme'
python scripts/manage_users.py set-password --username admin --password 'newpass'
python scripts/manage_users.py set-role --username viewer --role admin

Generate a PBKDF2 hash for a password:

cd ai-system-health-dashboard
python -c "from app.auth_storage import auth_storage; print(auth_storage.hash_password('your-password-here'))"

Insert the user (example admin):

sqlite3 data/auth.db "INSERT INTO users (username, password_hash, role, is_active, created_at) VALUES ('admin', '<PASTE_HASH_HERE>', 'admin', 1, strftime('%s','now'));"

If the DB file doesn't exist yet, start the server once (it will create the table), then insert the user.

API

Metrics & Monitoring

GET /api/metrics/latest — latest sample
- Unauthenticated: 401
- viewer / admin: 200
GET /api/metrics/history?seconds=300
GET /api/insights
WebSocket: ws://localhost:8000/ws/metrics

User Management (Admin Only)

GET /api/users — List all users
POST /api/users — Create new user
PUT /api/users/{user_id} — Update user
DELETE /api/users/{user_id} — Delete user
GET /api/user-groups — List all user groups
POST /api/user-groups — Create user group
PUT /api/user-groups/{group_id} — Update user group
DELETE /api/user-groups/{group_id} — Delete user group

Host Discovery & Deployment

GET /api/discovery/results — Get discovery results
GET /api/agent/files/{os_type} — Get agent files for OS type
POST /api/agent/deploy/{host_ip} — Deploy agent to host

Admin (SQLite):

GET /api/admin/db — DB stats (rows, time range, file size)
- Unauthenticated: 401
- viewer: 403
- admin: 200
POST /api/admin/db/prune — delete rows older than SQLITE_RETENTION_SECONDS
POST /api/admin/db/vacuum — rebuild DB file (may pause briefly)

Network Monitoring & Agent Deployment

Auto-Discovery & Multi-Platform Agents

The dashboard now includes comprehensive network monitoring and agent deployment capabilities:

Auto-Discovery: Scans network ranges to discover hosts
Multi-Platform Support: Ubuntu, Debian, RHEL, CentOS, Rocky Linux
Non-Root Deployment: Secure agent deployment with minimal privileges
SNMP Monitoring: Full SNMP v2c support with automatic configuration
NTP Synchronization: Time service management and monitoring

Quick Start with Agent Deployment

1. Auto-Discover Hosts

python scripts/auto_discover_hosts.py

2. Deploy Non-Root Agent (Recommended)

bash scripts/deploy/deploy_non_root_centos_docker.sh

3. Access Hosts Management

Hosts Dashboard: http://localhost:8001/hosts
View discovered hosts and deployment status
Generate deployment commands automatically

Agent Features

System Metrics: CPU, memory, disk, network monitoring
SNMP Integration: Full v2c support with automatic configuration
NTP Sync: Time service configuration and monitoring
Service Management: Systemd integration with auto-restart
Security: Non-root deployment with minimal sudo privileges
Firewall Config: Automatic port opening (SNMP:161, NTP:123)

Supported Operating Systems

OS	Package Manager	SNMP Service	NTP Service
Ubuntu	apt	snmpd	ntp
Debian	apt	snmpd	ntp
RHEL	dnf	snmpd	chronyd
CentOS	yum	snmpd	chronyd
Rocky Linux	dnf	snmpd	chronyd

Configuration

Environment variables for network monitoring:

SNMP_HOST - SNMP target host (default: empty)
SNMP_PORT - SNMP port (default: 161)
SNMP_COMMUNITY - SNMP community string (default: public)
SNMP_TIMEOUT_SECONDS - SNMP timeout (default: 5)
ICMP_HOST - ICMP target host (default: 1.1.1.1)
ICMP_TIMEOUT_SECONDS - ICMP timeout (default: 3)
NTP_SERVER - NTP server (default: pool.ntp.org)
PROTOCOL_CHECK_INTERVAL_SECONDS - Check interval (default: 30)

Example configuration:

SNMP_HOST=<YOUR_AGENT_HOST>
SNMP_COMMUNITY=public
ICMP_HOST=<YOUR_AGENT_HOST>
NTP_SERVER=pool.ntp.org
PROTOCOL_CHECK_INTERVAL_SECONDS=30

Troubleshooting

SNMP Issues

# Test SNMP connectivity
snmpwalk -v2c -c public <YOUR_AGENT_HOST> 1.3.6.1.2.1.1.1.0

# Check SNMP service
systemctl status snmpd

NTP Issues

# Check NTP status
chronyc sources
# or
ntpq -p

# Force NTP sync
chronyc -a makestep

Agent Issues

# Check agent service
systemctl status system-trace-agent

# Check agent logs
journalctl -u system-trace-agent -f

# Test agent manually
sudo -u system-trace-agent python3 /home/system-trace-agent/system-trace-agent/system-trace_agent.py

Documentation

Agent Deployment Guide: docs/guides/AGENT_DEPLOYMENT_GUIDE.md
Non-Root Deployment: docs/guides/NON_ROOT_DEPLOYMENT_GUIDE.md
Network Monitoring: docs/guides/NETWORK_MONITORING_GUIDE.md
SNMP Configuration: docs/guides/SNMP_CONFIGURATION.md
CentOS/Docker Deployment: docs/guides/CENTOS_DOCKER_DEPLOYMENT.md

Scripts Available

Script	Purpose
`scripts/auto_discover_hosts.py`	Network host discovery
`scripts/deploy_agents_non_root.py`	Non-root agent deployment
`scripts/quick_deploy_agent.py`	Quick single-host deployment
`scripts/setup_snmp.py`	SNMP configuration
`scripts/test_snmp_devices.py`	SNMP connectivity testing
`scripts/manage_users.py`	User management CLI
`scripts/take_screenshots.py`	Capture UI screenshots
`scripts/deploy/`	Per-host deploy shell scripts
`scripts/fix/`	One-off fix scripts

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
agents		agents
app		app
docs		docs
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

System Trace Dashboard

Project Structure

UI Pages

Screenshots

Login

Dashboard

Hosts

Maps

Inventory

System Logs

Users

Host Monitor

Changelog

Feb 22, 2026

Network Device Discovery & Icons

User Management

User Groups

Role-Based Sidebar

Agent & Host Monitor

Feb 22, 2026 (later)

Auto-seed Default Admin User

Restore & Recovery Notes

Key Features

Host Monitor (/host?id=<id>)

System Logs (/logs)

Inventory (/inventory)

Agents

System Log API

Quickstart

1) Create a virtualenv + install deps

2) Run

3) Open

Configuration

Authentication (login)

Required env var

Optional env vars

Roles

Inventory (assets/equipment)

Problems + host events

Screenshots (snapshots)

Create users (manual SQLite)

API

Metrics & Monitoring

User Management (Admin Only)

Host Discovery & Deployment

Network Monitoring & Agent Deployment

Auto-Discovery & Multi-Platform Agents

Quick Start with Agent Deployment

1. Auto-Discover Hosts

2. Deploy Non-Root Agent (Recommended)

3. Access Hosts Management

Agent Features

Supported Operating Systems

Configuration

Troubleshooting

SNMP Issues

NTP Issues

Agent Issues

Documentation

Scripts Available

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Host Monitor (`/host?id=<id>`)

System Logs (`/logs`)

Inventory (`/inventory`)

Packages