Observability: Logs, Metrics, Traces

Status: active

When to use this runbook: deploying log aggregation for a Powernode environment, adding a new service to log routing, or debugging why expected logs aren't showing up in Grafana.

Contents

- Stack overview
- Loki + Promtail deployment
- Grafana datasource wiring
- Retention
- Log labels and queries
- Application logging conventions
- Metrics (Prometheus)
- Troubleshooting

Stack overview

Powernode ships configuration scaffolding for a Grafana-Loki-Promtail-Prometheus stack. Operators run the actual containers themselves — the repo does not deploy them.

| Component | Repo config | Role |
| --- | --- | --- |
| Loki | `configs/logging/loki-config.yml` | Log storage + query (port 3100) |
| Promtail | `configs/logging/promtail-config.yml` | Log shipper, reads the systemd journal + service log files (port 9080) |
| Grafana | `configs/monitoring/grafana-datasources.yml`, `configs/monitoring/grafana-dashboards.yml` | UI + alerting |
| Prometheus | (operator-deployed) | Metrics scrape + storage (port 9090, default Grafana datasource) |

The configs assume single-node defaults (`replication_factor: 1`, filesystem storage). For multi-host fleets, run Loki in its microservices deployment mode and adapt the repo configs accordingly; they are a starting point that operators are expected to extend.

Loki + Promtail deployment

Running the stack

Loki, Promtail, and Grafana are third-party services, independent of Powernode's own systemd deployment, so run them however you like (containers are simplest). A minimal single-host Compose file for Loki and Promtail, saved as docker-compose.observability.yml:

```yaml
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      # Repo config mounted read-only over the image's default config path
      - ./configs/logging/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki-data:/tmp/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.0
    ports:
      - "9080:9080"   # publishes Promtail's /targets page used in Troubleshooting
    volumes:
      - ./configs/logging/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/log/journal:/var/log/journal:ro
      - /etc/machine-id:/etc/machine-id:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    restart: unless-stopped

volumes:
  loki-data:
```

Bring it up:

```shell
docker compose -f docker-compose.observability.yml up -d
# Smoke check: print the most recent 30 log lines and exit (no follow, so it cannot hang)
docker compose -f docker-compose.observability.yml logs --tail 30 loki
```

What Promtail collects

Promtail's powernode-journal job keeps only powernode-*.service units from the systemd journal (via a relabel_configs keep), so the platform's own logs flow in automatically — no per-service opt-in. Separate file/syslog jobs cover apt-installed dependencies (nginx, PostgreSQL, Redis). This scoping avoids shipping unrelated journal noise to Loki and ballooning storage.
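
As an illustration of that scoping, a journal scrape job with a keep rule could look roughly like the sketch below. The `max_age` value and the exact regex are assumptions made for this example; treat the shipped configs/logging/promtail-config.yml as authoritative.

```yaml
scrape_configs:
  - job_name: powernode-journal
    journal:
      path: /var/log/journal   # matches the journal mount in the Compose file
      max_age: 12h             # assumed value: how far back to read on first start
      labels:
        job: powernode
    relabel_configs:
      # Keep only entries whose systemd unit is powernode-*.service.
      # Entries from every other unit are discarded before shipping.
      - source_labels: ['__journal__systemd_unit']
        regex: 'powernode-.*\.service'
        action: keep
```

Relabel regexes are fully anchored, so a templated unit such as `powernode-backend@default.service` matches while an unrelated unit like `ssh.service` does not.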

Grafana datasource wiring

The shipped configs/monitoring/grafana-datasources.yml provisions Prometheus as the default datasource. Loki is commented out — uncomment when deploying Loki:

```yaml
# In configs/monitoring/grafana-datasources.yml, uncomment:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    version: 1
    editable: true
```

Mount the file into your Grafana container at /etc/grafana/provisioning/datasources/datasources.yml and restart Grafana to pick it up.
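
If Grafana runs from the same Compose file as Loki, the mount could be sketched as follows. The service name `grafana`, the untagged image, and the host port are assumptions for illustration and are not part of the repo configs.

```yaml
services:
  grafana:
    image: grafana/grafana        # assumption: pin a tested tag in practice
    ports:
      - "3001:3000"               # assumed host port, chosen to avoid a clash with a backend on 3000
    volumes:
      # Provisioned datasources are read from this directory at startup
      - ./configs/monitoring/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml:ro
    restart: unless-stopped
```

Keeping Grafana on the same Compose network as Loki is what lets the `http://loki:3100` URL in the datasource definition resolve.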

Retention

| Layer | Default | Configured in | How to change |
| --- | --- | --- | --- |
| Loki logs | 7 days | `configs/logging/loki-config.yml` (`retention_period: 168h`) | Edit value, restart Loki |
| Loki compactor sweep | every 10 min | same file (`compaction_interval`) | Rarely changed |
| Promtail positions | kept indefinitely (no expiry) | `/tmp/positions.yaml` inside Promtail container | Mount a named volume so the file survives container recreation |

For compliance regimes that require longer retention (for example, PCI DSS requires at least 12 months of audit log history), increase retention_period and size the storage volume accordingly. The compactor frees disk space automatically: expired chunks are deleted once retention_delete_delay (2h) has elapsed after they are marked for deletion.
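
A one-year setting might look like the fragment below. This is a sketch that assumes the Loki 2.9 key layout, with `retention_period` under `limits_config` and the compactor enforcing it; check it against the shipped loki-config.yml before applying.

```yaml
limits_config:
  retention_period: 8760h        # 365 days x 24h; the shipped default is 168h (7 days)

compactor:
  retention_enabled: true        # retention is only enforced when the compactor has this enabled
  compaction_interval: 10m       # how often the compactor sweeps
  retention_delete_delay: 2h     # grace period between marking chunks and deleting them
```

Going from 7 days to 365 days keeps roughly 52 times as much data on disk at a steady ingest rate, so resize the volume before raising the value.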

Log labels and queries

Promtail's relabel rules surface these labels for every Powernode journal line:

| Label | Source | Example |
| --- | --- | --- |
| `unit` | systemd unit (`__journal__systemd_unit`) | `powernode-backend@default.service` |
| `level` | journal priority (`__journal_priority_keyword`) | `err` |
| `host` | journal hostname | `powernode-hub` |
| `job` | static job label | `powernode` |
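
The mapping in the table corresponds to relabel rules along these lines. This is a sketch rather than a copy of the shipped file; `__journal__hostname` is assumed to be the source of the `host` label, and the static `job` label would come from the journal block's `labels` setting instead of a relabel rule.

```yaml
relabel_configs:
  # Copy Promtail's internal journal fields into queryable Loki labels
  - source_labels: ['__journal__systemd_unit']
    target_label: unit
  - source_labels: ['__journal_priority_keyword']
    target_label: level
  - source_labels: ['__journal__hostname']    # assumed source for the host label
    target_label: host
```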

Common LogQL queries (paste into Grafana → Explore → Loki):

```logql
# All ERROR lines from backend (set the time range, e.g. last hour, in the Grafana time picker)
{unit=~"powernode-backend.*"} |= "ERROR"

# Worker job failures
{unit=~"powernode-worker.*"} |~ "Failed .* Job after"

# Backend 5xx
{unit=~"powernode-backend.*"} |~ "Completed 5\\d\\d"

# Audit log writes from the model layer
{unit=~"powernode-backend.*"} |= "AuditLog" |= "created"

# Report request lifecycle for a specific id
{unit=~"powernode-(backend|worker).*"} |= "019e3c6c-9e1a"
```

Application logging conventions

Powernode services emit logs in a consistent way so that LogQL queries stay reliable:

- Rails backend uses `Rails.logger` only, never `puts`/`print` (per feedback_clean_implementations and frontend/CLAUDE.md). Output is JSON-like lines on stdout.
- Worker uses the `BaseJob` helpers (`log_info`, `log_error`), which emit structured fields including the job class and JID.
- Frontend (browser) uses `logger` from `@/shared/utils/logger`; `console.log` is not allowed in production code (caught by `scripts/cleanup-all-console-logs.sh`).
- Request IDs: Rails assigns each HTTP request an ID, available as `request.uuid`. Include it when logging from a request path so log lines from different services can be correlated.

If you add a new component that should ship logs to Loki:

- If it runs as a powernode-* systemd unit, its journald output is picked up automatically by the powernode-journal job. Otherwise, add a dedicated scrape job (journal matches, file `__path__`, or syslog) in promtail-config.yml.
- Ensure stdout/stderr is unbuffered (Ruby: `STDOUT.sync = true`; Node: whether `process.stdout` writes are synchronous depends on whether stdout is a TTY, a pipe, or a file).
- Use a structured format (JSON or key=value pairs) so LogQL can parse fields with `| json` or `| logfmt`.
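
For a component that writes to a log file instead of the journal, a dedicated scrape job could be sketched like this. The job name and path are hypothetical placeholders, and the path has to be visible inside the Promtail container (the Compose file above mounts /var/log read-only).

```yaml
scrape_configs:
  - job_name: my-component                        # hypothetical job name
    static_configs:
      - targets: [localhost]
        labels:
          job: my-component                       # query with {job="my-component"}
          __path__: /var/log/my-component/*.log   # hypothetical path; glob patterns are allowed
```

If the component logs key=value pairs, fields can then be extracted at query time, for example `{job="my-component"} | logfmt | level="error"`.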

Metrics (Prometheus)

The repo ships configs/monitoring/grafana-dashboards.yml and a grafana-dashboards/ directory. Prometheus scrape config is operator-owned.

Status: the Rails backend /metrics endpoint is not yet implemented. There is no yabeda-rails/yabeda-prometheus-backed /metrics endpoint today, and no yabeda gem (active or commented) exists in server/Gemfile. The actual APM/monitoring gems are sentry-ruby/sentry-rails, skylight (optional), and OpenTelemetry (opt-in via OTEL_ENABLED=true + bundle install --with opentelemetry). The Rails-app Prometheus /metrics path below is planned; adding the yabeda gems is the intended way to enable it.

Point Prometheus at:

| Endpoint | What it exposes |
| --- | --- |
| `http://backend:3000/metrics` | Rails app metrics (request counts, latency histograms via `yabeda-rails`). Planned: requires adding the `yabeda-prometheus` gem to `server/Gemfile` and re-bundling. Not present today (see status note above). |
| `http://worker:4567/metrics` | Worker HTTP API metrics (job dispatch counts, queue depth) |
| `cAdvisor`, `node_exporter` | Standard host + container metrics; deploy via the same observability compose file |
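
A matching Prometheus scrape config might look like the sketch below. The backend target is left out because its endpoint does not exist yet. The job names and the `node-exporter` and `cadvisor` hostnames are assumptions; the ports shown for those two are their upstream defaults.

```yaml
scrape_configs:
  - job_name: powernode-worker
    metrics_path: /metrics
    static_configs:
      - targets: ['worker:4567']

  - job_name: node
    static_configs:
      - targets: ['node-exporter:9100']   # hypothetical hostname; 9100 is node_exporter's default port

  - job_name: cadvisor
    static_configs:
      - targets: ['cadvisor:8080']        # hypothetical hostname; 8080 is cAdvisor's default port
```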

Troubleshooting

"I don't see any logs in Grafana"

Is Loki receiving them?

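```shell
curl -s http://localhost:3100/ready
curl -s http://localhost:3100/metrics | grep -E 'loki_ingester_(memory_streams|streams_created_total)'
```

loki_ingester_streams_created_total should be non-zero and grow as new streams appear; loki_ingester_memory_streams shows how many streams are currently held in memory.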

Is Promtail scraping?
```
curl -s http://localhost:9080/targets | head -30
```
Expected: the powernode-journal job plus the file/syslog jobs, all up.

Are the journal units flowing?

```shell
journalctl -u 'powernode-*' -n 5 --no-pager   # confirm units log to journald
```

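Is the datasource configured in Grafana? Open Grafana → Configuration → Data Sources → Loki (Connections → Data sources in newer Grafana versions) and run Save & test; it should report that the data source is working.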

"Some lines have no labels"

The powernode-journal job only keeps powernode-*.service units. Logs from the file/syslog jobs carry their own job label — query by job (e.g. {job="nginx"}) instead.

"Disk filling up on Loki host"

The compactor needs time to free space (retention_delete_delay: 2h). If the disk fills faster than retention frees space, do one of the following:

- Reduce retention_period
- Increase the host volume
- Add more aggressive drop rules in promtail-config.yml to discard noisy logs at the ingest boundary
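
As one possible shape for such a rule, the relabel entry below drops debug-priority journal entries before they reach Loki. Treating debug lines as the noise to drop is an assumption for this sketch; pick the priority or unit pattern that matches what is actually filling the disk.

```yaml
relabel_configs:
  # Discard journal entries logged at debug priority; everything else is shipped as before
  - source_labels: ['__journal_priority_keyword']
    regex: 'debug'
    action: drop
```

The rule belongs in the `relabel_configs` list of the journal scrape job, and Promtail needs a restart to pick it up.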
