This guide covers the optional observability stack for monitoring, logging, and tracing 01Agents.
The observability stack consists of:
- Prometheus & Grafana: Metrics collection and visualization.
- Loki: Log aggregation.
- Tempo: Distributed tracing.
- OpenTelemetry (OTEL): Standardized telemetry collection.
helm upgrade --install prometheus oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.service.type=NodePort \
--set prometheus.service.nodePort=30090 \
--set grafana.service.type=NodePort \
--set grafana.service.nodePort=30080 \
--set grafana.adminPassword=admin \
--wait

To scrape metrics from the agents, add the following scrape job:
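As a minimal sketch of what `helm-chart/additional-scrape-configs.yaml` might contain, the snippet below writes a single static scrape job. The job name, the `default` namespace, the `l1-agent` and `l2-agent` Service names, port `8000`, and the `/metrics` path are all assumptions for illustration; substitute whatever your agent Services actually expose.

```shell
#!/usr/bin/env sh
# Sketch only: the targets, port, and metrics path are hypothetical placeholders.
# The file is a YAML list of Prometheus scrape configs, as passed via --set-file.
write_scrape_config() {
  out="$1"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
- job_name: agents
  metrics_path: /metrics
  scrape_interval: 30s
  static_configs:
    - targets:
        - l1-agent.default.svc.cluster.local:8000
        - l2-agent.default.svc.cluster.local:8000
EOF
}

# Overwrites the file if it already exists.
write_scrape_config helm-chart/additional-scrape-configs.yaml
```

Static targets keep the example short; if the agents run as several replicas, a `kubernetes_sd_configs` job is likely a better fit.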
helm upgrade --install prometheus oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set-file prometheus.prometheusSpec.additionalScrapeConfigs=helm-chart/additional-scrape-configs.yaml

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki \
--namespace logging \
--create-namespace \
--set loki.auth_enabled=false \
--set deploymentMode=SingleBinary \
--set singleBinary.replicas=1 \
--set loki.commonConfig.replication_factor=1 \
--set loki.storage.type=filesystem \
--set minio.enabled=false \
--set backend.replicas=0 \
--set read.replicas=0 \
--set write.replicas=0 \
--set loki.useTestSchema=true

Enable metrics generator by creating helm-chart/tempo-values.yaml:

tempo:
  metricsGenerator:
    enabled: true
    remoteWriteUrl: "http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
  storage:
    trace:
      backend: local

Install/Upgrade Tempo:

helm upgrade --install tempo grafana/tempo \
--namespace tracing \
--create-namespace \
-f helm-chart/tempo-values.yaml

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
--namespace opentelemetry \
--create-namespace \
--set manager.collectorImage.repository=otel/opentelemetry-collector-contrib \
--set admissionWebhooks.certManager.enabled=false \
--set admissionWebhooks.autoGenerateCert.enabled=true

Create helm-chart/otel-collector.yaml (see original README for content) and apply it:

kubectl apply -f helm-chart/otel-collector.yaml

Add these in Grafana (Connections → Data Sources):

| Datasource | URL |
|---|---|
| Prometheus | http://prometheus-kube-prometheus-prometheus.monitoring:9090/ |
| Tempo | http://tempo.tracing.svc.cluster.local:3200 |
| Loki | http://loki-gateway.logging.svc.cluster.local |
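The URLs in the table follow the usual Kubernetes in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`; the Prometheus entry uses the shorter `<service>.<namespace>` form, which resolves the same way from inside the cluster. If a release was installed under a different name or namespace, a small helper along these lines can rebuild the URL. The function name `svc_url` is hypothetical, and `cluster.local` is the Kubernetes default domain, which some clusters override.

```shell
# svc_url SERVICE NAMESPACE [PORT]: print the in-cluster HTTP URL for a Service.
# Assumes the default cluster domain "cluster.local".
svc_url() {
  if [ -n "${3:-}" ]; then
    printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
  else
    printf 'http://%s.%s.svc.cluster.local\n' "$1" "$2"
  fi
}

svc_url prometheus-kube-prometheus-prometheus monitoring 9090
svc_url tempo tracing 3200
svc_url loki-gateway logging
```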
You can import the Level-1 Agent dashboard by following these steps:
- Open Grafana.
- Go to Dashboards → New → Import.
- Copy and paste the content of level-1-agent.json into the "Import via panel json" box, or upload the file.
- Click Load and then Import.
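The same import can be scripted against Grafana's HTTP API (`POST /api/dashboards/db`) instead of the UI. The sketch below assumes Grafana is reachable at `http://localhost:3000` (for example through a port-forward) and that the `admin`/`admin` credentials from the install step are still in use; `dashboard_payload` and `import_dashboard` are hypothetical helper names.

```shell
# Sketch: import a dashboard through Grafana's HTTP API instead of the UI.
# GRAFANA_URL and the admin/admin credentials are assumptions; adjust as needed.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# Wrap a dashboard JSON file in the envelope expected by POST /api/dashboards/db.
dashboard_payload() {
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$1")"
}

# Not called automatically: run `import_dashboard level-1-agent.json` yourself.
import_dashboard() {
  dashboard_payload "$1" | curl -fsS -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST --data-binary @- "$GRAFANA_URL/api/dashboards/db"
}
```

Dashboards exported with a non-null `id` or with `__inputs` placeholders may need adjusting before this endpoint accepts them; in that case the UI import is the simpler route.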
Telemetry features are disabled by default. They should only be enabled after the respective observability components (OTEL Collector, Tempo, Loki) have been deployed as described in sections 1-5 above.
To enable tracing for LangChain/LangGraph workflows using the OTEL Collector:
- Ensure the OTEL Collector is running and reachable.
- Update values.yaml for the l1 and l2 agents:

env:
  LANGSMITH_TRACING: "true"
  LANGSMITH_OTEL_ENABLED: "true"
  LANGSMITH_OTEL_ONLY: "true"
  OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector-collector.opentelemetry.svc.cluster.local:4318
  OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf

To enable Traceloop for LLM instrumentation:
- Ensure your OTEL Collector is configured to receive Traceloop data.
- Update values.yaml for the agents:

env:
  TRACELOOP_ENABLED: "true"
  TRACELOOP_BASE_URL: http://otel-collector-collector.opentelemetry.svc.cluster.local:4318

Note: When LANGSMITH_OTEL_ENABLED is true, LangChain traces are sent to the OTEL Collector. Traceloop can be enabled on its own or alongside LangSmith tracing, depending on your requirements.
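The tracing settings above point at the collector's base OTLP/HTTP address on port 4318. Under the OpenTelemetry exporter specification, an SDK given `OTEL_EXPORTER_OTLP_ENDPOINT` with the `http/protobuf` protocol appends a per-signal path such as `/v1/traces`, whereas a signal-specific variable such as `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` is used exactly as written and so must already include `/v1/logs`. The sketch below mirrors that rule for illustration, assuming the agents use a standard OpenTelemetry exporter; `otlp_signal_url` is a hypothetical helper, not part of any SDK.

```shell
# otlp_signal_url BASE SIGNAL: append the per-signal OTLP/HTTP path to a base endpoint.
# SIGNAL is one of: traces, metrics, logs.
otlp_signal_url() {
  base="${1%/}"   # drop one trailing slash, if present
  printf '%s/v1/%s\n' "$base" "$2"
}

BASE=http://otel-collector-collector.opentelemetry.svc.cluster.local:4318
otlp_signal_url "$BASE" traces
otlp_signal_url "$BASE" logs
```

The second call prints the value to use for a signal-specific logs endpoint.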
To send application logs to Loki via the OTEL Collector:
env:
  OTEL_ENABLED: "true"
  OTEL_SERVICE_NAME: l1-agent # or l2-agent
  OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: http://otel-collector-collector.opentelemetry.svc.cluster.local:4318/v1/logs

# OTEL Collector
kubectl port-forward svc/otel-collector-collector -n opentelemetry 4318:4318
# Grafana (Login: admin/admin)
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
# Prometheus
kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090