[BUG]: run_worker wraps DD_AGENT_HOST in IPv6 brackets, producing an invalid OTLP metrics URL that crashes the worker #313

@parveshsaini

Summary

run_worker builds the Temporal worker's OTLP metrics endpoint by unconditionally wrapping DD_AGENT_HOST in IPv6 literal brackets:

# agentex/src/temporal/run_worker.py, in run_worker()
host_url = os.environ.get("DD_AGENT_HOST")
metrics_url = f"http://[{host_url}]:4317" if host_url else None

Brackets are only valid around an IPv6 literal (RFC 3986). For the normal cases — a hostname (localhost), a Kubernetes service name (datadog-agent), or an IPv4 literal (10.0.0.5) — this yields a malformed URL like http://[localhost]:4317.
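To make the distinction concrete, here is a small sketch that uses only the standard-library ipaddress module to check which DD_AGENT_HOST values are IPv6 literals, printed next to the URL the current f-string would build for them. The helper name is_ipv6_literal is hypothetical and not part of the repo.

```python
import ipaddress


def is_ipv6_literal(host: str) -> bool:
    """Return True only if host parses as an IPv6 address (hypothetical helper)."""
    try:
        ipaddress.IPv6Address(host)
    except ValueError:
        # Hostnames, service names and IPv4 literals all end up here.
        return False
    return True


for host in ["localhost", "datadog-agent", "10.0.0.5", "::1"]:
    url = f"http://[{host}]:4317"  # same formatting as the current run_worker code
    print(f"{host:<15} ipv6_literal={is_ipv6_literal(host)!s:<5} {url}")
```

Of these four values, only ::1 is an IPv6 literal, so it is the only one for which the bracketed form is a valid URL.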

That metrics_url flows into TemporalClientFactory.create_client_from_env(...) → create_client(...) → OpenTelemetryConfig(url=metrics_url). The Temporal SDK validates the URL eagerly when building the Runtime, and because that construction happens inside the try block in create_client, the resulting error is re-raised as TemporalConnectionError, so the worker fails to start entirely whenever DD_AGENT_HOST is set to anything other than a bare IPv6 literal.

Other call sites in the repo already use DD_AGENT_HOST without brackets (e.g. app.py's statsd_host=os.getenv("DD_AGENT_HOST", "localhost"), plus utils/db_metrics.py and utils/cache_metrics.py), so the bracketing here looks unintentional.

Steps to reproduce

Construct the metrics URL exactly as run_worker does and feed it to the same consumer (OpenTelemetryConfig):

from temporalio.runtime import OpenTelemetryConfig, Runtime, TelemetryConfig

for host in ["localhost", "datadog-agent", "10.0.0.5"]:
    url = f"http://[{host}]:4317"   # current run_worker behavior
    try:
        Runtime(telemetry=TelemetryConfig(metrics=OpenTelemetryConfig(url=url)))
        print(host, "OK")
    except Exception as e:
        print(host, "FAILED:", e)

(temporalio 1.23.0 — the version pinned in uv.lock.)

Actual

localhost     FAILED: Invalid OTel URL: invalid IPv6 address
datadog-agent FAILED: Invalid OTel URL: invalid IPv6 address
10.0.0.5      FAILED: Invalid OTel URL: invalid IPv6 address

The worker raises TemporalConnectionError on startup. Only a genuine IPv6 literal (DD_AGENT_HOST=::1 → http://[::1]:4317) is accepted.

Expected

The worker starts and exports metrics. The bracket-less URL is accepted for all the common cases:

http://localhost:4317      OK
http://datadog-agent:4317  OK
http://10.0.0.5:4317       OK

Suggested direction

Bracket the host only when it's actually an IPv6 literal (i.e. it contains a colon), otherwise use it as-is — matching how DD_AGENT_HOST is used elsewhere in the repo. Roughly:

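# Guard on host_url first: `":" in None` raises TypeError when DD_AGENT_HOST is unset.
host = f"[{host_url}]" if host_url and ":" in host_url else host_url
metrics_url = f"http://{host}:4317" if host_url else None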
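A slightly stricter variant of the same idea, offered as a sketch rather than the repo's implementation: instead of checking for a colon, it asks the standard-library ipaddress module whether the value is really an IPv6 address. The function name build_metrics_url and its port parameter are hypothetical. Tolerating a value that already arrives bracketed is an assumption about how DD_AGENT_HOST might be configured.

```python
import ipaddress
from typing import Optional


def build_metrics_url(host: Optional[str], port: int = 4317) -> Optional[str]:
    """Build the OTLP metrics URL, bracketing the host only for IPv6 literals.

    Hypothetical helper: returns None when the host is unset or empty,
    mirroring the `if host_url else None` behavior of the original code.
    """
    if not host:
        return None

    # Accept an already-bracketed value such as "[::1]" (assumption).
    bare = host.strip("[]")
    try:
        is_v6 = isinstance(ipaddress.ip_address(bare), ipaddress.IPv6Address)
    except ValueError:
        # Not an IP literal at all, e.g. "localhost" or "datadog-agent".
        is_v6 = False

    authority = f"[{bare}]" if is_v6 else host
    return f"http://{authority}:{port}"


if __name__ == "__main__":
    for value in [None, "localhost", "datadog-agent", "10.0.0.5", "::1"]:
        print(repr(value), "->", build_metrics_url(value))
```

In run_worker this would be called as build_metrics_url(os.environ.get("DD_AGENT_HOST")). Parsing the address avoids bracketing a value that merely contains a colon, such as a host with a port accidentally appended, although such a value would still produce an unusable URL and may deserve its own validation.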

Environment

  • agentex/src/temporal/run_worker.py (run_worker), current main
  • temporalio 1.23.0 (per uv.lock; reproduced on Linux/WSL Ubuntu, Python 3.12)
