Fix DagProcessor crash: add missing name_is_otel_safe() guard to gauge() and timer() #68284
Open
ersalil wants to merge 3 commits into
kaxil reviewed on Jun 9, 2026
kaxil approved these changes on Jun 10, 2026
kaxil (Member) left a comment:
Static checks are failing:
shared/observability/src/airflow_shared/observability/metrics/otel_logger.py:203: error: Argument 1 to "test" of "ListValidator" has incompatible type "str | None"; expected "str" [arg-type]
return bool(stat) and self.metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat)
^~~~
shared/observability/src/airflow_shared/observability/metrics/otel_logger.py:203: error: Argument 2 to "name_is_otel_safe" has incompatible type "str | None"; expected "str" [arg-type]
return bool(stat) and self.metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat)
^~~~
Found 2 errors in 1 file (checked 20 source files)
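Judging by the two errors above, mypy does not treat bool(stat) as narrowing stat from "str | None" to "str". One possible way to satisfy the checker is an explicit early return. The sketch below is only an illustration: the method name _is_valid and the bodies of ListValidator and name_is_otel_safe are hypothetical stand-ins, not the code at otel_logger.py:203.

```python
from __future__ import annotations


class ListValidator:
    """Hypothetical stand-in for the real allow/block list validator."""

    def test(self, name: str) -> bool:
        return True


def name_is_otel_safe(prefix: str, name: str) -> bool:
    """Hypothetical stand-in for the real helper; it only rejects spaces here."""
    return " " not in f"{prefix}.{name}"


class Logger:
    def __init__(self, prefix: str = "airflow") -> None:
        self.prefix = prefix
        self.metrics_validator = ListValidator()

    def _is_valid(self, stat: str | None) -> bool:
        # The explicit early return narrows `stat` to `str` for the checker,
        # and also rejects the empty string, as bool(stat) did.
        if not stat:
            return False
        return self.metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat)
```

Runtime behaviour should match the original expression: None and the empty string are rejected before either validator is called.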
Summary
SafeOtelLogger.gauge() and timer() were missing the name_is_otel_safe() guard that incr(), decr(), and timing() all have. A DAG filename containing a space (e.g. PBI_SKU_Performance copy.py) caused the DagProcessor to crash on every loop iteration when OTel metrics were enabled.
Why
The DagProcessor emits a gauge metric for every known DAG file on every parsing loop.
gauge() only validated against the allow/block list (metrics_validator.test()), which does not check OTel naming rules. A filename with a space passed that check and reached meter.create_gauge() inside the OTel SDK, which raised a plain Exception that propagated uncaught and crashed the process.
Note: the OTel SDK error message says "63 characters", but that message is stale: the actual SDK validation regex allows up to 255 characters (matching OTEL_NAME_MAX_LENGTH). The real rejection reason is the space character, which is not valid in an OTel instrument name. See open-telemetry/opentelemetry-python#3442.
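To illustrate why the space, rather than the length, is what gets the name rejected, here is a minimal sketch of a name check. It assumes rules along the lines of the OTel instrument naming convention (a leading ASCII letter, then letters, digits, "_", ".", "-" or "/", at most 255 characters). The pattern and the function looks_otel_safe are illustrative assumptions, not the SDK's regex or Airflow's name_is_otel_safe().

```python
import re

# Assumed limit and pattern for illustration only; not copied from the SDK or Airflow.
OTEL_NAME_MAX_LENGTH = 255
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.\-/]*")


def looks_otel_safe(prefix: str, name: str) -> bool:
    """Return True if the prefixed name fits the assumed naming rules."""
    full_name = f"{prefix}.{name}"
    if len(full_name) > OTEL_NAME_MAX_LENGTH:
        return False
    # fullmatch requires every character to be in the allowed set,
    # so a single space or non-ASCII letter fails the check.
    return _NAME_PATTERN.fullmatch(full_name) is not None


print(looks_otel_safe("airflow", "PBI_SKU_Performance copy.py"))
print(looks_otel_safe("airflow", "PBI_SKU_Performance_copy.py"))
```

Under these assumptions the first call returns False because of the space, even though the name is well under 255 characters, and the second returns True.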
The same gap existed in timer(): the stat name was passed to _OtelTimer with no validation at all, and _OtelTimer would then call record_histogram_value() with the invalid name at .stop() time.
Fix
Added metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat) to gauge() (both the stat and back_compat_name paths) and timer(), making all five recording methods consistent: incr(), decr(), timing(), gauge(), timer().
Testing
test_gauge_with_invalid_stat_names_skipped_without_raising: covers the exact reported scenario (a space in the filename) and non-ASCII names, both reproducing the DagProcessor crash path.
test_timer_with_invalid_stat_name_does_not_record: covers non-ASCII names and spaces for the timer path.
closes: #68282
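The guard described under Fix, and the behaviour the two tests check, can be sketched with stand-in classes. Everything below is hypothetical (is_safe, StrictBackend, GuardedLogger, _Timer). It only mimics the pattern of validating the name before a strict backend sees it, and is not the SafeOtelLogger implementation.

```python
from __future__ import annotations

import time


def is_safe(name: str) -> bool:
    # Hypothetical stand-in for metrics_validator.test() plus name_is_otel_safe().
    return bool(name) and name.isascii() and " " not in name


class StrictBackend:
    """Stand-in for an SDK that raises on invalid instrument names."""

    def __init__(self) -> None:
        self.recorded: dict[str, float] = {}

    def record(self, name: str, value: float) -> None:
        if not is_safe(name):
            raise Exception(f"invalid instrument name: {name!r}")
        self.recorded[name] = value


class _Timer:
    def __init__(self, backend: StrictBackend, name: str | None) -> None:
        self._backend = backend
        self._name = name
        self._start = time.perf_counter()

    def stop(self) -> None:
        # With name=None the timer still runs but records nothing.
        if self._name is not None:
            self._backend.record(self._name, time.perf_counter() - self._start)


class GuardedLogger:
    def __init__(self, backend: StrictBackend) -> None:
        self._backend = backend

    def gauge(self, stat: str, value: float) -> None:
        if not is_safe(stat):
            return  # skip the metric instead of letting the backend raise
        self._backend.record(stat, value)

    def timer(self, stat: str | None = None) -> _Timer:
        # Validate up front so stop() never passes an invalid name on.
        name = stat if stat and is_safe(stat) else None
        return _Timer(self._backend, name)


backend = StrictBackend()
log = GuardedLogger(backend)
log.gauge("PBI_SKU_Performance copy.py", 1.0)  # skipped, no exception
log.gauge("dag_file_ok", 2.0)                  # recorded
log.timer("bad name").stop()                   # nothing recorded
```

In this sketch an invalid name costs one dropped metric rather than a crashed loop, which is the trade-off the PR makes for gauge() and timer().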
Was generative AI tooling used to co-author this PR?
This PR was prepared with Gen-AI assistance (Claude). I reviewed all generated code.