Skip to content

fix(config): graceful failure fallback to default#98

Open
duncanista wants to merge 5 commits intomainfrom
jordan.gonzalez/config/graceful-failure-per-field
Open

fix(config): graceful failure fallback to default#98
duncanista wants to merge 5 commits intomainfrom
jordan.gonzalez/config/graceful-failure-per-field

Conversation

@duncanista
Copy link
Contributor

What does this PR do?

Adds per-field graceful deserialization to the datadog-agent-config crate. Previously, if a single config field had the wrong type (e.g., DD_FLUSH_TIMEOUT=abc or flush_timeout: [1,2,3] in YAML), the entire config
source would fail and fall back to Config::default() — losing every valid field alongside the bad one. Now, only the invalid field falls back to its default while all other correctly-typed fields are preserved.

The fix introduces one generic deserializer function, deserialize_with_default<T: Deserialize + Default>, which catches serde errors at the field level and returns T::default(). This is applied via
#[serde(deserialize_with)] to all previously-unprotected fields across EnvConfig, YamlConfig, and their nested structs. Several existing deserializers (deserialize_optional_duration_from_seconds,
deserialize_optional_duration_from_microseconds, deserialize_trace_propagation_style, deserialize_key_value_pair_array_to_hashmap, deserialize_service_mapping, deserialize_apm_replace_rules) were also hardened to
catch errors instead of propagating them.

Motivation

In production, a single misconfigured environment variable or YAML field should not wipe out the entire agent configuration. The agent should log a warning for the bad field and continue with defaults for that
field only, while respecting all other valid configuration.

Additional Notes

  • No structural changes to EnvConfig, YamlConfig, nested structs, Config, merge_config, or ConfigSource::load. The change is purely additive #[serde(deserialize_with)] annotations and error-catching in existing
    deserializer functions.
  • All existing deserializers remain in use — no dead code introduced.
  • The test_parse_duration_from_microseconds test was updated: negative and float values for microsecond durations now gracefully return None instead of erroring, consistent with the graceful-failure goal.

Describe how to test/QA your changes

Two comprehensive regression tests were added:

  • test_all_yaml_fields_wrong_type_fallback_to_default — Creates a YAML file where every field (including deeply nested OTLP config) is set to the wrong type ([1, 2, 3]). Asserts load() succeeds and config ==
    Config::default().
  • test_all_env_fields_wrong_type_fallback_to_default — Sets every non-string env var to an invalid value ("not_a_number", "not_a_bool", "not_json{{", etc.). Asserts load() succeeds and every non-string field
    equals its default.

Both tests serve as canaries: when adding a new field, adding it to the test YAML/env with a wrong type will immediately fail if the field lacks a graceful deserializer.

All 78 tests pass (cargo test -p datadog-agent-config --lib).

@duncanista duncanista requested review from a team as code owners March 12, 2026 18:53
@duncanista duncanista requested review from apiarian-datadog, Copilot and shreyamalpani and removed request for a team March 12, 2026 18:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the resilience of datadog-agent-config config loading by making deserialization failures non-fatal at the field level, so that a single mistyped YAML/env value no longer causes the entire source to fall back to Config::default().

Changes:

  • Introduces a generic deserialize_with_default<T: Deserialize + Default> helper and applies it broadly via #[serde(deserialize_with)] on YAML/env config fields.
  • Hardens several existing deserializers to catch and ignore serde/type errors instead of propagating them.
  • Adds “canary” regression tests ensuring YAML/env loads succeed even when many fields have wrong types.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/datadog-agent-config/yaml.rs Applies per-field graceful deserialization across YAML structs and adds a comprehensive wrong-type YAML regression test.
crates/datadog-agent-config/env.rs Applies graceful deserialization for select env fields and adds a comprehensive wrong-type env regression test.
crates/datadog-agent-config/mod.rs Adds deserialize_with_default and hardens multiple deserializers to ignore type/serde errors.
crates/datadog-agent-config/service_mapping.rs Makes service mapping deserialization ignore type errors (returns empty map).
crates/datadog-agent-config/apm_replace_rule.rs Makes APM replace rules deserialization ignore type errors (returns None).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@duncanista duncanista changed the title fix(config): graceful failure fix(config): graceful failure fallback to default Mar 12, 2026
Copy link
Contributor

@apiarian-datadog apiarian-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some notes and questions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants