[FLINK-38946][flink-metrics] Allow AWS PrivateLink/GCP Cloud Private Service Connect for Datadog integration #27450

Draft
stoiev wants to merge 1 commit into apache:master from stoiev:patch-2

Conversation

@stoiev
Contributor

@stoiev stoiev commented Jan 20, 2026

What is the purpose of the change

This change aligns the Flink Datadog integration with the API host that Datadog supports. It unblocks the use of private networking offered by cloud providers, such as AWS PrivateLink and GCP Private Service Connect (Documentation).

Brief change log

In DatadogHttpClient.java, the base URLs are updated from https://app.datadoghq.%s/... to https://api.datadoghq.%s/
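The host swap described above can be sketched as follows. This is a minimal, hypothetical illustration that assumes the client fills the `%s` placeholder with the Datadog site's top-level domain (for example `com` or `eu`). The class name `DatadogUrlSketch`, the constant names, and the `/api/v1/series` path are assumptions made for illustration. They are not the actual contents of `DatadogHttpClient.java`.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: builds a Datadog metric-submission URL from a site
 * top-level domain such as "com" or "eu". Names and path are assumptions.
 */
final class DatadogUrlSketch {
    // Before this PR the client pointed at the front-end host: "https://app.datadoghq.%s"
    // After this PR it points at the API host, which (per this PR) private networking can reach.
    static final String BASE_URL_FORMAT = "https://api.datadoghq.%s";

    // Assumed path of Datadog's v1 series (metrics submission) endpoint.
    static final String SERIES_PATH = "/api/v1/series";

    private DatadogUrlSketch() {}

    /** Returns the series URL for a TLD, e.g. "com" -> "https://api.datadoghq.com/api/v1/series". */
    static String seriesUrl(String tld) {
        // Normalize the input so that " EU " and "eu" resolve to the same host.
        String normalized = tld.trim().toLowerCase(Locale.ROOT);
        return String.format(Locale.ROOT, BASE_URL_FORMAT, normalized) + SERIES_PATH;
    }
}
```

For example, `seriesUrl("eu")` yields `https://api.datadoghq.eu/api/v1/series`. Only the host prefix changes. The protocol and path are unaffected, which is why the change is expected to be transparent wherever the new host is reachable.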

Verifying this change

(example:)

  • TODO: Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

@stoiev stoiev changed the title from "[FLINK-38946][flink-metrics] Allow AWS VPC Endpoint/GCP Cloud Private Service Connect for Datadog integration" to "[FLINK-38946][flink-metrics] Allow AWS PrivateLink/GCP Cloud Private Service Connect for Datadog integration" on Jan 20, 2026
@flinkbot
Collaborator

flinkbot commented Jan 20, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@davidradl
Contributor

@stoiev I am curious: are there scenarios where the old URL can and should be used? A change like this will break all users of the old URL. I was thinking that you could support multiple endpoints, perhaps by indicating in the Flink config which type of connection is required, so that the Flink code can use the appropriate URL.

@petems

petems commented Jan 23, 2026

The move from app. to api. happened way back in 2018, and AFAIK app. is just the subdomain for the front-end part of the platform in each DC.

So from that perspective, this shouldn't cause any backwards-compatibility issues (don't quote me on that, though; I'm a customer-facing Datadog engineer, but I'll see if I can double-check with someone on the platform team).

Technically I guess it's a "breaking change" if someone has specifically whitelisted app. for Flink and is then affected by this change...

PS. Sorry to barge into another PR; I'm in the same boat, helping a customer with some similar DC config issues and seeing if I can show some love to the older docs and setup for the Datadog parts of Flink 👍🏻

@stoiev
Contributor Author

stoiev commented Feb 1, 2026

Thanks @petems! You got the point.

@davidradl, it's difficult to guarantee that this won't cause issues in every possible scenario; for instance, some users might restrict network traffic based on specific domains. However, since the protocol remains the same, the Flink/Datadog integration should be unaffected as long as the api. host is reachable from the network.

Even if we add a configuration option, I would argue for changing the default to api.datadoghq.%s, as it is the only API domain currently supported and documented by Datadog.

Also, I'm aware there is already a PR that adds this configuration. I opened this one thinking a simple URL change might be easier to ship, but I am happy to continue with the configuration PR instead if that is preferable.
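@davidradl's suggestion of a configurable endpoint and the argument for keeping api. as the default can coexist. The following is a minimal sketch that assumes the reporter settings arrive as a plain `Map<String, String>`. An explicit host override takes precedence. Otherwise the host defaults to `api.datadoghq.<tld>`. The keys `apiHost` and `site` and the class `DatadogHostResolver` are hypothetical. They do not correspond to existing Flink options or to the configuration PR mentioned above.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: resolves the Datadog API base URL from reporter settings.
 * The keys "apiHost" and "site" are illustrative, not real Flink options.
 */
final class DatadogHostResolver {
    static final String DEFAULT_HOST_FORMAT = "api.datadoghq.%s";
    static final String HOST_OVERRIDE_KEY = "apiHost"; // hypothetical key
    static final String SITE_TLD_KEY = "site";          // hypothetical key, e.g. "com" or "eu"

    private DatadogHostResolver() {}

    static String baseUrl(Map<String, String> settings) {
        String override = settings.get(HOST_OVERRIDE_KEY);
        if (override != null && !override.isBlank()) {
            // An explicit host wins, e.g. for users who must keep the legacy app.datadoghq.com host.
            return "https://" + override.trim();
        }
        // Otherwise fall back to the API host for the configured site (default TLD "com").
        String tld = settings.getOrDefault(SITE_TLD_KEY, "com").trim().toLowerCase(Locale.ROOT);
        return "https://" + String.format(Locale.ROOT, DEFAULT_HOST_FORMAT, tld);
    }
}
```

With no settings this returns `https://api.datadoghq.com`. A user who has allowlisted only the front-end host could set the hypothetical `apiHost` key to `app.datadoghq.com` to keep the old behavior. Everyone else gets the API host by default.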


Labels

None yet

Projects

None yet

4 participants