Skip to content

Conversation

@shanduur
Copy link
Member

@shanduur shanduur commented Jan 8, 2026

Enables support for trustd behind load balancer by providing SNI.

(cherry picked from commit b593bc6)
(cherry picked from commit 33f336d)

@github-project-automation github-project-automation bot moved this to To Do in Planning Jan 8, 2026
@shanduur shanduur force-pushed the feat/grpc-authority branch 2 times, most recently from 2aea006 to 288dc2d Compare January 8, 2026 10:34
@shanduur shanduur force-pushed the feat/grpc-authority branch from 288dc2d to 9443235 Compare January 8, 2026 10:44
))

grpcOpts := []grpc.DialOption{
grpc.WithAuthority(ParseAuthority(host)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wouldn't work, as we have a list of endpoints, and pass a single authority with it.

Either the round-robin should return a proper authority, or something else should be going on here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smira is there any difference in the way Talos populates the value for the address parameter for the NewConnection method depending on the number of endpoints that would make this fail if there is more than one endpoint to balance the load with ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, reading through the code of the gRPC client, and looking at our code, we are setting the ServerName correctly in our resolver, so it should be propagated as intended:

Adding the WithAuthority here will override that - and this is to be done by the user. We need a new configuration (new document? field in existing document?) that will allow setting that field (?).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to describe the setup I've successfully tested this with.
I have an existing "admin" cluster made up of Talos nodes.

Within that cluster I have deployed :

  • a Gateway API (Traefik) with a NodePort service listening on ports 32443 and 32501
  • the Kamaji controller
  • a Kamaji TenantControlPlane named tenant-one with talos-csr-signer as an extra container (as described here)
  • gateway and routes to redirect requests for host api.tenant-one.ingress.example.com on both ports to either the Kamaji control plane or CSR signer for that tenant

I also have :

  • a reserved public IP
  • a wildcard DNS entry for *.ingress.example.com pointing to that IP
  • a load balancer attached to that IP, listening on ports 443 and 50001 and forwarding requests to the admin cluster nodes on the nodeports mentioned previously

I'm now trying to add Talos nodes to that tenant-one Kamaji control-plane with the following cluster configuration :

cluster:
  clusterName: tenant-one
  controlPlane:
    endpoint: https://api.tenant-one.ingress.example.com:443

With Talos v1.10.9 (without the proposed changes) when the node is initializing it sends a request to the resolved IP for api.tenant-one.ingress.example.com on port 50001. As that requests doesn't include the configured host name, the Gateway API controller doesn't present the right certificate and Talos can't validate the CA.

With a custom Talos image with the hard-coded stringapi.tenant-one.ingress.example.comas authority, the node successfully gets its certificate from the CSR signer.

Even if I haven't tested this setup with more than one IP resolving the host or with Talos control-plane nodes (and not Kamaji's CSR signer), I don't see why it would not work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a reproducer - minimal set of manifests that we can test this with? This might be easier than I initially thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smasset-orange please also keep in mind that this whole Kamaji thing is a bit of a hack introduced to make Talos workers connect, but there is no official support for it on our side.

I don't know why this would or would not set an authority, but I guess this should be verified with a unit-test with a gRPC server and gRPC client using round-robin and asserting a proper authority is either set or not.

We can't really accept random changes to this code as this might break "normal" flow of Talos outside of Kamaji case.

@shanduur shanduur force-pushed the feat/grpc-authority branch from 9443235 to acf4f2b Compare January 8, 2026 11:31
Enables support for trustd behind load balancer by providing SNI.

(cherry picked from commit b593bc6)
(cherry picked from commit 33f336d)

Signed-off-by: Sébastien Masset <86793256+smasset-orange@users.noreply.github.com>
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
@shanduur shanduur force-pushed the feat/grpc-authority branch from acf4f2b to 3bafbf6 Compare January 8, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To Do

Development

Successfully merging this pull request may close these issues.

3 participants