-
-
Notifications
You must be signed in to change notification settings - Fork 767
feat: add authority in GRPC client initialization #12556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2aea006 to
288dc2d
Compare
288dc2d to
9443235
Compare
| )) | ||
|
|
||
| grpcOpts := []grpc.DialOption{ | ||
| grpc.WithAuthority(ParseAuthority(host)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this wouldn't work, as we have a list of endpoints, and pass a single authority with it.
Either the round-robin should return a proper authority, or something else should be going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smira is there any difference in the way Talos populates the value for the address parameter for the NewConnection method depending on the number of endpoints that would make this fail if there is more than one endpoint to balance the load with ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, reading through the code of the gRPC client, and looking at our code, we are setting the ServerName correctly in our resolver, so it should be propagated as intended:
- https://github.com/siderolabs/talos/blob/v1.12.1/pkg/machinery/client/resolver/roundrobin.go#L81-L84
- https://github.com/grpc/grpc-go/blob/v1.77.0/clientconn.go#L1024-L1042
Adding the WithAuthority here will override that - and this is to be done by the user. We need a new configuration (new document? field in existing document?) that will allow setting that field (?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to describe the setup I've successfully tested this with.
I have an existing "admin" cluster made up of Talos nodes.
Within that cluster I have deployed :
- a Gateway API (Traefik) with a NodePort service listening on ports 32443 and 32501
- the Kamaji controller
- a Kamaji TenantControlPlane named
tenant-onewith talos-csr-signer as an extra container (as described here) - gateway and routes to redirect requests for host
api.tenant-one.ingress.example.comon both ports to either the Kamaji control plane or CSR signer for that tenant
I also have :
- a reserved public IP
- a wildcard DNS entry for
*.ingress.example.compointing to that IP - a load balancer attached to that IP, listening on ports 443 and 50001 and forwarding requests to the admin cluster nodes on the nodeports mentioned previously
I'm now trying to add Talos nodes to that tenant-one Kamaji control-plane with the following cluster configuration :
cluster:
clusterName: tenant-one
controlPlane:
endpoint: https://api.tenant-one.ingress.example.com:443
With Talos v1.10.9 (without the proposed changes) when the node is initializing it sends a request to the resolved IP for api.tenant-one.ingress.example.com on port 50001. As that requests doesn't include the configured host name, the Gateway API controller doesn't present the right certificate and Talos can't validate the CA.
With a custom Talos image with the hard-coded stringapi.tenant-one.ingress.example.comas authority, the node successfully gets its certificate from the CSR signer.
Even if I haven't tested this setup with more than one IP resolving the host or with Talos control-plane nodes (and not Kamaji's CSR signer), I don't see why it would not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a reproducer - minimal set of manifests that we can test this with? This might be easier than I initially thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smasset-orange please also keep in mind that this whole Kamaji thing is a bit of a hack introduced to make Talos workers connect, but there is no official support for it on our side.
I don't know why this would or would not set an authority, but I guess this should be verified with a unit-test with a gRPC server and gRPC client using round-robin and asserting a proper authority is either set or not.
We can't really accept random changes to this code as this might break "normal" flow of Talos outside of Kamaji case.
9443235 to
acf4f2b
Compare
acf4f2b to
3bafbf6
Compare
Enables support for trustd behind load balancer by providing SNI.
(cherry picked from commit b593bc6)
(cherry picked from commit 33f336d)