Add ControllerGetNodeInfo and NodeGetID RPCs (alpha) #603
huww98 wants to merge 1 commit into container-storage-interface:master
There's a lot of k8s stuff in this PR desc, probably better suited for a
KEP that this PR might reference. This (CSI spec) is not a subproject of
k8s; it's CO-agnostic, or at least it tries to be. Appreciate the context
though.
Reading this through, it's not immediately clear to me why this isn't
solvable some other way: what alternatives were considered, and why were
they not sufficient?
James DeFelice
On Sat, Mar 21, 2026, 10:27 AM 胡玮文 wrote:
*What type of PR is this?*
/kind feature
*What this PR does / why we need it*:
Proposal: Add ControllerGetNodeInfo and NodeGetID RPCs
Motivation
Some users have strict security requirements that prohibit distributing
cloud API credentials to node components. However, the current NodeGetInfo
RPC requires cloud API access to retrieve:
1. *Topology information* (zone, region, supported disk categories)
2. *max_volumes_per_node* (requires querying currently attached disks
that may not be managed by CSI)
This proposal introduces two new RPCs to enable node registration without
requiring cloud API credentials on the node side.
Proposed Changes
New RPCs
1. NodeGetID (Node Service)
- *Input*: None
- *Output*: node_id (e.g., cloud instance ID)
- *Purpose*: Returns only the node identifier, which can be obtained
locally (e.g., from instance metadata service) without cloud API
credentials.
2. ControllerGetNodeInfo (Controller Service)
- *Input*: node_id, published_volume_ids (volumes the CO believes are
published, including those whose status is uncertain)
- *Output*: accessible_topology, max_volumes_per_node
- *Purpose*: Fetches node topology and capacity information from the
controller side, where cloud API credentials are already available.
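The Python sketch below makes the two RPC shapes above concrete. Several parts are assumptions:

- The dataclasses only mirror the fields named in this description. They are not the generated CSI bindings or the actual `csi.proto` changes in this PR.
- `accessible_topology` is simplified to a plain dict.
- The metadata URL is the Alibaba Cloud ECS endpoint from the example in this description.
- `fetch` is a hypothetical injection point, so the handler can be exercised without a cloud instance.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical Python mirrors of the proposed messages. Field names follow the
# PR description; these are NOT the generated CSI bindings.
@dataclass
class NodeGetIDResponse:
    node_id: str


@dataclass
class ControllerGetNodeInfoRequest:
    node_id: str
    # Volumes the CO believes are published to this node, including uncertain ones.
    published_volume_ids: List[str] = field(default_factory=list)


@dataclass
class ControllerGetNodeInfoResponse:
    # Simplified stand-in for CSI's Topology message (segment key -> value).
    accessible_topology: Dict[str, str]
    max_volumes_per_node: int


# Assumed endpoint, taken from the Alibaba Cloud ECS example in this PR.
INSTANCE_ID_URL = "http://100.100.100.200/latest/meta-data/instance-id"


def http_get(url: str, timeout: float = 2.0) -> str:
    """Plain HTTP GET against the link-local metadata service; no cloud credentials."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def node_get_id(fetch: Callable[[str], str] = http_get) -> NodeGetIDResponse:
    """Sketch of a NodeGetID handler: a local metadata read and nothing else."""
    instance_id = fetch(INSTANCE_ID_URL).strip()
    if not instance_id:
        raise RuntimeError("metadata service returned an empty instance ID")
    return NodeGetIDResponse(node_id=instance_id)
```

Passing a stub such as `node_get_id(lambda url: "i-abc123\n")` yields `NodeGetIDResponse(node_id='i-abc123')`. The node side needs no credentials at all, only network access to the metadata service.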
New Capabilities
- NodeServiceCapability.RPC.GET_ID - Indicates support for NodeGetID
- ControllerServiceCapability.RPC.GET_NODE_INFO - Indicates support
for ControllerGetNodeInfo
Kubernetes Integration
The following diagram illustrates how these new RPCs integrate with
Kubernetes components (proposed changes to the external-attacher sidecar):
sequenceDiagram
box rgba(255,0,0,0.1) Node Side
participant kubelet
participant csi-node as CSI Node Plugin
end
box Controller Side
participant attacher as external-attacher
participant csi-ctrl as CSI Controller Plugin
participant scheduler as Scheduler
end
attacher->>attacher: Watch CSINode/VolumeAttachment
kubelet->>+csi-node: NodeGetCapabilities
csi-node-->>-kubelet: GET_ID capability
attacher->>+csi-ctrl: ControllerGetCapabilities
csi-ctrl-->>-attacher: GET_NODE_INFO capability
kubelet->>+csi-node: NodeGetID
csi-node->>csi-node: Get instance ID from metadata
csi-node-->>-kubelet: node_id = instance-xxx
kubelet->>attacher: Annotate CSINode with node_id
attacher->>+csi-ctrl: ControllerGetNodeInfo(node_id, published_volume_ids)
csi-ctrl->>csi-ctrl: Query cloud APIs for topology & capacity
csi-ctrl-->>-attacher: topology, max_volumes_per_node
attacher->>scheduler: Update CSINode (remove annotation)
Note over attacher,csi-ctrl: Volume limit reached scenario
attacher->>+csi-ctrl: ControllerPublishVolume
csi-ctrl-->>-attacher: RESOURCE_EXHAUSTED
attacher->>+csi-ctrl: ControllerGetNodeInfo(node_id, published_volume_ids)
csi-ctrl-->>-attacher: Updated max_volumes_per_node
attacher->>kubelet: Update CSINode
kubelet->>kubelet: Fail affected Pods
Why Integrate with external-attacher?
1. *Reuses existing watches*: external-attacher already watches
VolumeAttachment resources, which provide the list of attached volumes needed
for an accurate max_volumes_per_node calculation.
2. *Consistency guarantee*: The attacher can pause attach/detach
operations for a node while a ControllerGetNodeInfo call is in flight, so
the published_volume_ids it sends cannot change before the SP returns.
*Note*: If the SP does not support PUBLISH_UNPUBLISH_VOLUME, the
ControllerGetNodeInfo RPC can still be used to fetch static topology
information at node registration time with empty published_volume_ids.
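One way to get the consistency guarantee described above is a per-node lock shared by attach, detach and the ControllerGetNodeInfo refresh. The sketch below is a hypothetical Python model, not external-attacher's actual implementation. `publish`, `unpublish` and `get_node_info` stand in for the corresponding gRPC calls. It also covers the note on SPs without PUBLISH_UNPUBLISH_VOLUME by sending an empty list.

```python
import threading
from typing import Callable, Dict, List, Set, TypeVar

T = TypeVar("T")


class NodeVolumeTracker:
    """Hypothetical attacher-side bookkeeping (not external-attacher's real code).

    One lock per node serializes attach, detach and ControllerGetNodeInfo for
    that node, so the published_volume_ids snapshot cannot change mid-call.
    """

    def __init__(self, supports_publish_unpublish: bool = True) -> None:
        self.supports_publish_unpublish = supports_publish_unpublish
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}
        # node_id -> volume IDs the CO believes are published (incl. uncertain)
        self._published: Dict[str, Set[str]] = {}

    def _lock_for(self, node_id: str) -> threading.Lock:
        # The guard makes lazy creation of per-node state race-free.
        with self._guard:
            if node_id not in self._locks:
                self._locks[node_id] = threading.Lock()
                self._published[node_id] = set()
            return self._locks[node_id]

    def attach(self, node_id: str, volume_id: str,
               publish: Callable[[str, str], None]) -> None:
        with self._lock_for(node_id):
            # Record before publishing: if the call fails or times out, the
            # volume is "uncertain" and must still be reported to the SP.
            self._published[node_id].add(volume_id)
            publish(node_id, volume_id)

    def detach(self, node_id: str, volume_id: str,
               unpublish: Callable[[str, str], None]) -> None:
        with self._lock_for(node_id):
            unpublish(node_id, volume_id)
            # Forget the volume only after unpublish succeeded.
            self._published[node_id].discard(volume_id)

    def refresh_node_info(self, node_id: str,
                          get_node_info: Callable[[str, List[str]], T]) -> T:
        with self._lock_for(node_id):
            # Without PUBLISH_UNPUBLISH_VOLUME there is nothing to report, so
            # only static topology is fetched (empty published_volume_ids).
            if self.supports_publish_unpublish:
                ids = sorted(self._published[node_id])
            else:
                ids = []
            return get_node_info(node_id, ids)
```

Holding the lock across the RPC blocks other attach/detach calls to that node until ControllerGetNodeInfo returns. Other nodes are unaffected. That is the cost of the "pause attach/detach" approach.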
Backward Compatibility
- Both RPCs are marked as *alpha* (option (alpha_method) = true)
- Existing NodeGetInfo RPC remains unchanged
- COs can detect support via capability discovery:
- If both GET_ID (node) and GET_NODE_INFO (controller) are
supported, use the new flow
- Otherwise, fall back to the traditional NodeGetInfo flow
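The capability check above reduces to a small decision function, sketched here in Python. The sketch assumes a single place that can see both capability sets. In the Kubernetes design described here, kubelet sees only node capabilities and external-attacher sees only controller capabilities. Each side therefore relies on the rule from the commit message that an SP advertising GET_ID MUST also support GET_NODE_INFO. Capability names are plain strings for illustration.

```python
import logging
from enum import Enum
from typing import AbstractSet


class RegistrationFlow(Enum):
    NEW = "NodeGetID + ControllerGetNodeInfo"
    LEGACY = "NodeGetInfo"


# Capability names as plain strings for illustration; real code would compare
# against the generated NodeServiceCapability / ControllerServiceCapability enums.
GET_ID = "GET_ID"
GET_NODE_INFO = "GET_NODE_INFO"


def choose_registration_flow(node_caps: AbstractSet[str],
                             controller_caps: AbstractSet[str]) -> RegistrationFlow:
    if GET_ID in node_caps and GET_NODE_INFO in controller_caps:
        return RegistrationFlow.NEW
    if GET_ID in node_caps:
        # An SP advertising GET_ID MUST also support GET_NODE_INFO, so this
        # points at a mismatched deployment. Falling back follows the
        # "otherwise" rule, though NodeGetInfo may still fail on a node
        # without cloud credentials.
        logging.warning("node plugin advertises GET_ID but controller lacks GET_NODE_INFO")
    return RegistrationFlow.LEGACY
```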
Required Changes in Other Components
This CSI spec change requires corresponding updates in the following
Kubernetes components:
1. kubelet (kubernetes/kubernetes)
- Add support for calling NodeGetID RPC when the node plugin
advertises GET_ID capability
- When NodeGetID is used instead of NodeGetInfo, kubelet should:
- Store the node_id in a CSINode annotation (e.g.,
csi.volume.kubernetes.io/nodeid.{driver})
- NOT populate topology and allocatable count in CSINode (these are
left for external-attacher)
- Maintain backward compatibility: use NodeGetInfo if GET_ID is not
supported
2. external-attacher (kubernetes-csi/external-attacher)
- Watch for CSINode annotations indicating nodes that need
ControllerGetNodeInfo
- Call ControllerGetNodeInfo when:
- A new node registers with only node_id (no topology)
- A RESOURCE_EXHAUSTED error is returned from ControllerPublishVolume
- Update CSINode with topology and allocatable count from the
ControllerGetNodeInfo response
- Coordinate with attach/detach operations to ensure a consistent
published_volume_ids list
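The kubelet and external-attacher responsibilities listed above can be modelled roughly as follows. This is a hypothetical Python sketch:

- CSINode is a plain dict loosely shaped like the real object.
- Only the annotation key pattern comes from this proposal.
- The callables stand in for the CSI RPCs.
- Details such as kubelet's Node topology labels and API-server writes are omitted.

```python
from typing import Callable, Dict, List, Tuple

# Annotation key pattern from the PR text; the suffix is the CSI driver name.
NODE_ID_ANNOTATION_PREFIX = "csi.volume.kubernetes.io/nodeid."

# CSINode is modelled as a plain dict (metadata.annotations, spec.drivers[]),
# a simplification of the real API object for illustration only.


def _annotations(csinode: dict) -> Dict[str, str]:
    return csinode.setdefault("metadata", {}).setdefault("annotations", {})


def _set_driver_entry(csinode: dict, driver: str, node_id: str,
                      topology: Dict[str, str], max_volumes: int) -> None:
    entry = {"name": driver, "nodeID": node_id, "topologyKeys": sorted(topology)}
    if max_volumes > 0:
        entry["allocatable"] = {"count": max_volumes}
    drivers = csinode.setdefault("spec", {}).setdefault("drivers", [])
    # Replace any existing entry for this driver.
    drivers[:] = [d for d in drivers if d.get("name") != driver] + [entry]


def kubelet_register(csinode: dict, driver: str, node_caps: set,
                     node_get_id: Callable[[], str],
                     node_get_info: Callable[[], Tuple[str, Dict[str, str], int]]) -> None:
    if "GET_ID" in node_caps:
        # New flow: record only the node ID; topology and allocatable count
        # are left for external-attacher.
        _annotations(csinode)[NODE_ID_ANNOTATION_PREFIX + driver] = node_get_id()
    else:
        # Legacy flow: NodeGetInfo returns everything.
        node_id, topology, max_volumes = node_get_info()
        _set_driver_entry(csinode, driver, node_id, topology, max_volumes)


def attacher_reconcile(csinode: dict, driver: str, published_ids: List[str],
                       get_node_info: Callable[[str, List[str]], Tuple[Dict[str, str], int]]) -> None:
    key = NODE_ID_ANNOTATION_PREFIX + driver
    node_id = _annotations(csinode).get(key)
    if node_id is None:
        return  # already resolved, or registered through NodeGetInfo
    topology, max_volumes = get_node_info(node_id, published_ids)
    _set_driver_entry(csinode, driver, node_id, topology, max_volumes)
    del _annotations(csinode)[key]  # "Update CSINode (remove annotation)"


class ResourceExhausted(Exception):
    """Stand-in for a gRPC RESOURCE_EXHAUSTED status from ControllerPublishVolume."""


def attacher_publish(node_id: str, volume_id: str,
                     publish: Callable[[str, str], None],
                     refresh: Callable[[str], None]) -> None:
    try:
        publish(node_id, volume_id)
    except ResourceExhausted:
        # Disks attached outside CSI may have used up slots since registration:
        # refresh max_volumes_per_node, then re-raise so the failure is reported.
        refresh(node_id)
        raise
```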
Security Benefits
- Node components no longer require cloud API credentials
- Reduced attack surface on nodes
- Cloud credentials are centralized in the controller component
- Meets strict security baseline requirements for sensitive
environments
Example Use Case (Alibaba Cloud)
For Alibaba Cloud ECS:
- NodeGetID returns the ECS instance ID (obtainable from instance
metadata at http://100.100.100.200/latest/meta-data/instance-id)
- ControllerGetNodeInfo queries:
- Zone ID and Region ID via DescribeInstances API
- Supported disk categories via DescribeAvailableResource API
- Current disk attachments via DescribeDisks API (to detect disks
not managed by CSI)
- Total attachable disk count via DescribeInstanceTypes API
*Which issue(s) this PR fixes*:
Fixes #
*Special notes for your reviewer*:
*Does this PR introduce an API-breaking change?*:
Add ControllerGetNodeInfo and NodeGetID RPCs (alpha)
------------------------------
You can view, comment on, or merge this pull request online at: #603
Commit Summary
- a727d32 Add ControllerGetNodeInfo and NodeGetID RPCs (alpha)
File Changes (4 files: https://github.com/container-storage-interface/spec/pull/603/files)
- *M* csi.proto (94)
- *M* lib/go/csi/csi.pb.go (2996)
- *M* lib/go/csi/csi_grpc.pb.go (74)
- *M* spec.md (143)
Patch Links:
- https://github.com/container-storage-interface/spec/pull/603.patch
- https://github.com/container-storage-interface/spec/pull/603.diff
This proposal adds two new RPCs to enable node registration without requiring cloud API credentials on the node side:
- NodeGetID: Returns only the node identifier, which can be obtained locally (e.g., from instance metadata) without cloud API access.
- ControllerGetNodeInfo: Fetches node topology and max_volumes_per_node from the controller side, where cloud API credentials are available.
New capabilities:
- NodeServiceCapability.RPC.GET_ID
- ControllerServiceCapability.RPC.GET_NODE_INFO
Key design decisions:
- If the SP supports GET_ID, it MUST also support GET_NODE_INFO
- The published_volume_ids field allows accurate calculation of remaining attachable volumes by distinguishing CSI-managed from non-CSI volumes
- Both RPCs are marked as alpha
This addresses security requirements where cloud API credentials should not be distributed to node components.
Signed-off-by: 胡玮文 <huweiwen.hww@alibaba-inc.com>
Force-pushed from a727d32 to a6f4f25.
Yes, I will propose this to Kubernetes if it looks fine on the CSI side.
There aren't many alternatives, actually. The instance metadata service (e.g., http://169.254.169.254 or http://100.100.100.200) typically provides only basic info such as the instance-id and zone. It does not provide disk attachment limits or supported disk categories, which require cloud API calls. Under this security requirement, the information NodeGetInfo needs cannot be fetched from the node.
We have considered adding a new private CRD and a new controller in Kubernetes: the controller watches for new nodes, fetches the required info and stores it in a CR, and the node side watches the CR to get the info. But this still has major drawbacks:
The core insight is: