
Add ControllerGetNodeInfo and NodeGetID RPCs (alpha)#603

Open
huww98 wants to merge 1 commit into container-storage-interface:master from huww98:ControllerGetNodeInfo

Conversation


@huww98 huww98 commented Mar 21, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Proposal: Add ControllerGetNodeInfo and NodeGetID RPCs

Motivation

Some users have strict security requirements that prohibit distributing cloud API credentials to node components. However, the current NodeGetInfo RPC requires cloud API access to retrieve:

  1. Topology information (zone, region, supported disk categories)
  2. max_volumes_per_node (requires querying currently attached disks that may not be managed by CSI)

This proposal introduces two new RPCs to enable node registration without requiring cloud API credentials on the node side.

Proposed Changes

New RPCs

1. NodeGetID (Node Service)

  • Input: None
  • Output: node_id (e.g., cloud instance ID)
  • Purpose: Returns only the node identifier, which can be obtained locally (e.g., from instance metadata service) without cloud API credentials.

2. ControllerGetNodeInfo (Controller Service)

  • Input: node_id, published_volume_ids (volumes CO believes are published, including uncertain status)
  • Output: accessible_topology, max_volumes_per_node
  • Purpose: Fetches node topology and capacity information from the controller side, where cloud API credentials are already available.

New Capabilities

  • NodeServiceCapability.RPC.GET_ID - Indicates support for NodeGetID
  • ControllerServiceCapability.RPC.GET_NODE_INFO - Indicates support for ControllerGetNodeInfo
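As a rough sketch, the proto additions could look like the following. Message and field names are taken from the description above; field numbers, enum values, and exact placement in the spec are assumptions, not the PR's actual diff:

```protobuf
service Node {
  // ...existing RPCs...

  rpc NodeGetID (NodeGetIDRequest)
    returns (NodeGetIDResponse) {
      option (alpha_method) = true;
    }
}

message NodeGetIDRequest {
  // Intentionally empty.
}

message NodeGetIDResponse {
  // The identifier of the node, e.g. a cloud instance ID, obtainable
  // locally (instance metadata) without cloud API credentials.
  string node_id = 1;
}

service Controller {
  // ...existing RPCs...

  rpc ControllerGetNodeInfo (ControllerGetNodeInfoRequest)
    returns (ControllerGetNodeInfoResponse) {
      option (alpha_method) = true;
    }
}

message ControllerGetNodeInfoRequest {
  // The ID previously reported by NodeGetID.
  string node_id = 1;
  // Volumes the CO believes are published to this node,
  // including those whose publish status is uncertain.
  repeated string published_volume_ids = 2;
}

message ControllerGetNodeInfoResponse {
  int64 max_volumes_per_node = 1;
  Topology accessible_topology = 2;
}
```

The corresponding capability enum values (`NodeServiceCapability.RPC.GET_ID` and `ControllerServiceCapability.RPC.GET_NODE_INFO`) would be added to the existing capability enums in the same alpha-gated fashion.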

Kubernetes Integration

The following diagram illustrates how these new RPCs integrate with Kubernetes components (proposed changes to external-attacher sidecar):

sequenceDiagram
    box rgba(255,0,0,0.1) Node Side
        participant kubelet
        participant csi-node as CSI Node Plugin
    end
    box Controller Side
        participant attacher as external-attacher
        participant csi-ctrl as CSI Controller Plugin
        participant scheduler as Scheduler
    end

    attacher->>attacher: Watch CSINode/VolumeAttachment

    kubelet->>+csi-node: NodeGetCapabilities
    csi-node-->>-kubelet: GET_ID capability

    attacher->>+csi-ctrl: ControllerGetCapabilities
    csi-ctrl-->>-attacher: GET_NODE_INFO capability

    kubelet->>+csi-node: NodeGetID
    csi-node->>csi-node: Get instance ID from metadata
    csi-node-->>-kubelet: node_id = instance-xxx

    kubelet->>attacher: Annotate CSINode with node_id

    attacher->>+csi-ctrl: ControllerGetNodeInfo(node_id, published_volume_ids)
    csi-ctrl->>csi-ctrl: Query cloud APIs for topology & capacity
    csi-ctrl-->>-attacher: topology, max_volumes_per_node

    attacher->>scheduler: Update CSINode (remove annotation)

    Note over attacher,csi-ctrl: Volume limit reached scenario
    attacher->>+csi-ctrl: ControllerPublishVolume
    csi-ctrl-->>-attacher: RESOURCE_EXHAUSTED
    attacher->>+csi-ctrl: ControllerGetNodeInfo(node_id, published_volume_ids)
    csi-ctrl-->>-attacher: Updated max_volumes_per_node
    attacher->>kubelet: Update CSINode
    kubelet->>kubelet: Fail affected Pods

Why Integrate with external-attacher?

  1. Reuses existing watches: external-attacher already watches VolumeAttachment resources, providing the list of attached volumes needed for accurate max_volumes_per_node calculation.

  2. Consistency guarantee: The attacher can pause attach/detach operations during ControllerGetNodeInfo calls to ensure consistent results.

Note: If the SP does not support PUBLISH_UNPUBLISH_VOLUME, the ControllerGetNodeInfo RPC can still be used to fetch static topology information at node registration time with empty published_volume_ids.

Backward Compatibility

  • Both RPCs are marked as alpha (option (alpha_method) = true)
  • Existing NodeGetInfo RPC remains unchanged
  • COs can detect support via capability discovery:
    • If both GET_ID (node) and GET_NODE_INFO (controller) are supported, use the new flow
    • Otherwise, fall back to the traditional NodeGetInfo flow
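The capability-based fallback above can be sketched as a small decision helper on the CO side. This is illustrative only; function and variable names are hypothetical, not part of the spec:

```go
package main

import "fmt"

// chooseNodeInfoFlow sketches the CO-side decision described above.
// The new flow is only usable when BOTH sides advertise support:
// NodeGetID alone yields no topology, and ControllerGetNodeInfo
// alone has no node_id to look up.
func chooseNodeInfoFlow(nodeHasGetID, ctrlHasGetNodeInfo bool) string {
	if nodeHasGetID && ctrlHasGetNodeInfo {
		return "NodeGetID + ControllerGetNodeInfo"
	}
	// Otherwise fall back to the traditional single-RPC flow.
	return "NodeGetInfo"
}

func main() {
	fmt.Println(chooseNodeInfoFlow(true, true))
	fmt.Println(chooseNodeInfoFlow(true, false))
}
```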

Required Changes in Other Components

This CSI spec change requires corresponding updates in the following Kubernetes components:

1. kubelet (kubernetes/kubernetes)

  • Add support for calling NodeGetID RPC when the node plugin advertises GET_ID capability
  • When NodeGetID is used instead of NodeGetInfo, kubelet should:
    • Store the node_id in CSINode annotation (e.g., csi.volume.kubernetes.io/nodeid.{driver})
    • NOT populate topology and allocatable count in CSINode (leave for external-attacher)
  • Maintain backward compatibility: use NodeGetInfo if GET_ID is not supported

2. external-attacher (kubernetes-csi/external-attacher)

  • Watch for CSINode annotations indicating nodes that need ControllerGetNodeInfo
  • Call ControllerGetNodeInfo when:
    • A new node registers with only node_id (no topology)
    • RESOURCE_EXHAUSTED error is returned from ControllerPublishVolume
  • Update CSINode with topology and allocatable count from ControllerGetNodeInfo response
  • Coordinate with attach/detach operations to ensure consistent published_volume_ids list
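The two triggers listed above could be expressed as a predicate inside the attacher's reconcile loop. A minimal sketch, with all names hypothetical (the real sidecar logic would live in its CSINode/VolumeAttachment reconciler):

```go
package main

import "fmt"

// needsGetNodeInfo sketches when the external-attacher would call
// ControllerGetNodeInfo, per the two triggers described above.
func needsGetNodeInfo(nodeIDAnnotated, topologySet, publishExhausted bool) bool {
	// Trigger 1: a node registered with only a node_id annotation
	// and no topology populated yet.
	if nodeIDAnnotated && !topologySet {
		return true
	}
	// Trigger 2: ControllerPublishVolume returned RESOURCE_EXHAUSTED,
	// so max_volumes_per_node must be refreshed.
	return publishExhausted
}

func main() {
	fmt.Println(needsGetNodeInfo(true, false, false)) // newly registered node
	fmt.Println(needsGetNodeInfo(true, true, false))  // steady state: no call needed
}
```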

Security Benefits

  • Node components no longer require cloud API credentials
  • Reduced attack surface on nodes
  • Cloud credentials are centralized in the controller component
  • Meets strict security baseline requirements for sensitive environments

Example Use Case (Alibaba Cloud)

For Alibaba Cloud ECS:

  • NodeGetID returns the ECS instance ID (obtainable from instance metadata at http://100.100.100.200/latest/meta-data/instance-id)
  • ControllerGetNodeInfo queries:
    • Zone ID and Region ID via DescribeInstances API
    • Supported disk categories via DescribeAvailableResource API
    • Current disk attachments via DescribeDisks API (to detect disks not managed by CSI)
    • Total attachable disk count via DescribeInstanceTypes API

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce an API-breaking change?:

Add ControllerGetNodeInfo and NodeGetID RPCs (alpha)


jdef commented Mar 21, 2026 via email

This proposal adds two new RPCs to enable node registration without
requiring cloud API credentials on the node side:

- NodeGetID: Returns only the node identifier, which can be obtained
  locally (e.g., from instance metadata) without cloud API access.

- ControllerGetNodeInfo: Fetches node topology and max_volumes_per_node
  from the controller side, where cloud API credentials are available.

New capabilities:
- NodeServiceCapability.RPC.GET_ID
- ControllerServiceCapability.RPC.GET_NODE_INFO

Key design decisions:
- If SP supports GET_ID, it MUST also support GET_NODE_INFO
- published_volume_ids field allows accurate calculation of remaining
  attachable volumes by distinguishing CSI-managed vs non-CSI volumes
- Both RPCs are marked as alpha

This addresses security requirements where cloud API credentials should
not be distributed to node components.

Signed-off-by: 胡玮文 <huweiwen.hww@alibaba-inc.com>
huww98 force-pushed the ControllerGetNodeInfo branch from a727d32 to a6f4f25 on March 21, 2026 14:58

huww98 commented Mar 21, 2026

There's a lot of k8s stuff in this PR desc, probably better suited for a
KEP that this PR might reference.

Yes, I will propose this to Kubernetes if it looks fine on the CSI side.
I have provided them here for additional context.

what alternatives were considered and why were
they not sufficient?

There are not many alternatives actually.

The instance metadata service (e.g., http://169.254.169.254 or http://100.100.100.200) typically only provides basic info like instance-id and zone. It does not provide disk attachment limits or supported disk categories, which require cloud API calls.

The information required by NodeGetInfo is not fetchable from the node under these security requirements. We considered adding a new private CRD and a new controller in Kubernetes: the controller watches for new nodes, fetches the required info, and stores it in a CR; the node then watches the CR to get the info. But this still has major drawbacks:

  • The permission to invoke cloud APIs is merely replaced by a service account with CR read permission, which does not fundamentally solve the security issue.
  • Requires more network roundtrips
    • This proposal: Cloud -> csi-controller -> external-attacher -> APIServer(CSINode) -> scheduler
    • With CRD: Cloud -> new controller -> APIServer(CR) -> csi-node -> kubelet -> APIServer(CSINode) -> scheduler
  • It gets even harder with the new capability to refresh max_volumes_per_node immediately after a publish failure: the node would need some way to tell the controller to update the CR immediately, which requires write permission and makes things even worse.
  • This is Kubernetes-specific and does not resolve the security requirement for other COs.

The core insight is that max_volumes_per_node and accessible_topology are consumed by the scheduler, not the node. For optimal security, this information should therefore never appear on the node, hence this proposal.
