Member

@neddp neddp commented Feb 2, 2026

Problem

On AWS Nitro-based instances with NVMe devices, the kernel's PCIe enumeration order is non-deterministic. This means:

  • /dev/nvme0n1 could be the root EBS volume OR instance storage
  • /dev/nvme1n1 could be instance storage OR the root EBS volume
  • The order varies between boots and instance types
  • There is no guaranteed ordering

Solution

This PR adds runtime discovery that identifies instance storage by excluding EBS volumes from the set of NVMe devices.

Discovery Algorithm

  1. Glob all NVMe devices: /dev/nvme*n1
  2. Glob EBS symlinks: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*
  3. Resolve each symlink to its target device
  4. Subtract the EBS devices from the full NVMe device list; the remainder is instance storage
  5. Validate count matches CPI expectations
  6. Partition only the discovered instance storage devices
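
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `instanceStorageDevices` and its parameters are hypothetical, and it assumes the device list and the resolved EBS targets have already been collected in steps 1 to 3.

```go
package main

import (
	"fmt"
	"sort"
)

// instanceStorageDevices is a hypothetical helper (not the agent's API).
// allNvme is the result of globbing /dev/nvme*n1, ebsTargets are the
// resolved targets of the EBS by-id symlinks, and expected is the number
// of instance storage devices the CPI reported.
func instanceStorageDevices(allNvme, ebsTargets []string, expected int) ([]string, error) {
	// Build a set of EBS devices for constant-time lookups.
	ebs := make(map[string]struct{}, len(ebsTargets))
	for _, d := range ebsTargets {
		ebs[d] = struct{}{}
	}

	// Keep every NVMe device that is not an EBS volume.
	var result []string
	for _, d := range allNvme {
		if _, isEBS := ebs[d]; !isEBS {
			result = append(result, d)
		}
	}
	sort.Strings(result)

	// Fail instead of partitioning if the count is not what the CPI expects.
	if len(result) != expected {
		return nil, fmt.Errorf("expected %d instance storage devices, found %d: %v",
			expected, len(result), result)
	}
	return result, nil
}

func main() {
	devices, err := instanceStorageDevices(
		[]string{"/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1"},
		[]string{"/dev/nvme1n1"},
		2,
	)
	fmt.Println(devices, err)
}
```

Returning an error on a count mismatch means a misidentified device is never partitioned, which matters because the root EBS volume can appear under any NVMe name.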

Why EBS Symlinks Are Reliable

AWS automatically creates persistent symlinks for all EBS volumes via udev rules:

/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol{volume_id}

Backwards Compatibility

Non-NVMe instances: No changes to behavior

  • Traditional Xen instances (/dev/xvdb, /dev/sdb) use CPI paths directly
  • Paravirtual instances work as before

This must be merged together with the CPI changes in cloudfoundry/bosh-aws-cpi-release#196.


Pair @Ivaylogi98

Contributor

@rkoster rkoster left a comment

In general I would have expected this logic to go into the https://github.com/cloudfoundry/bosh-agent/tree/main/infrastructure/devicepathresolver package.

p.logger.Debug(logTag, "Found NVMe devices: %v", allNvmeDevices)

// Identify EBS volumes via symlinks
ebsSymlinks, err := p.fs.Glob("/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*")
Contributor

This should somehow be passed in via the agent config in the stemcell builder, because it is IaaS specific.

@github-project-automation github-project-automation bot moved this from Inbox to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Feb 3, 2026
Member Author

neddp commented Feb 3, 2026

> In general I would have expected this logic to go into the https://github.com/cloudfoundry/bosh-agent/tree/main/infrastructure/devicepathresolver package.

Thank you for the review! That was a big oversight on my end. I'll look into it.

Contributor

rkoster commented Feb 3, 2026

No worries 🙂
