lvm-operator/AGENTS.md at main · openshift/lvm-operator

This file provides guidance to AI agents when working with code in this repository.

This is the LVM Operator repository - part of the Logical Volume Manager Storage (LVMS) solution for OpenShift. It contains:

Kubernetes operator for managing logical volume storage
Custom Resource Definitions (LVMCluster, LVMVolumeGroup, LVMVolumeGroupNodeStatus)
Integration with TopoLVM CSI Driver for dynamic volume provisioning
Volume Group Manager for node-level LVM operations
End-to-end test suite for LVMS functionality

Key Architecture Components

LVMCluster Custom Resource

The primary API for users to configure logical volume storage. It defines:

Device classes and device selectors for identifying available disks
Volume group configurations and thin pool settings
Storage class and volume snapshot class configurations
TopoLVM CSI driver deployment parameters

The operator watches LVMCluster resources and reconciles the desired state by deploying TopoLVM components and managing volume groups across cluster nodes.

Volume Group Manager

The vg-manager component runs as a DaemonSet on each node and is responsible for:

Discovering available block devices matching the device selector
Creating and managing LVM physical volumes and volume groups
Managing thin pools for efficient storage provisioning
Updating LVMVolumeGroupNodeStatus CRs with node-level status

TopoLVM CSI Driver

A forked version of the upstream TopoLVM CSI driver (github.com/openshift/topolvm) that provides:

Dynamic provisioning of logical volumes
Topology-aware scheduling to place pods on nodes with sufficient storage
Support for volume snapshots and clones (experimental, limited to a single node)
Thin provisioning capabilities for overcommitment

Controller Architecture

The operator uses controller-runtime and consists of several reconcilers:

LVMCluster controller: Manages TopoLVM deployment and storage configuration
LVMVolumeGroup controller: Monitors volume group status across nodes
Node-level controllers for device management and volume group operations

Common Development Commands

Building

make build              # Build the operator binary
make docker-build       # Build container image
make docker-push        # Push container image to registry

Code Generation

make generate           # Generate code (deepcopy, etc.)
make manifests          # Generate CRD and RBAC manifests

After modifying API types in api/, always run both make generate and make manifests to update generated code and manifests.

Deployment

# Standard deployment (uses kustomize)
make deploy             # Deploy operator to current cluster context
make undeploy           # Remove operator from cluster

# OLM-based deployment
make bundle             # Generate OLM bundle manifests
make bundle-build       # Build bundle image
make bundle-push        # Push bundle image to registry
make catalog-build      # Build catalog image
make catalog-push       # Push catalog image to registry
make deploy-with-olm    # Deploy using Operator Lifecycle Manager
make undeploy-with-olm  # Remove operator from cluster using Operator Lifecycle Manager

Environment Variables for Deployment

export IMAGE_REGISTRY=quay.io           # Container registry (quay.io, docker.io, etc.)
export REGISTRY_NAMESPACE=myusername    # Your registry namespace/username
export IMAGE_TAG=v4.18-dev              # Image tag for built images
export OPERATOR_NAMESPACE=openshift-lvm-storage  # Namespace where operator runs (default)

Testing

make test               # Run unit tests (requires Linux)
make docker-test        # Run unit tests inside a Linux container (useful for macOS/Windows)

# E2E tests require a live cluster
make deploy-local       # Build, push, and deploy local changes
make e2e                # Run end-to-end tests against deployed operator

To run e2e tests:

Set IMAGE_REGISTRY and REGISTRY_NAMESPACE environment variables
Run make deploy-local to build and deploy your changes
Wait for the operator pod to be running: oc -n openshift-lvm-storage get pods
Run make e2e to execute the test suite
Clean up with make undeploy

Validation and Verification

make fmt                # Run go fmt on all Go files
make vet                # Run go vet
make verify             # Verify go formatting and generated files

Adding New APIs or Modifying Existing CRDs

Modifying LVMCluster or Other CRDs

Edit the type definition in api/v1alpha1/*_types.go
Add appropriate kubebuilder markers for validation, defaults, etc.
Run make generate to update generated code (deepcopy methods)
Run make manifests to regenerate CRD YAML files
Update or add controller logic to handle the new field
Add unit tests for validation and controller behavior
Add e2e tests if the change affects user workflows
Update documentation in docs/ or code comments

Validation Markers

Use kubebuilder markers to add validation to CRD fields:

// DeviceSelector specifies the criteria for selecting block devices.
// When not specified, all available and supported devices are discovered and added to the volume group.
// +optional
// +kubebuilder:validation:Optional
type DeviceSelector struct {
    // paths is a list of device paths which should be used for creating volume groups.
    // Paths must be absolute paths beginning with "/dev/".
    // When specified, the list must contain at least one path.
    // +optional
    // +kubebuilder:validation:Optional
    // +kubebuilder:validation:MinItems=1
    Paths []string `json:"paths,omitempty"`

    // optionalPaths is similar to paths but optional devices are allowed to be absent.
    // +optional
    // +kubebuilder:validation:Optional
    OptionalPaths []string `json:"optionalPaths,omitempty"`
}

All validation constraints must be documented in the field's comment.
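The constraint stated in the paths comment ("absolute paths beginning with /dev/") can be illustrated with a small runtime check. This is a minimal sketch only: validateDevicePaths is a hypothetical helper named for this example, not a function taken from the operator, and the repository may enforce the rule elsewhere (for example in a webhook or in CRD validation).

```go
package main

import (
	"fmt"
	"strings"
)

// validateDevicePaths is a hypothetical helper for illustration only.
// It rejects any entry that does not begin with "/dev/", which also
// implies the entry is an absolute path.
func validateDevicePaths(paths []string) error {
	for _, p := range paths {
		if !strings.HasPrefix(p, "/dev/") {
			return fmt.Errorf("invalid device path %q: must be an absolute path beginning with /dev/", p)
		}
	}
	return nil
}

func main() {
	// Accepted: every entry begins with /dev/.
	fmt.Println(validateDevicePaths([]string{"/dev/sdb", "/dev/disk/by-id/example"}))
	// Rejected: "sdc" is a relative path.
	fmt.Println(validateDevicePaths([]string{"/dev/sdb", "sdc"}))
}
```

Whatever form the real check takes, the same wording should appear in the field comment so that the generated CRD documentation matches the behavior.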

CRD Version Policy

Current stable API version: v1alpha1 (despite the name, this is the production API)
New fields should generally be optional to maintain backward compatibility
Breaking changes require careful consideration and migration support
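To show why optional fields keep existing resources valid, the sketch below decodes a document written before a field existed. The exampleSpec type and its newSetting field are hypothetical and are not part of the LVMCluster API; the point is only that a pointer field with omitempty stays nil when older objects omit it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleSpec is a hypothetical spec used only for this illustration.
type exampleSpec struct {
	// name existed in the original version of the type.
	Name string `json:"name"`

	// newSetting is a hypothetical field added later.
	// +optional
	NewSetting *int `json:"newSetting,omitempty"`
}

// decodeSpec parses a JSON document into exampleSpec.
func decodeSpec(raw string) (exampleSpec, error) {
	var s exampleSpec
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// A document stored before newSetting existed still decodes cleanly.
	old, err := decodeSpec(`{"name":"vg1"}`)
	fmt.Println(old.Name, old.NewSetting == nil, err)

	// A newer document can set the field explicitly.
	cur, err := decodeSpec(`{"name":"vg1","newSetting":3}`)
	fmt.Println(cur.Name, *cur.NewSetting, err)
}
```

Controller code that reads such a field has to treat nil as "not set" and fall back to the previous behavior.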

Testing Framework

Unit Tests

Unit tests are located alongside the source files (e.g., internal/controllers/*_test.go). Use Ginkgo/Gomega for behavior-driven tests:

var _ = Describe("LVMCluster Controller", func() {
    Context("When reconciling a new LVMCluster", func() {
        It("Should create the TopoLVM deployment", func() {
            // Test implementation
        })
    })
})

E2E Tests

E2E tests are located in test/e2e/ and exercise real cluster scenarios:

Creating LVMCluster with various configurations
Provisioning PVCs using LVMS storage class
Testing volume snapshots and clones
Validating device discovery and volume group creation

E2E tests require:

A real Kubernetes/OpenShift cluster with available block devices
Cluster admin permissions
The operator deployed from your local build

Integration Tests (Migration of QE tests into this folder is still in progress)

The test/ directory contains integration tests that verify:

Controller reconciliation logic
RBAC permissions
Webhook validation
Metrics collection

Working with LVM and Storage

Important Safety Considerations

This operator manages physical storage devices and performs destructive operations:

Data Loss Risk: LVM operations can wipe disks. Always verify device selectors carefully.
Idempotency: Controllers must handle partial states and retries safely.
Node Operations: VG Manager runs privileged operations on nodes.
Cleanup: Finalizers ensure proper cleanup, including removing volume groups.

Device Selection

The operator filters out unsafe devices automatically:

Read-only devices
Devices with existing filesystems (unless the filesystem type is LVM2_member and the device has no children)
Devices with partitions labeled as boot/bios/reserved
ROM devices and existing LVM partitions
Loop devices in use by Kubernetes

See "Unsupported Device Types" in the README for the complete filter list.
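The filter rules listed above can be pictured as a chain of checks over a device description. The following is a simplified sketch under stated assumptions: the blockDevice struct, its fields, and rejectReason are hypothetical names invented for this example, and the real filters in the repository are more complete. The checks for existing LVM partitions and for loop devices used by Kubernetes are left out because they need more context than this struct carries.

```go
package main

import (
	"fmt"
	"strings"
)

// blockDevice is a hypothetical, simplified view of a block device.
type blockDevice struct {
	Name      string
	Type      string // e.g. "disk", "part", "rom", "loop"
	ReadOnly  bool
	FSType    string // empty when no filesystem signature is present
	PartLabel string
	Children  int // number of child devices (partitions, LVs, ...)
}

// hasReservedLabel reports whether a partition label mentions boot, bios or reserved.
func hasReservedLabel(label string) bool {
	l := strings.ToLower(label)
	for _, k := range []string{"boot", "bios", "reserved"} {
		if strings.Contains(l, k) {
			return true
		}
	}
	return false
}

// rejectReason returns an empty string when the device passes every
// filter in this sketch, otherwise a short reason for skipping it.
func rejectReason(d blockDevice) string {
	switch {
	case d.ReadOnly:
		return "read-only"
	case d.Type == "rom":
		return "ROM device"
	case d.FSType == "LVM2_member" && d.Children > 0:
		return "LVM2_member with children"
	case d.FSType != "" && d.FSType != "LVM2_member":
		return "existing filesystem"
	case hasReservedLabel(d.PartLabel):
		return "reserved partition label"
	}
	return ""
}

func main() {
	devices := []blockDevice{
		{Name: "sdb", Type: "disk"},
		{Name: "sdc", Type: "disk", FSType: "xfs"},
		{Name: "sdd", Type: "disk", FSType: "LVM2_member"},
	}
	for _, d := range devices {
		fmt.Printf("%s: %q\n", d.Name, rejectReason(d))
	}
}
```

Returning a reason rather than a boolean makes it easier to log why a device was skipped, which helps when diagnosing a "no devices found" report.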

LVM Operations

The VG Manager performs LVM commands on nodes:

vgcreate, vgextend: Manage volume groups
lvcreate: Create thin pools
wipefs: Wipe devices before use, only when the forceWipeDevicesAndDestroyAllData field is enabled in the API

All operations are logged and errors are reported via CR status conditions.
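Because controllers must tolerate retries, the choice between vgcreate and vgextend is best derived from the current state instead of being assumed. The sketch below only plans the command line and never executes it. planVGCommands is a hypothetical function written for this example, not the VG Manager's implementation, and it assumes a volume group exists exactly when it already has at least one physical volume.

```go
package main

import "fmt"

// planVGCommands is a hypothetical planner for illustration only.
// Given the devices currently in volume group vg and the devices that
// should be in it, it returns the argv of the command to run, or nil
// when the volume group already contains every wanted device.
func planVGCommands(vg string, current, wanted []string) [][]string {
	have := make(map[string]bool, len(current))
	for _, d := range current {
		have[d] = true
	}

	// Collect wanted devices that are not part of the VG yet, skipping duplicates.
	var missing []string
	for _, d := range wanted {
		if !have[d] {
			missing = append(missing, d)
			have[d] = true
		}
	}
	if len(missing) == 0 {
		return nil // nothing to do, so a retry is a no-op
	}

	// Assumption: an existing VG always has at least one physical volume.
	verb := "vgextend"
	if len(current) == 0 {
		verb = "vgcreate"
	}
	return [][]string{append([]string{verb, vg}, missing...)}
}

func main() {
	fmt.Println(planVGCommands("vg1", nil, []string{"/dev/sdb", "/dev/sdc"}))
	fmt.Println(planVGCommands("vg1", []string{"/dev/sdb"}, []string{"/dev/sdb", "/dev/sdc"}))
	fmt.Println(planVGCommands("vg1", []string{"/dev/sdb"}, []string{"/dev/sdb"}))
}
```

Running the planner a second time after a successful command yields no further work, which is the property the idempotency guidance asks for.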

Container-based Development

For consistency, you can run builds and tests in containers:

# Using podman (default)
make docker-build

# Using docker
make docker-build IMAGE_BUILD_CMD=docker

The operator image is based on UBI (Universal Base Image) for OpenShift compatibility.

Debugging and Troubleshooting

Checking Operator Status

# View operator pods
oc get pods -n openshift-lvm-storage

# Check operator logs
oc logs -n openshift-lvm-storage deployment/lvms-operator -c manager

# View LVMCluster status
oc get lvmcluster -A -o yaml

# Check volume group status on nodes
oc get lvmvolumegroupnodestatus -A

Common Issues

LVMCluster stuck in pending: Check operator logs and events
No devices found: Verify device selector and check device filters
VG Manager not running: Check DaemonSet status and node selectors
PVC stuck pending: Ensure TopoLVM CSI driver is running, check storage class

See docs/troubleshooting.md for a detailed troubleshooting guide.

Monitoring and Metrics

LVMS exposes Prometheus metrics:

TopoLVM metrics (volume capacity, provisioning duration, etc.)
Controller-runtime metrics (reconciliation rate, queue depth, etc.)

Enable cluster monitoring:

oc patch namespace/openshift-lvm-storage -p '{"metadata": {"labels": {"openshift.io/cluster-monitoring": "true"}}}'

Access metrics via OpenShift Console → Observe → Metrics.

Release and Versioning

The operator follows semantic versioning
Bundle versions align with OpenShift versions (4.x)
CSV (ClusterServiceVersion) is generated for OLM deployments
Image tags should include OpenShift version for production builds

Known Limitations

Be aware of these limitations when developing:

Single LVMCluster: Only one LVMCluster CR is supported per cluster
No Multi-Node Snapshots: Snapshots/clones work only on the same node as source
No LVM RAID: Use mdraid instead for redundancy
Dynamic Discovery: Not recommended for production (use explicit device paths)
No Upgrades from 4.10/4.11: Breaking API changes prevent upgrades

Contributing

See CONTRIBUTING.md for:

Code review process
Commit message format
PR submission guidelines
Community standards

For the latest information about usage and installation of LVMS (Logical Volume Manager Storage) in OpenShift, use the official product documentation (https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/storage/configuring-persistent-storage#persistent-storage-using-lvms).
