This file provides guidance to AI agents when working with code in this repository.
This is the LVM Operator repository - part of the Logical Volume Manager Storage (LVMS) solution for OpenShift. It contains:
- Kubernetes operator for managing logical volume storage
- Custom Resource Definitions (LVMCluster, LVMVolumeGroup, LVMVolumeGroupNodeStatus)
- Integration with TopoLVM CSI Driver for dynamic volume provisioning
- Volume Group Manager for node-level LVM operations
- End-to-end test suite for LVMS functionality
The LVMCluster CRD is the primary API for users to configure logical volume storage. It defines:
- Device classes and device selectors for identifying available disks
- Volume group configurations and thin pool settings
- Storage class and volume snapshot class configurations
- TopoLVM CSI driver deployment parameters
The operator watches LVMCluster resources and reconciles the desired state by deploying TopoLVM components and managing volume groups across cluster nodes.
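For orientation, a minimal LVMCluster resource might look like the following. This is an illustrative sketch, not a canonical example from this repository; verify field names and defaults against the types in `api/v1alpha1` and the product documentation before use:

```yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-lvm-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        deviceSelector:
          paths:
            - /dev/sdb        # explicit device paths are recommended for production
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90     # percentage of the VG allocated to the thin pool
          overprovisionRatio: 10
```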
The vg-manager component runs as a DaemonSet on each node and is responsible for:
- Discovering available block devices matching the device selector
- Creating and managing LVM physical volumes and volume groups
- Managing thin pools for efficient storage provisioning
- Updating LVMVolumeGroupNodeStatus CRs with node-level status
LVMS uses a forked version of the upstream TopoLVM CSI driver (github.com/openshift/topolvm) that provides:
- Dynamic provisioning of logical volumes
- Topology-aware scheduling to place pods on nodes with sufficient storage
- Support for volume snapshots and clones (experimental, limited to single node)
- Thin provisioning capabilities for overcommitment
The operator uses controller-runtime and consists of several reconcilers:
- LVMCluster controller: Manages TopoLVM deployment and storage configuration
- LVMVolumeGroup controller: Monitors volume group status across nodes
- Node-level controllers for device management and volume group operations
```shell
make build         # Build the operator binary
make docker-build  # Build container image
make docker-push   # Push container image to registry
make generate      # Generate code (deepcopy, etc.)
make manifests     # Generate CRD and RBAC manifests
```

After modifying API types in `api/`, always run both `make generate` and `make manifests` to update generated code and manifests.
```shell
# Standard deployment (uses kustomize)
make deploy    # Deploy operator to current cluster context
make undeploy  # Remove operator from cluster

# OLM-based deployment
make bundle             # Generate OLM bundle manifests
make bundle-build       # Build bundle image
make bundle-push        # Push bundle image to registry
make catalog-build      # Build catalog image
make catalog-push       # Push catalog image to registry
make deploy-with-olm    # Deploy using Operator Lifecycle Manager
make undeploy-with-olm  # Remove operator from cluster using Operator Lifecycle Manager
```

```shell
export IMAGE_REGISTRY=quay.io                    # Container registry (quay.io, docker.io, etc.)
export REGISTRY_NAMESPACE=myusername             # Your registry namespace/username
export IMAGE_TAG=v4.18-dev                       # Image tag for built images
export OPERATOR_NAMESPACE=openshift-lvm-storage  # Namespace where operator runs (default)
```

```shell
make test  # Run unit tests (requires Linux)
```
```shell
make docker-test  # Run unit tests inside a Linux container (useful for macOS/Windows)
```

```shell
# E2E tests require a live cluster
make deploy-local  # Build, push, and deploy local changes
make e2e           # Run end-to-end tests against deployed operator
```

To run e2e tests:
- Set `IMAGE_REGISTRY` and `REGISTRY_NAMESPACE` environment variables
- Run `make deploy-local` to build and deploy your changes
- Wait for the operator pod to be running: `oc -n openshift-lvm-storage get pods`
- Run `make e2e` to execute the test suite
- Clean up with `make undeploy`
```shell
make fmt     # Run go fmt on all Go files
make vet     # Run go vet
make verify  # Verify go formatting and generated files
```

To add a field to an API type:
- Edit the type definition in `api/v1alpha1/*_types.go`
- Add appropriate kubebuilder markers for validation, defaults, etc.
- Run `make generate` to update generated code (deepcopy methods)
- Run `make manifests` to regenerate CRD YAML files
- Update or add controller logic to handle the new field
- Add unit tests for validation and controller behavior
- Add e2e tests if the change affects user workflows
- Update documentation in `docs/` or code comments
Use kubebuilder markers to add validation to CRD fields:
```go
// DeviceSelector specifies the criteria for selecting block devices.
// When not specified, all available and supported devices are discovered and added to the volume group.
// +optional
// +kubebuilder:validation:Optional
type DeviceSelector struct {
	// paths is a list of device paths which should be used for creating volume groups.
	// Paths must be absolute paths beginning with "/dev/".
	// +optional
	// +kubebuilder:validation:Optional
	// +kubebuilder:validation:MinItems=1
	Paths []string `json:"paths,omitempty"`

	// optionalPaths is similar to paths, but optional devices are allowed to be absent.
	// +optional
	// +kubebuilder:validation:Optional
	OptionalPaths []string `json:"optionalPaths,omitempty"`
}
```

All validation constraints must be documented in the field's comment.
- Current stable API version: `v1alpha1` (despite the name, this is the production API)
- New fields should generally be optional to maintain backward compatibility
- Breaking changes require careful consideration and migration support
Located alongside source files (e.g., internal/controllers/*_test.go). Use Ginkgo/Gomega for behavior-driven tests:
```go
var _ = Describe("LVMCluster Controller", func() {
	Context("When reconciling a new LVMCluster", func() {
		It("Should create the TopoLVM deployment", func() {
			// Test implementation
		})
	})
})
```

E2E tests are located in `test/e2e/`. They test real cluster scenarios:
- Creating LVMCluster with various configurations
- Provisioning PVCs using LVMS storage class
- Testing volume snapshots and clones
- Validating device discovery and volume group creation
E2E tests require:
- A real Kubernetes/OpenShift cluster with available block devices
- Cluster admin permissions
- The operator deployed from your local build
The test/ directory contains integration tests that verify:
- Controller reconciliation logic
- RBAC permissions
- Webhook validation
- Metrics collection
This operator manages physical storage devices and performs destructive operations:
- Data Loss Risk: LVM operations can wipe disks. Always verify device selectors carefully.
- Idempotency: Controllers must handle partial states and retries safely.
- Node Operations: VG Manager runs privileged operations on nodes.
- Cleanup: Finalizers ensure proper cleanup, including removing volume groups.
The operator filters out unsafe devices automatically:
- Read-only devices
- Devices with existing filesystems (unless LVM2_member with no children)
- Devices with partitions labeled as boot/bios/reserved
- ROM devices and existing LVM partitions
- Loop devices in use by Kubernetes
See "Unsupported Device Types" in the README for complete filter list.
The VG Manager performs LVM commands on nodes:
- `vgcreate`, `vgextend`: Manage volume groups
- `lvcreate`: Create thin pools
- Device wiping with `wipefs` before use, if the `forceWipeDevicesAndDestroyAllData` field is enabled in the API
All operations are logged and errors are reported via CR status conditions.
For consistency, you can run builds and tests in containers:
```shell
# Using podman (default)
make docker-build

# Using docker
make docker-build IMAGE_BUILD_CMD=docker
```

The operator image is based on UBI (Universal Base Image) for OpenShift compatibility.
```shell
# View operator pods
oc get pods -n openshift-lvm-storage

# Check operator logs
oc logs -n openshift-lvm-storage deployment/lvms-operator -c manager

# View LVMCluster status
oc get lvmcluster -A -o yaml

# Check volume group status on nodes
oc get lvmvolumegroupnodestatus -A
```

Common issues:
- LVMCluster stuck in pending: Check operator logs and events
- No devices found: Verify device selector and check device filters
- VG Manager not running: Check DaemonSet status and node selectors
- PVC stuck pending: Ensure TopoLVM CSI driver is running, check storage class
See docs/troubleshooting.md for detailed troubleshooting guide.
LVMS exposes Prometheus metrics:
- TopoLVM metrics (volume capacity, provisioning duration, etc.)
- Controller-runtime metrics (reconciliation rate, queue depth, etc.)
Enable cluster monitoring:
```shell
oc patch namespace/openshift-lvm-storage -p '{"metadata": {"labels": {"openshift.io/cluster-monitoring": "true"}}}'
```

Access metrics via OpenShift Console → Observe → Metrics.
- The operator follows semantic versioning
- Bundle versions align with OpenShift versions (4.x)
- CSV (ClusterServiceVersion) is generated for OLM deployments
- Image tags should include OpenShift version for production builds
Be aware of these limitations when developing:
- Single LVMCluster: Only one LVMCluster CR is supported per cluster
- No Multi-Node Snapshots: Snapshots/clones work only on the same node as source
- No LVM RAID: Use mdraid instead for redundancy
- Dynamic Discovery: Not recommended for production (use explicit device paths)
- No Upgrades from 4.10/4.11: Breaking API changes prevent upgrades
See CONTRIBUTING.md for:
- Code review process
- Commit message format
- PR submission guidelines
- Community standards
For the latest information about usage and installation of LVMS (Logical Volume Manager Storage) in OpenShift, use the official product documentation (https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/storage/configuring-persistent-storage#persistent-storage-using-lvms).