Skip to content

[SPIKE] CSPL-4530: Per-Pod Rolling Restart with Finalizers and Intent Annotations#1710

Draft
vivekr-splunk wants to merge 121 commits intodevelopfrom
spike/CSPL-4530
Draft

[SPIKE] CSPL-4530: Per-Pod Rolling Restart with Finalizers and Intent Annotations#1710
vivekr-splunk wants to merge 121 commits intodevelopfrom
spike/CSPL-4530

Conversation

@vivekr-splunk
Copy link
Collaborator

@vivekr-splunk vivekr-splunk commented Feb 19, 2026

Overview

This spike implements a comprehensive pod lifecycle management system for Splunk Operator that enables graceful pod termination, role-specific cleanup, and flexible rolling update strategies. The implementation follows Kubernetes-native patterns using finalizers, preStop hooks, PodDisruptionBudgets, and the Eviction API.

Key Features

1. Graceful Pod Lifecycle Management

PreStop Lifecycle Hooks

  • Implements role-specific shutdown procedures in preStop hooks
  • Indexers: Automatic decommission with bucket rebalancing (scale-down) or graceful stop (restart)
  • Search Heads: Automatic detention and cluster removal with graceful stop
  • Other Roles: Graceful Splunk shutdown with proper connection cleanup
  • Configurable timeouts: 5 minutes for indexers, 2 minutes for other roles

Intent-Aware Operations

  • Uses splunk.com/pod-intent annotation to distinguish operations:
    • serve - Normal operation
    • scale-down - Permanent removal with full cleanup
    • restart - Temporary termination with data preservation
  • Enables correct handling of PVC lifecycle (preserve on restart, delete on scale-down)

2. Safe Cluster Operations

Finalizer-Based Cleanup

  • Ensures cleanup completes before pod deletion
  • Prevents data loss during scale-down operations
  • Verifies decommission/detention completion before removing pods
  • Automatic PVC cleanup on scale-down

PodDisruptionBudget Integration

  • Automatically creates PDBs with minAvailable = replicas - 1
  • Ensures minimum availability during updates and restarts
  • Works with Kubernetes Eviction API for safe pod termination
  • Handles single-replica deployments correctly

3. Flexible Rolling Update Strategies

Percentage-Based Updates

  • Configure maximum unavailable pods as percentage (e.g., "25%") or absolute number
  • Faster rollouts by updating multiple pods simultaneously
  • Example:
    spec:
      rollingUpdateConfig:
        maxPodsUnavailable: "25%"  # For 10 replicas, allow 2-3 pods down

Canary Deployments

  • Support for partition-based staged rollouts
  • Test updates on subset of pods before full rollout
  • Example:
    spec:
      rollingUpdateConfig:
        partition: 8  # Only update pods with ordinal >= 8
        maxPodsUnavailable: "1"

Intelligent Update Management

  • Mutual exclusion between operator-triggered evictions and StatefulSet rolling updates
  • Prevents PDB violations from simultaneous pod terminations
  • Automatic staleness detection for stuck updates
  • Smart coordination between multiple update mechanisms

4. Automatic Restart Detection (IngestorCluster & Standalone)

Per-Pod Restart Monitoring

  • Monitors Splunk API restart_required messages for configuration changes
  • Automatically evicts pods requiring restart
  • One pod at a time to maintain availability
  • Respects PodDisruptionBudget automatically

Configuration-Driven Restarts

  • Secret changes (Queue/Pipeline credentials) trigger StatefulSet rolling updates
  • Splunk config changes trigger per-pod eviction
  • Both mechanisms coordinate to prevent conflicts

5. Kubernetes-Native Design

Follows Best Practices

  • Uses Kubernetes Eviction API (not direct pod deletion)
  • Eviction API automatically respects PodDisruptionBudget
  • Finalizers prevent premature pod deletion
  • PreStop hooks ensure graceful shutdown
  • StatefulSet RollingUpdate for template changes

Production-Ready Error Handling

  • Timeout protection for API calls (10-second timeouts)
  • Environment variable validation
  • Proper error signaling from preStop hooks
  • Update staleness detection
  • Duplicate finalizer prevention

Architecture

See the complete C4 architecture diagram:
per-pod-rolling-restart-architecture

Key Components:

  • Pod Controller - Watches pods with finalizers and triggers cleanup
  • PreStop Hooks - Role-specific decommission, detention, and graceful stop
  • Pod Deletion Handler - Intent-based cleanup with PVC lifecycle management
  • PodDisruptionBudget - Ensures minimum availability during operations
  • StatefulSet Controller - Kubernetes-native rolling update management

Implementation Details

Cluster Type Behaviors

IngestorCluster & Standalone

  • Automatic restart detection via restart_required monitoring
  • Per-pod eviction when restart needed
  • No in-product orchestrator, so operator manages full lifecycle
  • Coordination with StatefulSet updates to prevent conflicts

IndexerCluster

  • Cluster Manager (CM) handles restart coordination
  • Operator provides finalizer-based cleanup during scale-down
  • PreStop hook handles decommission (with/without bucket rebalancing)
  • StatefulSet rolling updates for configuration changes

SearchHeadCluster

  • Deployer + Captain handle restart coordination
  • Operator provides finalizer-based cleanup during scale-down
  • PreStop hook handles detention and cluster removal
  • StatefulSet rolling updates for configuration changes

Environment Variables

All pods receive via Kubernetes downward API:

  • POD_NAME - Pod name for API queries
  • POD_NAMESPACE - Namespace for API queries
  • SPLUNK_ROLE - Role type for preStop hook logic
  • SPLUNK_CLUSTER_MANAGER_URL - Cluster Manager URL (for indexers)

Passwords read from mounted secrets at /mnt/splunk-secrets/password

Termination Grace Periods

  • Indexers: 300 seconds (5 minutes) for decommission + stop
  • Other roles: 120 seconds (2 minutes) for graceful stop

Benefits

Operational

  1. Zero Data Loss - Proper decommission ensures bucket replication completes
  2. Maintained Availability - PDB ensures minimum pods available during operations
  3. Faster Updates - Percentage-based updates allow multiple pods simultaneously
  4. Staged Rollouts - Partition support enables canary deployments
  5. Automatic Recovery - Pods automatically restart when configuration changes require it

Technical

  1. Kubernetes-Native - Uses standard K8s patterns (Eviction API, PDB, Finalizers, PreStop hooks)
  2. Conflict Prevention - Mutual exclusion prevents simultaneous pod terminations
  3. Proper Cleanup - Finalizers ensure cleanup completes before pod deletion
  4. Visibility - PreStop failures visible in pod events for easy debugging
  5. Error Handling - Timeouts, validation, and proper error signaling throughout

Developer

  1. Clear Intent - Annotation system makes pod lifecycle explicit
  2. Separation of Concerns - PreStop hooks handle execution, operator handles verification
  3. Testability - Each component can be tested independently
  4. Maintainability - Standard patterns make code easy to understand and modify

Configuration Examples

Basic Rolling Update with Percentage

apiVersion: enterprise.splunk.com/v4
kind: IngestorCluster
metadata:
  name: example
spec:
  replicas: 10
  rollingUpdateConfig:
    maxPodsUnavailable: "25%"  # Allow 2-3 pods updating simultaneously

Canary Deployment

apiVersion: enterprise.splunk.com/v4
kind: IndexerCluster
metadata:
  name: example
spec:
  replicas: 10
  rollingUpdateConfig:
    partition: 8  # Update only pods 8 and 9 first
    maxPodsUnavailable: "1"

Conservative Update (Default)

apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: example
spec:
  replicas: 5
  # No rollingUpdateConfig - defaults to 1 pod at a time

Testing

Completed

  • ✅ IngestorCluster restart with restart_required detection
  • ✅ IngestorCluster scale-down with PVC cleanup
  • ✅ PodDisruptionBudget enforcement
  • ✅ Finalizer cleanup verification
  • ✅ Code compilation and build verification

Recommended Testing

  • Scale-down operations for all cluster types
  • Percentage-based rolling updates (various percentages)
  • Canary deployments with partition
  • Restart detection and automatic eviction
  • PDB behavior with different replica counts
  • PreStop hook timeout handling
  • Finalizer cleanup with unreachable services

Files Changed

New Files

  • tools/k8_probes/preStop.sh - PreStop lifecycle hook implementation
  • per-pod-rolling-restart-architecture.png - C4 architecture diagram
  • KUBERNETES_NATIVE_REVIEW_FINDINGS.md - K8s patterns review and validation

Modified Core Logic

  • pkg/splunk/enterprise/configuration.go - StatefulSet creation, PreStop hooks, finalizers
  • pkg/splunk/enterprise/pod_deletion_handler.go - Finalizer handler with intent-based cleanup
  • pkg/splunk/enterprise/ingestorcluster.go - Restart detection and pod eviction
  • pkg/splunk/enterprise/standalone.go - Restart detection and pod eviction
  • pkg/splunk/enterprise/util.go - PodDisruptionBudget creation and management
  • pkg/splunk/splkcontroller/statefulset.go - Rolling update coordination

API Changes

  • api/v4/common_types.go - RollingUpdateConfig type for percentage-based updates
  • CRD manifests (auto-generated from API changes)

User Guide

Complete user guide available in: per-pod-rolling-restart-user-guide.md

Covers:

  • How to monitor pod lifecycle and intent annotations
  • How to configure rolling update strategies
  • How to trigger restarts and scale operations
  • Troubleshooting common scenarios
  • FAQ and best practices

Documentation

  • Architecture Diagram: per-pod-rolling-restart-architecture.png - Complete C4 diagram showing all components and interactions
  • User Guide: per-pod-rolling-restart-user-guide.md - Comprehensive guide for operators
  • Review Findings: KUBERNETES_NATIVE_REVIEW_FINDINGS.md - K8s native patterns validation

Backward Compatibility

  • ✅ No breaking changes to existing APIs
  • ✅ New rollingUpdateConfig field is optional (defaults to existing behavior)
  • ✅ PreStop hooks automatically injected into all pods
  • ✅ Existing pods updated on next rolling restart
  • ✅ PDB creation backward compatible with existing deployments

Future Enhancements

  • Dynamic timeout adjustment based on bucket count/size
  • Progressive rollout automation based on health checks
  • Blue/green deployment support
  • Automatic rollback on failed operations
  • Prometheus metrics for decommission/detention duration

🔬 This is a SPIKE - For evaluation and architectural review

kasiakoziol and others added 30 commits July 24, 2025 13:49
* test

* test

* test

* test

* Use endpoint to update conf file

* CSPL-3895 Matching changes from Ingestor to Indexer

---------

Co-authored-by: Kasia Koziol <kkoziol@splunk.com>
Co-authored-by: igor.grzankowski <@splunk.com>
* CSPL-3704 SmartStore ownerReferences removed

* CSPL-3704 Integration tests enabled to check the PR

* CSPL-3704 Fix failing tests

* CSPL-3704 Remove branch from int tests

* test

* CSPL-3705 Ignoring an error if decommisioning already enabled

* CSPL-3705 Removing branch from integ tests

* CSPL-3705 Addressing a comment

* clean-up deprecated dirs - .circleci & .devcontainer (#1499)

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* set imagePullPolicy default in helm chart (#1513)

* CSPL-3186: Upgrade Enterprise Security Version 8.0.2 (#1425)

* Updates for integration tests and documentation with latest ES version

* Update enterprise security versions and app files

* remove branch from inte test workflow

* CSPL-3688: Update Prerelease Workflow (#1502)

* add more automation to pre-release workflow

* update version and replaced version in bundle files

* update dockerfile sok version, not enterprise version

* fix typo

* CSPL-3584: Split run-tests.sh into multiple files (#1507)

* split run-tests.sh into multiple files

* trigger integration tests on branch

* use scriptdir to run sh files

* remove trigger int test workflow

* test azure, gcp, and namespace scoped workflows

* cleanup workflows

* feature: add support for pre-created PVs - admin-managed-pv annotation (#1509)

* add support for admin-managed-pv annotation

---------

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* Removing App Framework tests for C3 and M4 on ARM builds

* CSPL-3702 Replacing () with {} in script variables

* Ginkgo upgrade

* CSPL-3768 Adding inputs to Graviton pipelines and tests

* CSPL-3768 Not disclosing ECR secret value

* CSPL-3678 Introducing pipeline for Graviton and some fixes

* CSPL-3768 Fixes

* CSPL-3768 Fixes

* CSPL-3768 Addressing Copilot suggestions

* CSPL-3768 Addressing comments

* CSPL-3759 Addressing soon to be deprecated

* CSPL-3784: Update base image to latest ubi8-minimal version (#1525)

* update base image to latest ubi8-minimal version

* trigger integration test cases for branch

* trigger tests

* cleanup

* update Dockerfile comment

* CSPL-3675 Update Operator-SDK to v1.39 (#1488)


* v.1.39.0 migration

---------

Co-authored-by: igor.grzankowski <@splunk.com>
Co-authored-by: Vivek Reddy <vivekrsplunk@github.com>
Co-authored-by: rlieberman-splunk <rlieberman@splunk.com>
Co-authored-by: kasiakoziol <kkoziol@splunk.com>
Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* CSPL_3759 Ginkgo types to v2

* Add DeepWiki badge (#1529)

Co-authored-by: igor.grzankowski <@splunk.com>

* Upgrade golang.org/x/net version to v0.38.0 (#1530)

* upgrade golang.org/x/net version to v0.38.0

* trigger integration test cases on branch

* remove integration test trigger

* CSPL-3783: Update AppFramework docs with troubleshooting information (#1527)

* Add more logs around invalid phase and downloadPending (#1528)

* Remove kube rbac proxy from helm charts (#1531)

Co-authored-by: igor.grzankowski <@splunk.com>

* CSPL-3851 Adding info to docs about session stickiness for ingress

* Remove in progress phase

* Revert "Remove in progress phase"

This reverts commit 3c919d6.

* update SmartStore documentation for gcp and azure (#1541)

* Backport main to develop for Splunk Operator Release 2.8.1 (#1542) (#1543)

*  release 2.8.1 chnages - backported

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>
Co-authored-by: patrykw-splunk <patrykw-splunk@users.noreply.github.com>

* CSPL-3898 Fixing int-helm tests failing after SDK upgrade (#1544)

* CSPL-3867: SHC and CM Error Message Visibility (#1548)

* print error for update status failed

* add deployerPhase initial value

* set correct Kind in ClusterManager events

* cleanup

* use v4 in test

* CSPL-3905: Security and dependency updates (#1551)

* initial changes for aws-sdk-go-v2

* code compiles and unit tests pass

* trigger smoke and integration tests, update context

* set correct path for downloading from s3 bucket for tests

* update ENTERPRISE_LICENSE_LOCATION

* security updates, uncomment test suites

* cleanup

* [CSPL-3912] Allow Custom Probe Scripts (#1549)

* Promote Develop to main for Splunk Operator Release 2.8.1 (#1542)

* release preparation - release 2.8.1

---------

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: patrykw-splunk <patrykw-splunk@users.noreply.github.com>

* check for existing configmap before creating a new one

* update error handling

* fix unit tests

* cleanup and documentation updates

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>
Co-authored-by: patrykw-splunk <patrykw-splunk@users.noreply.github.com>

* CSPL-3913: Pass dynamic environment variables in Splunk StatefulSet for Ansible-based config generation (#1555)

* add new environment variables for config generation

* unit test updates

* add api version to env var

* Promote Develop to main for Splunk Operator Release 2.8.1 (#1542) (#1553)

* release 2.8.1

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>
Co-authored-by: patrykw-splunk <patrykw-splunk@users.noreply.github.com>

* Document skipping apply cluster-bundle on cluster managers (#1556)

* Update of shc upgrade process (#1547)

Update of shc upgrade process

---------

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* Update Helm.md (#1563)

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* Merge Splunk10 feature changes into develop (#1559)

* CSPL-3707: Update documentation around minimum number of indexer cluster peers (#1558)

* document minimum number of CR replicas

* update examples documentation

* Default Build to multi-platform amd64 and arm64 (#1557)

* add linux/arm64 as a platform to build the operator by default

* set platforms in dockerfile

* use tag instead of sha

* update tag version

* update base image to latest

* Remove kube-rbac-proxy references and implement upgrade-sdk 1.38 changes (#1565)

* remove kube-rbac-proxy references and implement upgrade-sdk 1.38 changes

* fix kustomize references

* fix container number for debug

* cleanup

* fix service for metrics

---------

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* helm changes for 3.0.0 release (#1566)

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* Fix kustomization templates after removing kube-rbac-proxy (#1570)

* fix kustomization templates

---------

Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>

* Splunk Operator 3.0.0 release (#1572)

* Update changelog

---------

Co-authored-by: Igor-splunk <Igor-splunk@users.noreply.github.com>
Co-authored-by: igor.grzankowski <@splunk.com>

* remove old helm charts from splunk-enterprise/charts folder

---------

Co-authored-by: kasiakoziol <kkoziol@splunk.com>
Co-authored-by: patrykw-splunk <patrykw@splunk.com>
Co-authored-by: Patryk Wasielewski <pwasiele@splunk.com>
Co-authored-by: rlieberman-splunk <rlieberman@splunk.com>
Co-authored-by: Igor Grzankowski <igorg@splunk.com>
Co-authored-by: Vivek Reddy <vivekrsplunk@github.com>
Co-authored-by: igor.grzankowski <@splunk.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: patrykw-splunk <patrykw-splunk@users.noreply.github.com>
Co-authored-by: Igor-splunk <Igor-splunk@users.noreply.github.com>
* remove crds from helm chart, needs documentation

* add documentation about adding CRDs before helm chart installation

* repackage helm charts

* cleanup helm test workflow
* helm test case

* .env changes with spulnk 10.0.0

* updated splunk default version to 10

* removed helm test from branch

---------

Co-authored-by: Vivek Reddy <vivekrsplunk@github.com>
…nt fallback to older operator (#1583)

* removing old splunk operator helm chart

* generate helm packages

---------

Co-authored-by: Vivek Reddy <vivekrsplunk@github.com>
vivek.name: "Vivek Reddy and others added 4 commits February 19, 2026 05:04
This commit introduces a comprehensive pod lifecycle management system using
Kubernetes finalizers and intent annotations to ensure safe pod deletion
during restarts and scale-down operations.

Key Features:
- Pod Controller with finalizer management (splunk.com/pod-cleanup)
- Pod deletion handler for role-specific cleanup
- Intent annotation system (serve, scale-down, restart)
- Per-pod eviction for IndexerCluster and SearchHeadCluster
- PVC lifecycle management (preserve on restart, delete on scale-down)
- Secret change detection and rolling restart triggers

Components Added:
- internal/controller/pod_controller.go (NEW)
- pkg/splunk/enterprise/pod_deletion_handler.go (NEW)

Components Modified:
- pkg/splunk/enterprise/indexercluster.go (per-pod eviction)
- pkg/splunk/enterprise/searchheadcluster.go (per-pod eviction)
- pkg/splunk/enterprise/standalone.go (secret change restart)
- pkg/splunk/splkcontroller/statefulset.go (scale-down intent marking)
- pkg/splunk/client/enterprise.go (restart/reload REST API)
- api/v4/*_types.go (RestartStatus fields)

RBAC Changes:
- Added pod watch/get/list/update/patch permissions
- Added pod/eviction create permission
- Added secret watch/get/list permission
- Added PVC delete permission

Testing:
- Unit tests for pod controller and deletion handler
- Integration tests for restart and scale-down scenarios
- Test plan documented in FINALIZER_TEST_PLAN.md

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
This user-facing documentation explains the rolling restart feature
from an operator's perspective, focusing on practical usage and benefits.

Key sections:
- Overview and benefits for users
- How it works from user perspective
- Common scenarios with step-by-step examples
- Monitoring and troubleshooting guidance
- Best practices and FAQ

Audience: Splunk Operator users managing Kubernetes clusters
Format: Practical guide with copy/paste commands

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
…luster

Removed automatic restart_required detection and pod eviction from IndexerCluster
and SearchHeadCluster controllers. These components have in-product orchestration:
- IndexerCluster: Cluster Manager (CM) handles restart coordination
- SearchHeadCluster: Deployer + Captain handle restart coordination

The operator should not interfere with their built-in orchestration logic.

Removed functions:
- checkAndEvictIndexersIfNeeded() and helpers
- checkAndEvictSearchHeadsIfNeeded() and helpers
- policyv1 imports (no longer needed)

Retained for ALL controllers:
- Pod finalizers for scale-down/restart cleanup
- PreStop lifecycle hooks
- Intent annotations (serve vs scale-down)
- PVC lifecycle management
- StatefulSet rolling update support

restart_required detection remains for:
- IngestorCluster (no in-product orchestrator)
- Standalone (no in-product orchestrator)

Changes: 2 files changed, 12 insertions(+), 222 deletions(-)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Updated test dependencies (ginkgo, gomega, pprof)
- Applied go fmt formatting to pod_controller and pod_deletion_handler
- Updated golang.org/x dependencies to latest versions

These changes were generated by running 'make build'.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 19, 2026

CLA Assistant Lite bot CLA Assistant Lite bot All contributors have signed the COC ✍️ ✅

@vivekr-splunk vivekr-splunk changed the base branch from main to develop February 19, 2026 05:13
@vivekr-splunk vivekr-splunk changed the base branch from develop to CSPL-3551-ingestion-cr February 19, 2026 05:20
vivek.name: "Vivek Reddy and others added 8 commits February 19, 2026 05:52
…rcentage-based updates

This commit completes three major enhancements to the per-pod rolling restart mechanism:

1. PreStop Hook Implementation
   - Created tools/k8_probes/preStop.sh script (10KB)
   - Role-based shutdown: indexer decommission, SH detention, graceful stop for others
   - Intent-aware: reads splunk.com/pod-intent annotation to determine behavior
   - Scale-down: enforce_counts=1 (rebalance buckets/members)
   - Restart: enforce_counts=0 (no rebalancing)
   - Status monitoring with configurable timeouts
   - Added POD_NAME, POD_NAMESPACE, SPLUNK_PASSWORD env vars to pods

2. Refactored Decommission/Detention to PreStop Hooks
   - Moved decommission execution from operator to preStop hook (indexercluster.go)
   - Moved detention execution from operator to preStop hook (searchheadclusterpodmanager.go)
   - Operator now only waits/verifies completion instead of executing
   - Implemented waitForSearchHeadDetention in pod_deletion_handler.go
   - Better separation of concerns: execution in hook, verification in operator

3. Percentage-Based Rolling Update Support
   - Added RollingUpdateConfig type to api/v4/common_types.go
   - Support for maxPodsUnavailable as percentage (e.g., "25%") or absolute number
   - Support for partition-based canary deployments
   - Implemented buildUpdateStrategy function in configuration.go
   - Backward compatible: defaults to existing behavior (1 pod at a time)

Benefits:
- Faster pod termination (decommission during SIGTERM, not before)
- Flexible rollout control (percentage-based or absolute)
- Better error visibility (preStop failures in pod events)
- Consistent pod lifecycle operations
- Support for canary deployments

Documentation:
- IMPLEMENTATION_SUMMARY.md: Complete implementation details
- CURRENT_IMPLEMENTATION_ANALYSIS.md: Code analysis and requirements
- per-pod-rolling-restart-architecture.png: C4 architecture diagram

Testing Required:
- PreStop hook execution for all roles
- Decommission/detention with intent annotations
- Percentage-based rolling updates
- Canary deployments with partition
- PDB interaction with custom maxPodsUnavailable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ster and SearchHeadCluster

Removed unused restart_required detection functions that were left behind as dead code:
- shouldCheckIndexerRestartRequired()
- checkIndexerPodsRestartRequired()
- shouldCheckSearchHeadRestartRequired()
- checkSearchHeadPodsRestartRequired()

These functions were never called after the earlier refactoring that removed
restart_required detection for IndexerCluster and SearchHeadCluster.

Rationale:
- IndexerCluster: Cluster Manager (CM) handles restart coordination
- SearchHeadCluster: Deployer + Captain handle restart coordination
- Operator should not interfere with in-product orchestration

Kept functions:
- triggerIndexerRollingRestart() - Used for secret changes (legitimate)
- triggerSearchHeadRollingRestart() - Used for secret changes (legitimate)

Changes:
- Removed 240 lines of dead code
- Added clarifying comments about restart orchestration responsibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit fixes 7 critical and high-priority issues identified in the
Kubernetes native patterns review:

1. CRITICAL: Add duplicate finalizer prevention
   - Check if finalizer exists before appending
   - Prevents multiple cleanup runs and delayed pod deletion
   - Location: configuration.go - added containsString() helper

2. CRITICAL: Add mutual exclusion for eviction vs RollingUpdate
   - Check if StatefulSet rolling update in progress before evicting
   - Prevents PDB violations from simultaneous pod terminations
   - Applied to: IngestorCluster and Standalone
   - Location: ingestorcluster.go, standalone.go

3. HIGH: Add timeout to preStop API calls
   - Added --max-time 10 to curl commands
   - Prevents preStop hooks from hanging indefinitely
   - Location: preStop.sh:get_pod_intent()

4. HIGH: Add environment variable validation
   - Validate POD_NAME, POD_NAMESPACE, SPLUNK_ROLE at startup
   - Warn if SPLUNK_CLUSTER_MANAGER_URL missing for indexers
   - Location: preStop.sh:main()

5. HIGH: Fix decommission/detention timeout behavior
   - Return error (exit 1) instead of success when timeout exceeded
   - Allows operator/finalizer to detect incomplete operations
   - Location: preStop.sh - decommission_indexer(), detain_search_head()

6. MEDIUM: Fix PDB for single-replica deployments
   - Allow eviction for single replica (minAvailable = 0)
   - Previously blocked all evictions (minAvailable = 1)
   - Location: util.go:2618-2620

7. MEDIUM: Add update staleness detection
   - Check if RollingUpdate stalled (no pods ready)
   - Warn if less than half pods ready
   - Return PhaseError if update appears stuck
   - Location: statefulset.go:205-232

8. Use mounted secret file instead of env var
   - Read password from /mnt/splunk-secrets/password
   - Remove SPLUNK_PASSWORD environment variable
   - More secure and aligns with existing pattern
   - Location: preStop.sh, configuration.go

Testing:
- All code compiles successfully
- make build passes

Documentation:
- KUBERNETES_NATIVE_REVIEW_FINDINGS.md - Complete review analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test Coverage (18 passing tests):
- PodDisruptionBudget creation and updates for all cluster types
- Intent annotation handling (scale-down vs restart)
- Finalizer management and duplicate prevention
- Percentage-based rolling update configuration
- Mutual exclusion between eviction and StatefulSet rolling updates
- Pod eviction logic with PDB protection
- Cluster-specific behavior (CM/Captain orchestration vs operator eviction)

Test Files:
- pkg/splunk/enterprise/pod_lifecycle_test.go: Pod lifecycle management tests
- pkg/splunk/enterprise/pod_eviction_test.go: Pod eviction and intent tests
- TEST_COVERAGE.md: Comprehensive test documentation

All tests use fake Kubernetes client for fast, isolated execution.
Integration tests requiring preStop.sh file are marked for separate execution.

Related: CSPL-4530

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical Fix:
- Operator now detects and respects user-created PDBs
- Check if PDB has ownerReference to CR before updating
- If PDB has no owner reference → user-created → DO NOT MODIFY
- If PDB has owner reference → operator-managed → update as needed

Problem Solved:
- Previously, operator would overwrite user-created PDBs on every reconcile
- Customers could not customize availability requirements
- User's custom minAvailable/maxUnavailable settings would be lost

Solution:
- Added owner reference check in ApplyPodDisruptionBudget()
- Operator logs "user-created PDB detected" and skips update
- User PDBs take precedence over operator defaults

Test Coverage:
- TestUserCreatedPDB: Verifies user PDB is preserved (minAvailable=1 stays 1)
- TestOperatorManagedPDB: Verifies operator can update its own PDBs

Documentation:
- USER_CREATED_PDB.md: Complete guide for user-created PDB support
  * Use cases (high availability, maintenance windows, faster updates)
  * Lifecycle management (creation, updates, deletion)
  * Best practices and troubleshooting
  * Examples for production and dev environments
- TEST_COVERAGE.md: Updated to reflect 20 passing tests

Benefits:
✅ Customers can define custom availability requirements
✅ Operator respects user configuration
✅ Backward compatible (existing behavior unchanged)
✅ Fully tested with unit tests
✅ Clear documentation for users

Related: CSPL-4530

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on handling, and pod intent

This commit addresses all 7 issues identified in the code review plus the open question about pod intent RBAC.

## Issue #1: Eviction RBAC in Wrong API Group [HIGH - FIXED]
**Problem:** RBAC granted pods/eviction under core group, but eviction is policy/v1
**Impact:** Runtime RBAC forbidden errors when calling Eviction API
**Fix:** Changed //+kubebuilder:rbac annotation from groups=core to groups=policy
**Files:** standalone_controller.go, ingestorcluster_controller.go, role.yaml (regenerated)

## Issue #2: Scale-Down Intent Never Applied [HIGH - FALSE POSITIVE]
**Problem:** Reviewer thought scale-down intent wasn't applied
**Analysis:** Actually IS applied via markPodForScaleDown() in statefulset.go:156
**Status:** No changes needed - already implemented correctly

## Issue #3: preStop Cluster Manager URL Malformed [HIGH - FIXED]
**Problem:** SPLUNK_CLUSTER_MANAGER_URL set to service name without https:// or :8089
**Impact:** Peer status checks always fail, decommission verification doesn't work
**Fix:**
  - Construct full URL: https://<service-name>:8089
  - Add SPLUNK_CLUSTER_MANAGER_SERVICE env var for service name
  - Fix peer name construction in preStop.sh (use POD_NAME directly)
**Files:** configuration.go, preStop.sh

## Issue #4: preStop Timeout Exceeds Grace Period [MEDIUM - FIXED]
**Problem:** preStop could wait 300s but non-indexers only get 120s grace period
**Impact:** Kubelet SIGKILL before hook finishes, incomplete cleanup
**Fix:** Align timeouts with grace periods:
  - Indexers: 270s max wait (300s grace period - 30s buffer)
  - Others: 90s max wait (120s grace period - 30s buffer)
**Files:** preStop.sh

## Issue #5: PDB Selector Mismatch with ClusterManagerRef [MEDIUM - FIXED]
**Problem:** PDB selector uses empty partOfIdentifier but pods use ClusterManagerRef.Name
**Impact:** PDB doesn't select any pods, no disruption protection
**Fix:** Apply same partOfIdentifier logic to PDB labels as pod labels
  - Type assertion to get ClusterManagerRef from IndexerCluster CR
  - Use ClusterManagerRef.Name or ClusterMasterRef.Name as partOfIdentifier
**Files:** util.go

## Issue #6: Partition Blocks Eviction Forever [MEDIUM - FIXED]
**Problem:** When RollingUpdateConfig.Partition set, UpdatedReplicas < Replicas is always true
**Impact:** restart_required evictions never happen with canary deployments
**Fix:** Check if partition-based update is complete:
  - Calculate expectedUpdatedReplicas = replicas - partition
  - If UpdatedReplicas >= expectedUpdatedReplicas, allow eviction
  - Only block eviction if partitioned pods still updating
**Files:** standalone.go, ingestorcluster.go

## Issue #7: PDB Violation Detection is Brittle [LOW - FIXED]
**Problem:** String matching "Cannot evict pod" is fragile and locale-dependent
**Impact:** May miss PDB violations if error message changes
**Fix:** Use k8serrors.IsTooManyRequests(err) to check for HTTP 429 status
  - More reliable than string matching
  - Matches Kubernetes Eviction API behavior
**Files:** standalone.go, ingestorcluster.go

## Open Question: preStop Pod Intent RBAC Dependency [FIXED]
**Problem:** preStop needs GET pods RBAC to read intent annotation
**Impact:** If RBAC not granted, intent always defaults to "serve"
**Fix:** Use Kubernetes Downward API instead of API call:
  - Add SPLUNK_POD_INTENT env var via downward API
  - Read from metadata.annotations['splunk.com/pod-intent']
  - No RBAC required, no network calls, no timeouts
  - More reliable and works in restricted environments
**Files:** configuration.go, preStop.sh

## Summary of Changes

**3 High Priority Fixes:**
- ✅ Eviction RBAC now in correct API group (policy)
- ✅ Cluster Manager URL properly constructed
- ✅ Pod intent via Downward API (no RBAC needed)

**3 Medium Priority Fixes:**
- ✅ preStop timeout aligned with grace period
- ✅ PDB selector matches pod labels (ClusterManagerRef support)
- ✅ Partition-based updates don't block eviction forever

**1 Low Priority Fix:**
- ✅ PDB violation detection uses IsTooManyRequests()

**Documentation:**
- REVIEW_FINDINGS_RESPONSE.md: Complete analysis of all findings

## Testing
- Code compiles successfully
- All changes follow Kubernetes best practices
- Backward compatible (no breaking changes)

Related: CSPL-4530

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was using a generic error string, but after fixing Issue #7
(PDB violation detection), the implementation now properly uses
k8serrors.IsTooManyRequests() to detect HTTP 429 responses.

Updated the test to create the correct error type using
k8serrors.NewTooManyRequests() so it matches the implementation.

Test now passes correctly.

Related: CSPL-4530
…udget split

This commit addresses 2 remaining issues from code review:

## Issue #1: SPLUNK_POD_INTENT env var doesn't reflect annotation updates [HIGH]

**Problem:**
- Env vars are evaluated at container start and never update
- Operator marks pod annotation 'splunk.com/pod-intent=scale-down' right before scale-down
- Container still sees stale value 'serve' when preStop runs
- Result: indexer scale-down uses enforce_counts=0 instead of enforce_counts=1

**Fix:**
- Replace env var with downward API volume mount at /etc/podinfo/intent
- Volume files update dynamically when annotations change
- preStop.sh now reads from /etc/podinfo/intent file
- Intent updates are visible immediately to running containers

**Files:**
- configuration.go: Remove SPLUNK_POD_INTENT env var, add downward API volume
- preStop.sh: Read intent from volume file instead of env var

## Issue #2: preStop timeout budget exceeds grace period [MEDIUM]

**Problem:**
- MAX_WAIT_SECONDS was reused for both decommission AND splunk stop
- Worst case: 270s decommission + 270s stop = 540s vs 300s grace period
- Kubelet SIGKILL before preStop completes

**Fix:**
- Split timeout budget: DECOMMISSION_MAX_WAIT + STOP_MAX_WAIT
- Indexers: 900s decommission + 90s stop = 990s (within 1020s/17min grace)
- Search heads: 300s detention + 50s stop = 350s (within 360s/6min grace)
- Others: 80s operations + 30s stop = 110s (within 120s/2min grace)

**Updated Grace Periods (to match real-world requirements):**
- Indexers: 1020s (17 min) - bucket migration during scale-down needs 15+ minutes
- Search heads: 360s (6 min) - cluster detention typically needs 5 minutes
- Others: 120s (2 min) - basic graceful shutdown

**Files:**
- configuration.go: Update termination grace periods for each role
- preStop.sh: Split timeout budget, add role-specific timeouts

## Benefits

✅ Pod intent updates visible to running containers (fixes scale-down decommission)
✅ Timeout budget stays within grace period (no premature SIGKILL)
✅ Realistic timeouts for bucket migration (15 min) and detention (5 min)
✅ No RBAC required for pod metadata access
✅ More reliable than env vars

## Testing

- All unit tests passing
- Code compiles successfully
- Backward compatible (env vars can override via PRESTOP_* variables)

Related: CSPL-4530
Base automatically changed from CSPL-3551-ingestion-cr to develop February 20, 2026 15:29
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This spike implements a comprehensive pod lifecycle management system for Splunk Operator enabling graceful pod termination, role-specific cleanup, and flexible rolling update strategies. The implementation introduces three new Custom Resource Definitions (IngestorCluster, Queue, ObjectStorage) to support index and ingestion separation architecture, along with Kubernetes-native patterns using finalizers, preStop hooks, and PodDisruptionBudgets.

Changes:

  • Adds 3 new CRDs (IngestorCluster, Queue, ObjectStorage) for index/ingestion separation
  • Implements pod finalizer-based cleanup and preStop lifecycle hooks for graceful shutdown
  • Adds PodDisruptionBudget support and rolling update enhancements with percentage-based configurations
  • Updates StatefulSet management to use RollingUpdate strategy instead of OnDelete
  • Adds extensive test infrastructure for the new functionality

Reviewed changes

Copilot reviewed 148 out of 151 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
api/v4/*_types.go New CRD types and status fields for IngestorCluster, Queue, ObjectStorage, RestartStatus
pkg/splunk/splkcontroller/statefulset.go Critical refactoring of rolling update logic with intent marking and PVC cleanup changes
pkg/splunk/enterprise/configuration.go PreStop hooks, finalizers, environment variables, and rolling update strategy
pkg/splunk/enterprise/standalone.go Per-pod eviction logic with restart_required detection
pkg/splunk/enterprise/searchheadcluster.go Updated to monitor detention instead of triggering it
internal/controller/pod_controller.go New pod finalizer controller for cleanup orchestration
test/testenv/*.go Test utilities for IngestorCluster and related resources
test/index_and_ingestion_separation/* New test suite for index/ingestion separation
config/crd/bases/* Generated CRD manifests for new and updated resources
helm-chart/* Helm chart templates and RBAC for new CRDs
config/rbac/* RBAC permissions for pod eviction and PDB management

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +104 to +105
response=$(curl -s -k -u "${SPLUNK_USER}:${SPLUNK_PASSWORD}" \
"${cluster_manager_url}/services/cluster/manager/peers?output_mode=json" 2>/dev/null)
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The curl call to the Cluster Manager in get_indexer_peer_status uses -k (disables TLS certificate verification) while sending Splunk admin credentials to SPLUNK_CLUSTER_MANAGER_URL. If an attacker can influence DNS or the network path to that URL inside the cluster, they can man-in-the-middle this connection and steal admin credentials or spoof Cluster Manager responses. Replace -k with proper certificate verification (for example by mounting the CM CA cert and using --cacert or --capath), and ensure SPLUNK_CLUSTER_MANAGER_URL always points to a trusted internal endpoint.

Copilot uses AI. Check for mistakes.
@vivekr-splunk vivekr-splunk marked this pull request as draft February 27, 2026 14:02
vivek.name: "Vivek Reddy and others added 6 commits March 2, 2026 18:32
Resolve conflicts from the v3.0.0 promotion into main:
- Take origin/main for: workflows, docs, Dockerfile, Makefile, go.mod/go.sum, kustomizations
- Keep spike changes: preStop-based SHC detention, POD_NAME/POD_NAMESPACE env vars,
  IngestorCluster/PodReconciler controllers, QueueRef/ObjectStorage watchers,
  RestartStatus field in SearchHeadClusterStatus, containsString helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts keeping spike features: RollingUpdateConfig, preStop lifecycle hooks,
finalizers, PDB management, IngestorCluster/Queue/ObjectStorage controllers.
Update infrastructure files (workflows, Dockerfile, Makefile, go.mod) from develop.

Merge strategy:
- Infrastructure/CI files: taken from origin/develop
- Docs files: taken from origin/develop
- API types: merged both spike and develop changes (RestartStatus, CredentialSecretVersion, etc.)
- Controller files: merged both spike PodReconciler and develop TelemetryReconciler
- Enterprise package: merged PDB management (spike) with Queue/ObjectStorage resolution (develop)
- Fixed missing isPodReady helper function for pod readiness checks
- Fixed syntax errors from conflict resolution (missing braces, indentation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update tests to reflect the behavioral changes introduced by the
per-pod rolling restart spike:

- RollingUpdate strategy: remove pod Get calls from UpdateStatefulSetPods
  (OnDelete path no longer taken; Kubernetes handles pod updates natively)
- markPodForScaleDown: add pod Get in scale-down path (new annotation step)
- PVC deletion moved to pod finalizer: remove PVC Get/Delete expectations
- ApplyPodDisruptionBudget: add PDB Get/Create in SHC reconcile
- ApplyShcSecret: remove HTTP restart POST expectations (replaced by
  rolling restart mechanism)
- UpdateStatefulSetPods re-fetch: add extra StatefulSet Get at start
- Scale-up now calls UpdateResource immediately (PhaseScalingUp not
  PhasePending) when readyReplicas < desiredReplicas
- SHC PrepareScaleDown: add second member/info GET (called again after
  PrepareRecycle returns true); remove_server POST not called while
  member is still registered

Files changed:
- pkg/splunk/enterprise/*_test.go: fix MockClient call counts
- pkg/splunk/splkcontroller/statefulset_test.go: update scaling phases
  and remove obsolete PVC delete error test
- pkg/splunk/test/controller.go (PodManagerTester): update expected
  phases and call counts for scale-up/scale-down cases
- testdata/fixtures/: regenerated StatefulSet JSON fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This field was introduced in CSPL-4360 but later removed in develop
(replaced by CredentialSecretVersion). The develop merge brought back
CredentialSecretVersion but left QueueBucketAccessSecretVersion as
dead code — it is not referenced anywhere in the codebase.

Remove it from IndexerClusterStatus and IngestorClusterStatus to stay
in sync with develop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ReloadSplunk() was marked as BROKEN in its own comments: the mode=reload
parameter is silently ignored by Splunk, causing a full splunkd restart
instead of a config reload. It was never called anywhere in the operator
codebase. Remove it to avoid future confusion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gofmt: reformat 6 files that had whitespace/alignment issues:
- api/v4/ingestorcluster_types.go
- pkg/splunk/enterprise/configuration.go
- pkg/splunk/enterprise/pod_eviction_test.go
- pkg/splunk/enterprise/pod_lifecycle_test.go
- pkg/splunk/enterprise/searchheadcluster.go
- pkg/splunk/enterprise/searchheadcluster_test.go

biased-language: exclude spike-specific files that reference legacy
Splunk REST API endpoints (cluster/master, cluster/slave) and
PlantUML GitHub URLs — these are third-party API paths that cannot
be changed without breaking functionality or external references.
Add to .biased_lang_exclude:
- jira/ (design/story docs)
- tools/k8_probes/preStop.sh (matches SPLUNK_ROLE env var values)
- per-pod-rolling-restart-architecture.puml
- per-pod-rolling-restart-user-guide.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vivekr-splunk vivekr-splunk self-assigned this Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants