-
Notifications
You must be signed in to change notification settings - Fork 130
Merge slog feature branch #1859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Igor-splunk
wants to merge
20
commits into
develop
Choose a base branch
from
feature/slog-as-default-logger
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
ab7a57f
Draft: CSPL-4347 (#1662)
Igor-splunk 29397cc
CSPL-4357: Secrets logs (#1711)
Igor-splunk e940ed2
Merge branch 'develop' into feature/slog-as-default-logger
kasiakoziol 4da56b7
Fixes
kasiakoziol e7f00eb
Add logs to controllers
Igor-splunk 611d2c7
Use slog.default
Igor-splunk 570e625
CSPL-4513 Events for Index & Ingestion separation (#1706)
kasiakoziol beeec17
Remove double logging
Igor-splunk b03f446
Add logs (#1755)
Igor-splunk 81f7810
Refine events (#1751)
Igor-splunk 3586758
merge
Igor-splunk f97cd81
Update lefrover calls
Igor-splunk 168d8b0
Update lefrover calls
Igor-splunk 125a85d
Remove build versions from logs
Igor-splunk f2b0c40
remove build info
Igor-splunk 4f1dd0e
Resolve merge conflicts
Igor-splunk db67a94
Fix error wraping
Igor-splunk 429ea9a
Remove leftover logs
Igor-splunk 3f82444
Standardize log messages.
Igor-splunk 29f342a
Use PascalCase for CRD name in logs
Igor-splunk File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,277 @@ | ||
| --- | ||
| title: Logging and Events | ||
| parent: Reference | ||
| nav_order: 6 | ||
| --- | ||
|
|
||
| # Logging & Events | ||
|
|
||
| ## Logging | ||
|
|
||
| The operator uses Go's `log/slog` package for structured logging. The global logger is configured once at startup in `cmd/main.go` via `logging.SetupLogger()` and set as the default with `slog.SetDefault()`. | ||
|
|
||
| ### Getting a Logger | ||
|
|
||
| | Where | How | | ||
| |---|---| | ||
| | **Controller `Reconcile`** | `logger := slog.Default().With("controller", "Name", "name", req.Name, "namespace", req.Namespace, "reconcileID", controller.ReconcileIDFromContext(ctx))` | | ||
| | **Business logic (pkg)** | `logger := logging.FromContext(ctx).With("func", "FunctionName")` | | ||
|
|
||
| Controllers inject the logger into context so downstream code can retrieve it: | ||
|
|
||
| ```go | ||
| logger := slog.Default().With("controller", "Standalone", "name", req.Name, "namespace", req.Namespace, "reconcileID", controller.ReconcileIDFromContext(ctx)) | ||
| ctx = logging.WithLogger(ctx, logger) | ||
| ``` | ||
|
|
||
| The `reconcileID` is a unique identifier assigned by controller-runtime to each reconcile invocation. It is automatically propagated to all downstream log calls via context, making it possible to correlate every log line produced during a single reconcile pass — even across deeply nested function calls. | ||
|
|
||
| ### Log Levels | ||
|
|
||
| | Level | Use for | | ||
| |---|---| | ||
| | `Debug` | Verbose diagnostics (state dumps, intermediate values) | | ||
| | `Info` | Normal operations (reconcile start, requeue, status changes) | | ||
| | `Warn` | Recoverable issues (deprecated config, fallback behavior) | | ||
| | `Error` | Failures that need attention (API errors, broken invariants) | | ||
|
|
||
| Configure at runtime via `LOG_LEVEL` env var or `--log-level` flag. Values: `debug`, `info`, `warn`, `error`. | ||
|
|
||
| ### Writing Log Messages | ||
|
|
||
| **Message format:** short, lowercase, past tense or present participle. Describe *what happened*, not the function name. Use **PascalCase** for CRD type names in messages — `ClusterManager`, `IndexerCluster`, `SearchHeadCluster`, `LicenseManager`, `LicenseMaster`, `ClusterMaster`, `MonitoringConsole`, `Standalone`. | ||
|
|
||
| ```go | ||
| // Good | ||
| logger.InfoContext(ctx, "ClusterManager types not found in namespace", "error", err) | ||
| logger.InfoContext(ctx, "scaling up IndexerCluster", "replicas", replicas) | ||
|
|
||
| // Bad - inconsistent casing | ||
| logger.InfoContext(ctx, "clusterManager types not found") | ||
| logger.InfoContext(ctx, "scaling up indexer cluster") | ||
| ``` | ||
|
|
||
| ```go | ||
| // Good | ||
| logger.InfoContext(ctx, "statefulset updated", "replicas", replicas) | ||
| logger.ErrorContext(ctx, "failed to fetch secret", "error", err, "secret", name) | ||
|
|
||
| // Bad - don't restate the function name or use generic messages | ||
| logger.InfoContext(ctx, "ApplyStandalone") | ||
| logger.InfoContext(ctx, "error occurred") | ||
| ``` | ||
|
|
||
| **Rules:** | ||
|
|
||
| 1. Always use `*Context(ctx, ...)` variants (`InfoContext`, `ErrorContext`, etc.). | ||
| 2. Always pass `"error", err` as a key-value pair — never as the message string. | ||
| 3. Use consistent key names across the codebase. Key names **must** be `camelCase` — no spaces, no Title Case, no special characters like parentheses. | ||
|
|
||
| ```go | ||
| // Good | ||
| logger.InfoContext(ctx, "start", "crVersion", instance.GetResourceVersion()) | ||
| logger.InfoContext(ctx, "Requeued", "periodSeconds", int(result.RequeueAfter/time.Second)) | ||
| logger.InfoContext(ctx, "Getting Apps list", "bucket", client.BucketName) | ||
|
|
||
| // Bad - spaces and inconsistent casing in keys | ||
| logger.InfoContext(ctx, "start", "CR version", instance.GetResourceVersion()) | ||
| logger.InfoContext(ctx, "Requeued", "period(seconds)", int(result.RequeueAfter/time.Second)) | ||
| logger.InfoContext(ctx, "Getting Apps list", "AWS S3 Bucket", client.BucketName) | ||
| ``` | ||
|
|
||
| Common key names: | ||
|
|
||
| | Key | Meaning | | ||
| |---|---| | ||
| | `"error"` | The `error` value | | ||
| | `"name"` | CR or resource name | | ||
| | `"namespace"` | Kubernetes namespace | | ||
| | `"controller"` | Controller name | | ||
| | `"func"` | Function name (in pkg layer) | | ||
| | `"reconcileID"` | Unique ID per reconcile pass (set by controller) | | ||
| | `"replicas"` | Replica count | | ||
| | `"phase"` | CR phase | | ||
| | `"podName"` | Pod name | | ||
| | `"appName"` | App name | | ||
| | `"bucket"` | Storage bucket name (S3/GCS/Azure) | | ||
| | `"crVersion"` | CR resource version | | ||
| | `"periodSeconds"` | Requeue delay in seconds | | ||
|
|
||
| 4. **Don't log and return the same error.** controller-runtime automatically logs every non-nil error returned from `Reconcile()` via `log.Error(err, "Reconciler error")`. If you also log it in the business logic, the same error appears twice. Instead, wrap the error with context using `fmt.Errorf("description: %w", err)` so the single controller-runtime log line is descriptive. | ||
|
|
||
| ### Error Wrapping | ||
|
|
||
| The `Apply*` functions in `pkg/splunk/enterprise/` wrap validation errors before returning them. This adds context about which resource type failed, producing a clear chain when controller-runtime logs the final error: | ||
|
|
||
| ``` | ||
| validate clustermanager spec: license not accepted, please adjust SPLUNK_GENERAL_TERMS ... | ||
| ``` | ||
|
|
||
| **Pattern:** | ||
|
|
||
| ```go | ||
| err = validateClusterManagerSpec(ctx, client, cr) | ||
| if err != nil { | ||
| eventPublisher.Warning(ctx, EventReasonValidateSpecFailed, | ||
| fmt.Sprintf("Spec validation failed for %s — check operator logs", cr.GetName())) | ||
| return result, fmt.Errorf("validate clustermanager spec: %w", err) | ||
| } | ||
| ``` | ||
|
|
||
| **Rules:** | ||
|
|
||
| - Use `%w` (not `%v`) so callers can inspect the original error with `errors.Is` / `errors.Unwrap`. | ||
| - The prefix should be lowercase and describe the operation that failed, e.g. `"validate standalone spec"`, `"apply splunk config"`. | ||
| - Don't wrap and log the same error — pick one. Wrapping is preferred because it keeps context in the error chain and avoids double-logging. | ||
|
|
||
| ```go | ||
| // Good — wrap and return, no explicit log needed | ||
| return result, fmt.Errorf("validate standalone spec: %w", err) | ||
|
|
||
| // Bad — double-logged: once here, once by controller-runtime | ||
| scopedLog.Error(err, "Failed to validate standalone spec") | ||
| return result, err | ||
| ``` | ||
|
|
||
| **When writing tests** that assert on wrapped errors, use `strings.Contains` rather than an exact match so the test doesn't break if the wrapping prefix changes: | ||
|
|
||
| ```go | ||
| // Good — resilient to wrapping changes | ||
| if !strings.Contains(err.Error(), "license not accepted") { | ||
| t.Errorf("Unexpected error: %v", err) | ||
| } | ||
|
|
||
| // Bad — brittle, breaks if wrapping prefix is renamed | ||
| if err.Error() != "validate standalone spec: license not accepted, ..." { | ||
| t.Errorf("Unexpected error: %v", err) | ||
| } | ||
| ``` | ||
| 5. Sensitive data (passwords, tokens, secrets) is automatically redacted by the handler. | ||
|
|
||
| ### Configuration | ||
|
|
||
| | Env Var | Flag | Default | Description | | ||
| |---|---|---|---| | ||
| | `LOG_LEVEL` | `--log-level` | `info` | Minimum log level | | ||
| | `LOG_FORMAT` | `--log-format` | `json` | `json` or `text` | | ||
| | `LOG_ADD_SOURCE` | `--log-add-source` | `false` | Include source file/line (auto-enabled at debug) | | ||
|
|
||
| Flags take precedence over env vars. | ||
|
|
||
| --- | ||
|
|
||
| ## Kubernetes Events | ||
|
|
||
| Events are user-facing signals visible via `kubectl describe`. Use them for significant state changes that an operator user should see — not for internal debugging. | ||
|
|
||
| ### How It Works | ||
|
|
||
| 1. Controllers pass the event recorder into context: | ||
| ```go | ||
| ctx = context.WithValue(ctx, splcommon.EventRecorderKey, r.Recorder) | ||
| ``` | ||
| 2. Business logic retrieves a publisher: | ||
| ```go | ||
| eventPublisher := GetEventPublisher(ctx, cr) | ||
| ``` | ||
| 3. Emit events using constants from `pkg/splunk/enterprise/event_reasons.go`: | ||
| ```go | ||
| eventPublisher.Normal(ctx, EventReasonScaledUp, | ||
| fmt.Sprintf("Successfully scaled %s from %d to %d replicas", cr.GetName(), old, new)) | ||
| eventPublisher.Warning(ctx, EventReasonValidateSpecFailed, | ||
| fmt.Sprintf("Spec validation failed for %s — check operator logs", cr.GetName())) | ||
| ``` | ||
|
|
||
| ### Event Reason Constants | ||
|
|
||
| All event reasons are defined as constants in `pkg/splunk/enterprise/event_reasons.go`. **Never use string literals for event reasons** — always use the `EventReason*` constants. This ensures consistency across controllers and makes reasons searchable. | ||
|
|
||
| **Normal reasons:** | ||
|
|
||
| | Constant | Value | Use for | | ||
| |---|---|---| | ||
| | `EventReasonScaledUp` | `ScaledUp` | Successful scale up | | ||
| | `EventReasonScaledDown` | `ScaledDown` | Successful scale down | | ||
| | `EventReasonClusterInitialized` | `ClusterInitialized` | Cluster first becomes ready | | ||
| | `EventReasonClusterQuorumRestored` | `ClusterQuorumRestored` | Quorum recovered | | ||
| | `EventReasonPasswordSyncCompleted` | `PasswordSyncCompleted` | Secret sync finished | | ||
|
|
||
| **Warning reasons (common):** | ||
|
|
||
| | Constant | Value | Use for | | ||
| |---|---|---| | ||
| | `EventReasonValidateSpecFailed` | `ValidateSpecFailed` | CR spec validation errors | | ||
| | `EventReasonApplySplunkConfigFailed` | `ApplySplunkConfigFailed` | General config apply failures | | ||
| | `EventReasonAppFrameworkInitFailed` | `AppFrameworkInitFailed` | App framework init errors | | ||
| | `EventReasonApplyServiceFailed` | `ApplyServiceFailed` | Service create/update failures | | ||
| | `EventReasonStatefulSetFailed` | `StatefulSetFailed` | StatefulSet get failures | | ||
| | `EventReasonStatefulSetUpdateFailed` | `StatefulSetUpdateFailed` | StatefulSet update failures | | ||
| | `EventReasonDeleteFailed` | `DeleteFailed` | CR deletion failures | | ||
| | `EventReasonSecretMissing` | `SecretMissing` | Required secret not found | | ||
| | `EventReasonUpgradeCheckFailed` | `UpgradeCheckFailed` | Upgrade path validation errors | | ||
|
|
||
| See [`event_reasons.go`](https://github.com/splunk/splunk-operator/blob/main/pkg/splunk/enterprise/event_reasons.go) for the full list. | ||
|
|
||
| To add a new event reason: add a constant to `event_reasons.go`, then use it in the code. | ||
|
|
||
| ### When to Use Events vs Logs | ||
|
|
||
| | Scenario | Log | Event | | ||
| |---|---|---| | ||
| | Reconcile start/requeue | Yes | No | | ||
| | API call failed (transient) | Yes | No | | ||
| | Spec validation failed | Yes | Yes (Warning) | | ||
| | Successful scale up/down | Yes | Yes (Normal) | | ||
| | Phase transition | Yes | Yes (Normal) | | ||
| | Security-related failure | Yes | Yes (Warning) | | ||
|
|
||
| ### Writing Event Messages | ||
|
|
||
| **Reason:** always use an `EventReason*` constant — e.g. `EventReasonScaledUp`, `EventReasonValidateSpecFailed`. | ||
|
|
||
| **Message:** sentence case, user-friendly, include the resource name. **Never include raw `err.Error()` in events** — error details belong in logs only. Events are visible to any user with `kubectl describe` access and may leak internal paths, secret names, or stack traces. Instead, write a user-actionable summary and point to logs for details. | ||
|
|
||
| ```go | ||
| // Good — uses constant, event gives a user-actionable summary, log has full error | ||
| logger.ErrorContext(ctx, "smartstore volume key validation failed", "error", err) | ||
| eventPublisher.Warning(ctx, EventReasonRemoteVolumeKeyCheckFailed, | ||
| fmt.Sprintf("Remote volume key change check failed for %s — check operator logs", cr.GetName())) | ||
|
|
||
| eventPublisher.Normal(ctx, EventReasonScaledUp, | ||
| fmt.Sprintf("Successfully scaled %s from %d to %d replicas", cr.GetName(), oldCount, newCount)) | ||
|
|
||
| // Bad — string literal instead of constant | ||
| eventPublisher.Warning(ctx, "ValidationFailed", "...") | ||
|
|
||
| // Bad — leaking err.Error() into events | ||
| eventPublisher.Warning(ctx, EventReasonValidateSpecFailed, | ||
| fmt.Sprintf("Invalid smartstore config: %s", err.Error())) | ||
|
|
||
| // Bad — too vague, no context | ||
| eventPublisher.Warning(ctx, "Error", "something went wrong") | ||
| ``` | ||
|
|
||
| **Log + Event combo for errors:** | ||
|
|
||
| ```go | ||
| err = validateStandaloneSpec(ctx, client, cr) | ||
| if err != nil { | ||
| // Log: full error for operator developers / debugging | ||
| logger.ErrorContext(ctx, "spec validation failed", "error", err) | ||
|
|
||
| // Event: user-actionable summary visible in kubectl describe | ||
| eventPublisher.Warning(ctx, EventReasonValidateSpecFailed, | ||
| fmt.Sprintf("Spec validation failed for %s — check operator logs", cr.GetName())) | ||
|
|
||
| return result, err | ||
| } | ||
| ``` | ||
|
|
||
| **Rules:** | ||
|
|
||
| 1. Always use `EventReason*` constants — never string literals for event reasons. | ||
| 2. Use `Normal` for successful operations the user should know about. | ||
| 3. Use `Warning` for failures or degraded states that may need user intervention. | ||
| 4. Always include the resource name in the message. | ||
| 5. Never pass `err.Error()` into event messages — log the full error, keep events clean. | ||
| 6. Don't emit events for routine internal operations — keep the event stream actionable. | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this to the "Reference" section of the github docs similar to the examples file: https://github.com/splunk/splunk-operator/blob/main/docs/Examples.md?plain=1#L4