diff --git a/docs/architecture.md b/docs/architecture.md
index e2ef5bd..398b388 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -95,3 +95,35 @@ Use the `update-version.sh` script to manage operator versions:
 Supported components: `strimzi`, `apicurio-registry`, `streamshub-console`, `prometheus-operator`
 
 The script updates the remote resource URLs in the relevant `kustomization.yaml` files to point to the new version's release artifacts.
+
+## Scaling the Kafka Cluster
+
+The default deployment uses the upstream Strimzi [`kafka-single-node.yaml`](https://github.com/strimzi/strimzi-kafka-operator/blob/0.51.0/examples/kafka/kafka-single-node.yaml) example with a single broker.
+To scale to 3 replicas, edit `components/core/stack/kafka/kustomization.yaml` and change the resource URL to use Strimzi's [`kafka-with-dual-role-nodes.yaml`](https://github.com/strimzi/strimzi-kafka-operator/blob/0.51.0/examples/kafka/kafka-with-dual-role-nodes.yaml) example instead:
+
+```yaml
+resources:
+  - https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/tags/0.51.0/examples/kafka/kafka-with-dual-role-nodes.yaml
+  - namespace.yaml
+```
+
+This example is structurally identical to the single-node version (same KafkaNodePool name, listeners, and storage) but configures 3 replicas with the following replication settings:
+
+| Property | Value | Notes |
+|----------|-------|-------|
+| `offsets.topic.replication.factor` | 3 | |
+| `transaction.state.log.replication.factor` | 3 | |
+| `transaction.state.log.min.isr` | 2 | replicas − 1 |
+| `default.replication.factor` | 3 | |
+| `min.insync.replicas` | 2 | replicas − 1 |
+
+All existing patches (cluster rename, resource limits, entity operator config) apply without changes.
+
+For replica counts other than 1 or 3, start from either example and add patches for `spec.replicas` on the KafkaNodePool and the replication config values on the Kafka CR.
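+
+A minimal sketch of such patches for 5 replicas is shown below. It is untested and assumes that the kustomization uses the `patches` field, that it contains exactly one KafkaNodePool and one Kafka resource (so the targets can select by `kind` alone and still match after the cluster rename), and that the replication factors should equal the replica count while the minimum ISR values follow the `replicas − 1` rule. Add these entries to the existing `patches` list rather than replacing it:
+
+```yaml
+# Illustrative sketch for 5 replicas; values and targets are assumptions
+patches:
+  # Scale the dual-role (controller + broker) node pool
+  - target:
+      kind: KafkaNodePool
+    patch: |-
+      - op: replace
+        path: /spec/replicas
+        value: 5
+  # Raise the replication factors to match the replica count
+  - target:
+      kind: Kafka
+    patch: |-
+      - op: replace
+        path: /spec/kafka/config/offsets.topic.replication.factor
+        value: 5
+      - op: replace
+        path: /spec/kafka/config/transaction.state.log.replication.factor
+        value: 5
+      - op: replace
+        path: /spec/kafka/config/default.replication.factor
+        value: 5
+      # Minimum ISR settings follow the replicas - 1 rule
+      - op: replace
+        path: /spec/kafka/config/transaction.state.log.min.isr
+        value: 4
+      - op: replace
+        path: /spec/kafka/config/min.insync.replicas
+        value: 4
+```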
+
+**Considerations:**
+
+- **Minimum ISR settings** (`min.insync.replicas`, `transaction.state.log.min.isr`) should be `replicas − 1`, not equal to `replicas`. Setting `min.insync.replicas` equal to the replica count means that losing a single broker blocks all writes from producers using `acks=all` (the Java producer default since Kafka 3.0)
+- **KRaft quorum** — the cluster uses KRaft (no ZooKeeper) with dual-role nodes (controller + broker). The controller quorum needs a majority of controllers to elect a leader and stay available, so an odd number of replicas (3 or 5) is recommended: 3 nodes tolerate one failure, 5 tolerate two, and a fourth node adds no extra tolerance
+- **Resource usage** scales roughly linearly — 3 replicas require about 3× the CPU and memory of a single node. You may need to increase cluster resources (e.g. `minikube start --cpus=8 --memory=12g`)
+- **Local changes** require `LOCAL_DIR=.` when using the install script, which otherwise fetches manifests from GitHub. See [Install from a Local Checkout](installation.md#install-from-a-local-checkout)
diff --git a/docs/installation.md b/docs/installation.md
index c74e0ad..3be6cc2 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -57,7 +57,7 @@ If you prefer step-by-step control, the stack is installed in two phases.
 ### Phase 1 — Operators and CRDs
 
 ```shell
-kubectl apply -k 'https://github.com/streamshub/developer-quickstart//overlays/core/base?ref=main'
+kubectl apply --server-side --force-conflicts -k 'https://github.com/streamshub/developer-quickstart//overlays/core/base?ref=main'
 ```
 
 Optionally, you can wait for the operators to become ready using the commands below:
diff --git a/docs/overlays/core.md b/docs/overlays/core.md
index 6f99a5f..1455b26 100644
--- a/docs/overlays/core.md
+++ b/docs/overlays/core.md
@@ -20,7 +20,7 @@ No `OVERLAY` variable is needed — the core overlay is used by default.
 
 ```shell
 # Phase 1 — Operators and CRDs
-kubectl apply -k 'https://github.com/streamshub/developer-quickstart//overlays/core/base?ref=main'
+kubectl apply --server-side --force-conflicts -k 'https://github.com/streamshub/developer-quickstart//overlays/core/base?ref=main'
 
 # Optionally, wait for the operators to be ready
 kubectl wait --for=condition=Available deployment/strimzi-cluster-operator -n strimzi --timeout=120s
diff --git a/docs/overlays/metrics.md b/docs/overlays/metrics.md
index 50be009..cd44115 100644
--- a/docs/overlays/metrics.md
+++ b/docs/overlays/metrics.md
@@ -19,7 +19,7 @@ If you prefer step-by-step control, the metrics overlay uses `overlays/metrics`
 
 ```shell
 # Phase 1 — Operators and CRDs (includes Prometheus Operator)
-kubectl create -k 'https://github.com/streamshub/developer-quickstart//overlays/metrics/base?ref=main'
+kubectl apply --server-side --force-conflicts -k 'https://github.com/streamshub/developer-quickstart//overlays/metrics/base?ref=main'
 
 # Optionally, wait for the operators to be ready
 kubectl wait --for=condition=Available deployment/prometheus-operator -n monitoring --timeout=120s
@@ -107,6 +107,7 @@ curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"up"' | wc -l
 
 Open the StreamsHub Console UI — Kafka cluster CPU and memory usage should show up straight away.
 However, other metrics such as those for topics will only show once topics have been created and messages are flowing through them.
+On minikube, disk usage metrics are not available — see [Disk Usage Metrics Empty on Minikube](#disk-usage-metrics-empty-on-minikube) below.
 
 ## Troubleshooting
 
@@ -135,3 +136,19 @@ kubectl get kafka/dev-cluster -n kafka -o jsonpath='{.spec.kafka.metricsConfig}'
 - PodMonitor label mismatch — Prometheus selects PodMonitors with `app: strimzi`; verify the label is present
 - Kafka metrics not enabled — the metrics overlay patches the Kafka CR to add `metricsConfig`; check that it was applied
 
+### Disk Usage Metrics Empty on Minikube
+
+The Console UI shows CPU and memory graphs but the disk usage panel is empty:
+
+```shell
+# Check if volume stats are available in Prometheus
+kubectl exec -n monitoring prometheus-prometheus-0 -c prometheus -- \
+  wget -qO- 'http://localhost:9090/api/v1/query?query=kubelet_volume_stats_used_bytes' \
+  | grep -c '"result":\[\]'
+# Output of 1 means no volume stats are present; 0 means they are
+```
+
+**Cause:**
+
+- This is a minikube platform limitation, not a configuration issue. The `kubelet_volume_stats_*` and `container_fs_usage_bytes` metrics are generally not reported by minikube's kubelet and cAdvisor, particularly with the Docker driver. On other platforms such as OpenShift, EKS, and GKE these metrics are typically available and disk usage displays correctly.
+